The Rare Diseases Pilot study of the 100,000 Genomes Project had two objectives: first, to identify the DNA variants underlying unresolved Mendelian disorders; and second, to develop an accredited framework for delivering whole genome sequencing (WGS) results across a national healthcare system. From February 2014 to June 2017, 13,037 individuals with a rare disease and their relatives were recruited at 57 National Health Service (NHS) hospitals in the UK and 26 non-UK hospitals using standardized eligibility criteria for 12 rare disease domains. This cohort includes cases with haematological (n=1,021), immunological (n=1,359) and haemostasis (n=1,169) disorders.

With informed consent, clinical and laboratory data were collected and coded into a single research database using Human Phenotype Ontology (HPO) terms. All 13,037 DNA samples underwent Illumina WGS to clinical standard, with a mean depth >30X and 90% of the reference genome covered at a minimum of 19X in all samples. The pilot resource contains over 165 million unique variants in the 10,258 genetically independent samples, of which 91.5%, 8.5% and 5.6% are single nucleotide variants (SNVs), short insertions/deletions and large deletions, respectively; 47% of the variants were previously unobserved in large-scale genome datasets (e.g. TOPMed, gnomAD, UK10K). Across all domains, 2,067 unique diagnostic-grade genes (DGGs) were curated to clinical standards to support the reporting of pertinent findings by 12 multi-disciplinary teams (MDTs) with domain-relevant clinical and genetic expertise. Over 1,300 MDT reports assigning pathogenic or likely pathogenic causal variants have been returned to referring clinicians, with the diagnostic yield ranging from 1.6% to 53.8%, depending on the extent of genetic screening before enrolment and the importance of the non-genetic component of the disorder (e.g. in immune disorders). About 30% of the causal variants identified had not previously been reported (absent from the Human Gene Mutation Database v.2018.1); interestingly, 51 variants have been reported in 11 DGGs linked to phenotypes belonging to different domains. A comparison with standard whole exome sequencing results revealed that WGS has at least 12.5% greater sensitivity for detecting known pathogenic variants.

For the haematology, immunology and haemostasis domains, 330 causal variants were reported in 83 DGGs, revealing novel modes of inheritance (Sivapalaratnam et al., Blood 2016) and entirely new clinical phenotypes linked to mutations in ABCC4, GNE, KDSR and STIM1, amongst other DGGs. The genotypes and HPO-coded phenotypes of all pilot cases were analysed with BeviMed, a rapid and scalable Bayesian association test (Greene et al., AJHG 2017), to identify causal variants in hitherto unknown genes. This identified more than 30 genes with posterior probabilities indicating a high likelihood of being implicated in as yet unresolved Mendelian disorders. Results from co-segregation and cell biology studies have already corroborated this statistical inference, and 15 novel genes have acquired DGG status. Using a new method for the analysis of gene-regulatory elements, we also identified an example of a causal variant in such an element controlling the function of both GATA1 and HDAC6, resulting in a severe syndromic pathology characterized by abnormal erythropoiesis and megakaryopoiesis.

In conclusion, the pilot of the 100,000 Genomes Project has shown that it is feasible to use WGS across a national health system, such as the NHS, to deliver a molecular diagnosis for patients with rare inherited diseases, and that a national resource of genotypes and HPO-coded phenotypes provides a powerful platform for the identification of novel diagnostic-grade genes.


No relevant conflicts of interest to declare.
