Unbiased, high-throughput, massively parallel sequencing has transformed the discovery of novel putative driver gene mutations in cancer. In chronic lymphocytic leukemia (CLL), these methods have yielded several unexpected findings, including the driver genes SF3B1, NOTCH1 and POT1. A recent analysis that down-sampled existing datasets showed that the discovery of putative drivers is far from complete across cancer types. In CLL, driver gene mutations affecting >10% of patients were efficiently discovered in previously published cohorts of up to 160 samples subjected to whole exome sequencing (WES). However, a cohort of this size has only 0.78 power to detect drivers affecting 5% of patients, and only 0.12 power for drivers affecting 2% of patients. These calculations emphasize the need to apply unbiased WES to larger patient cohorts.
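
The power figures above come from the cited down-sampling analysis, whose model is not described here. As a rough illustration of why power depends so strongly on driver frequency and cohort size, the Python sketch below uses a deliberately simplified binomial model. Each patient carries a background mutation in the gene with a hypothetical probability `p0`, a driver adds mutations in a fraction `driver_freq` of patients, and the gene is called significant when its count of mutated patients passes a hypothetical genome-wide threshold `alpha`. The function names and parameter values are illustrative assumptions. The sketch is not expected to reproduce the 0.78 and 0.12 values, which rest on the cited analysis's own model.

```python
from math import exp, lgamma, log, log1p


def log_binom_pmf(i, n, p):
    """log P(X = i) for X ~ Binomial(n, p), with 0 < p < 1."""
    return (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
            + i * log(p) + (n - i) * log1p(-p))


def upper_tails(n, p):
    """tails[k] = P(X >= k) for k = 0..n+1, accumulated from the top for precision."""
    tails = [0.0] * (n + 2)
    for i in range(n, -1, -1):
        tails[i] = tails[i + 1] + exp(log_binom_pmf(i, n, p))
    return tails


def detection_power(n, driver_freq, p0, alpha):
    """Hypothetical toy model: probability that a driver mutated in a fraction
    `driver_freq` of patients reaches significance at level `alpha` in a cohort
    of `n` patients, when every patient independently carries a background
    mutation in the gene with probability `p0`."""
    null_tails = upper_tails(n, p0)
    # Smallest number of mutated patients that is significant under the null.
    # null_tails[n + 1] == 0.0, so a critical count always exists.
    k_crit = next(k for k in range(n + 2) if null_tails[k] <= alpha)
    # Driver carriers plus background hits among the remaining patients.
    p1 = driver_freq + (1 - driver_freq) * p0
    return upper_tails(n, p1)[k_crit]


if __name__ == "__main__":
    # Hypothetical background rate and Bonferroni-style threshold (assumptions).
    P0, ALPHA = 0.01, 0.1 / 20000
    for n in (160, 1000):
        powers = [round(detection_power(n, f, P0, ALPHA), 2) for f in (0.02, 0.05, 0.10)]
        print(n, powers)
```

Even in this toy model, power for rare drivers rises steeply with cohort size, because the critical count grows far more slowly than the expected number of driver carriers.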

To this end, we performed a combined analysis of CLL WES data, joining our previously published cohort of 159 CLLs with 103 CLLs collected by the International Cancer Genome Consortium (ICGC). The raw sequencing reads from these 262 primary tumor samples (102 with unmutated IGHV, 147 with mutated IGHV and 13 with unknown IGHV status) were processed together and aligned to the hg19 reference genome. Somatic single nucleotide variants (sSNVs) and indels were detected using MuTect. Recurrently mutated genes were then inferred using the MutSig algorithm, which combines several features, such as the overall mutation rate per sample, the gene-specific background mutation rate, the non-synonymous/synonymous ratio and mutation clustering, to detect genes that are mutated more often than expected by chance.
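
MutSig integrates several signals. As a minimal sketch of only its simplest ingredient, testing whether a gene carries more mutations than its background predicts, the Python code below derives an expected mutation count per gene from per-sample background rates and coding length, assigns a one-sided Poisson p-value, and applies Benjamini-Hochberg correction across genes. It omits the non-synonymous/synonymous and clustering components. The function names, the Poisson model and the toy inputs (`GENE_A` and so on) are illustrative assumptions, not the MutSig implementation.

```python
from math import exp, lgamma, log


def poisson_sf(k, mu):
    """P(X >= k) for X ~ Poisson(mu), mu > 0, summed upward from k so that
    very small tail probabilities are not lost to cancellation."""
    if k <= 0:
        return 1.0
    log_term = -mu + k * log(mu) - lgamma(k + 1)
    total, i = 0.0, k
    while True:
        total += exp(log_term)
        i += 1
        log_term += log(mu) - log(i)
        # Past the mode the terms shrink geometrically; stop once negligible.
        if i > mu and exp(log_term) <= total * 1e-15:
            return min(total, 1.0)


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values), in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda j: pvals[j])
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        j = order[rank - 1]
        running_min = min(running_min, pvals[j] * m / rank)
        q[j] = running_min
    return q


def burden_test(gene_counts, gene_lengths, sample_rates, gene_factors=None):
    """Toy per-gene burden test (hypothetical helper).

    gene_counts:  gene -> observed non-silent mutations across the cohort
    gene_lengths: gene -> coding length in bp
    sample_rates: per-sample background mutation rates (mutations per bp)
    gene_factors: optional gene -> multiplier on the background rate
    Returns gene -> (expected count, p-value, q-value).
    """
    gene_factors = gene_factors or {}
    total_rate = sum(sample_rates)
    genes = list(gene_counts)
    expected = [gene_factors.get(g, 1.0) * gene_lengths[g] * total_rate for g in genes]
    pvals = [poisson_sf(gene_counts[g], mu) for g, mu in zip(genes, expected)]
    qvals = benjamini_hochberg(pvals)
    return {g: (mu, p, q) for g, mu, p, q in zip(genes, expected, pvals, qvals)}


if __name__ == "__main__":
    rates = [1e-6] * 100  # hypothetical: 100 samples at 1 mutation per Mb
    counts = {"GENE_A": 12, "GENE_B": 1, "GENE_C": 0}
    lengths = {"GENE_A": 3000, "GENE_B": 6000, "GENE_C": 1500}
    for gene, (mu, p, q) in burden_test(counts, lengths, rates).items():
        print(f"{gene}: expected={mu:.2f} p={p:.2e} q={q:.2e}")
```

Summing sample rates lets hypermutated samples raise every gene's expectation, which is the intuition behind accounting for the per-sample mutation rate; gene-specific factors play the role of the gene-specific background rate.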

This analysis identified 40 recurrently mutated genes in the cohort, including 22 of the 25 previously identified recurrently mutated genes in CLL. In addition, 18 novel candidate CLL drivers were identified, most affecting 1-2% of patients. The novel candidates included two histone genes, HIST1H1D and HIST1H1C, in addition to the previously identified HIST1H1E. Another was IKZF3, affected by a recurrent sSNV resulting in a p.L162R change in its DNA-binding domain, close to a region recently identified as critical for lenalidomide resistance in multiple myeloma (MM). A further recurrently mutated gene was nuclear RNA export factor 1 (NXF1), which, together with previously known recurrently mutated genes (SF3B1, XPO1, DDX3X), highlights the importance of RNA processing to CLL biology. Finally, this search for putative CLL driver genes also identified ASXL1 and TRAF3, already characterized as drivers in acute myeloid leukemia and MM, respectively. In the 59 of 262 samples for which RNA-seq data were available, 76% of the identified driver mutations were detected and thereby validated. Validation of driver mutations by RNA-seq and by targeted sequencing across the entire cohort is ongoing.

The larger size of our cohort allowed us to apply the discovery analysis separately to samples with mutated and unmutated IGHV. Among the 147 samples with mutated IGHV, only 5 driver genes (TP53, SF3B1, MYD88, CHD2, RANBP2) retained significance. In contrast, analysis of the 102 samples with unmutated IGHV revealed a distinct and more diverse pattern of recurrently mutated genes (lacking MYD88 and CHD2, and including NOTCH1, RPS15, POT1, NRAS, EGR2, BRAF, MED12, XPO1, BCOR, IKZF3, MAP2K1, FBXW7 and KRAS). The extended cohort also allowed better resolution of the clinical impact of genetic variants with >4% prevalence in the cohort. For example, samples with POT1 mutations were associated with a shorter time from sampling to therapy than those with wild-type POT1 (P = 0.02).
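
The abstract does not state which test produced P = 0.02. For a time-to-event endpoint such as time from sampling to therapy, a log-rank test comparing mutated and wild-type groups is one common choice. The Python sketch below implements a two-sample log-rank test under that assumption, with untreated patients censored at last follow-up. The function name and the toy data in the usage block are hypothetical and do not come from this study.

```python
from math import erfc, sqrt


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-sample log-rank test (hypothetical helper).

    times_*:  follow-up times (e.g. months from sampling)
    events_*: 1 if the event (start of therapy) occurred, 0 if censored
    Returns (chi2, p) for a chi-square statistic with 1 degree of freedom.
    """
    data = ([(t, e, 0) for t, e in zip(times_a, events_a)]
            + [(t, e, 1) for t, e in zip(times_b, events_b)])
    observed_minus_expected = 0.0
    variance = 0.0
    for t in sorted({ti for ti, e, _ in data if e}):
        n = sum(1 for ti, _, _ in data if ti >= t)               # at risk, both groups
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)  # at risk, group A
        d = sum(1 for ti, e, _ in data if ti == t and e)         # events, both groups
        d_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        # Observed minus expected events in group A at this event time.
        observed_minus_expected += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of d_a given the risk sets.
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = observed_minus_expected ** 2 / variance if variance > 0 else 0.0
    # Survival function of chi-square(1): P(chi2 > x) = erfc(sqrt(x / 2)).
    return chi2, erfc(sqrt(chi2 / 2))


if __name__ == "__main__":
    # Hypothetical toy data: months from sampling to therapy (1) or censoring (0).
    mut_t, mut_e = [4, 7, 9, 15, 22], [1, 1, 1, 1, 0]
    wt_t, wt_e = [10, 18, 25, 30, 36, 40], [1, 0, 1, 0, 0, 1]
    chi2, p = logrank_test(mut_t, mut_e, wt_t, wt_e)
    print(f"chi2={chi2:.2f}, p={p:.3f}")
```

The risk sets are recomputed at each event time by scanning all patients, which is simple and adequate for cohorts of a few hundred; this univariable comparison does not adjust for covariates such as IGHV status.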

Our study demonstrates that larger cohorts enable effective detection of lower-prevalence putative driver genes, which may nonetheless have important biological and clinical impact. Moreover, subset analysis can reveal distinct driver patterns in different disease subsets. In particular, the marked clinical difference between CLLs with mutated and unmutated IGHV may reflect the greater likelihood that the latter group harbors a broader spectrum of driver mutations with a more complex pattern of co-occurrence.


Brown: Sanofi, Onyx, Vertex, Novartis, Boehringer, GSK, Roche/Genentech, Emergent, Morphosys, Celgene, Janssen, Pharmacyclics, Gilead: Consultancy.

Author notes


Asterisk with author names denotes non-ASH members.
