Introduction: Despite marked improvements over the past four decades in our understanding of the biology and genetic basis of acute myeloid leukemia (AML), overall survival of patients with AML has improved only slightly. The outlook for treating AML once the disease is diagnosed remains grim because of subclonal genetic diversity and the stemness properties of the cell types from which relapse originates. As studies in solid tumors have shown, strategies that target the cancer at earlier stages hold considerable promise; however, no predictive tests are available for AML. The existence of clonally expanded pre-leukemic hematopoietic stem and progenitor cells (preL-HSPC) within the diagnostic blood samples of AML patients provides strong evidence that the disease evolves over long periods before diagnosis. However, with the discovery that age-related clonal hematopoiesis (ARCH) is exceedingly common during normal aging, a refined search is needed for the factors that distinguish ARCH from the clonal expansion in those rare individuals who will actually develop AML.

Methods: We hypothesized that the basis for clonal expansion in ARCH differs from that of the clonal expansion leading to AML. To investigate genetic differences, we developed a sensitive and accurate error-corrected sequencing method to study blood samples collected from healthy individuals in the EPIC cohort before AML diagnosis. The 96 pre-AML cases donated blood on average 6 years before they were diagnosed with AML; a set of 420 matched controls was sequenced in the same way. In addition, 3.2 million electronic health records (EHR) from the Clalit database were examined to identify non-genetic parameters that predict AML risk.

Results: The overall variant allele frequency (VAF) distribution of the identified mutations differed significantly between cases and controls. Specifically, a greater proportion of cases carried somatic mutations with intermediate-to-high VAF (70.8% versus 36.6% in the controls). The median number of all somatic mutations was significantly higher in the cases than in the controls (7 versus 5, respectively) and correlated with age in both groups. Interestingly, the pre-AML group had a higher rate of mutation accumulation than controls, resulting in significantly greater numbers of mutations occurring after the age of 55. Recurrent AML mutations were more prevalent in cases than in controls (age ≥55: OR, 7.571; 95% CI, 4.055-14.387). Specifically, highly recurrent splicing factor mutations (e.g., SRSF2 P95H, U2AF1 Q84P, and SF3B1 K700E) were found solely in the pre-AML group, suggesting that specific mutations that rewire the splicing machinery inevitably lead to AML when acquired in HSPC. The VAF of recurrent AML mutations was significantly higher in the cases than in the controls. Furthermore, we observed a higher proportion of individuals with more than one recurrent AML mutation in the pre-AML group than in controls (OR, 14.316; 95% CI, 6.318-34.933). Incorporating these factors, we constructed a computational model for predicting progression toward AML. The AML risk prediction model predicted progression on average 7 years before actual diagnosis (HR, 20.1; 95% CI, 8.59-46.8). EHR data were found to be highly valuable for identifying high-risk populations. Compared to controls, AML patients (N=982) showed aberrant blood counts up to 6 months prior to diagnosis (relative risk, 22.2).
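For readers less familiar with the effect sizes quoted above, the following is a minimal sketch of how an odds ratio and an approximate 95% confidence interval can be computed from a 2x2 table of cases and controls. The abstract does not state which interval method was used, so the Wald (log-scale normal approximation) interval here is an assumption, and the function name `odds_ratio_wald_ci` and the example counts are hypothetical, chosen only to match the cohort sizes (96 cases, 420 controls), not taken from the study data.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Wald confidence interval from a 2x2 table.

    a: cases with the feature      b: cases without it
    c: controls with the feature   d: controls without it
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        # The Wald interval is undefined when any cell is zero.
        raise ValueError("all four cell counts must be positive")
    estimate = (a * d) / (b * c)
    # Standard error of log(OR) under the normal approximation.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper


# Hypothetical split, NOT study data: 30 of 96 cases and
# 40 of 420 controls carrying a recurrent AML mutation.
estimate, lower, upper = odds_ratio_wald_ci(30, 66, 40, 380)
print(f"OR {estimate:.3f}; 95% CI {lower:.3f}-{upper:.3f}")
```

With small or zero cell counts, as for mutations seen only in cases, an exact method (for example Fisher's exact test) would be more appropriate than this approximation.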

Conclusions: These results reveal that the clonal expansion occurring during normal aging is highly distinct from pre-AML clonal expansion, which has a highly predictable evolutionary trajectory. Our study provides clear proof of concept for the feasibility of early AML prediction based on somatic mutation patterns. However, because ARCH is common, applying our AML risk prediction model in the real world will require identifying higher-risk individuals through routine clinical testing. We have shown that information obtained from EHR is useful for that purpose. Future studies of independent population cohorts with access to serial viable blood and leukemia samples will allow the incorporation of information such as the specific identity of the mutated cells and clone expansion kinetics. Such studies will help develop a better understanding of which individuals should be selected for future clinical trials of the potential benefits of early intervention in this deadly disease.


No relevant conflicts of interest to declare.

