Key Points
The weighted expression of 7 coding and 3 noncoding genes is strongly associated with relapse in CN-AML patients.
The 10-gene signature is independent from mutations known to associate with outcome in AML patients.
Abstract
Although ∼80% of adult patients with cytogenetically normal acute myeloid leukemia (CN-AML) achieve a complete remission (CR), more than half of them relapse. Better identification of patients who are likely to relapse can help to inform clinical decisions. We performed RNA sequencing on pretreatment samples from 268 adults with de novo CN-AML who were younger than 60 years of age and achieved a CR after induction treatment with standard “7+3” chemotherapy. After filtering out genes whose expression was associated with gene mutations known to impact outcome (ie, CEBPA, NPM1, and FLT3-internal tandem duplication [FLT3-ITD]), we identified a 10-gene signature that was strongly predictive of patient relapse (area under the receiver operating characteristic curve [AUC], 0.81). The signature consisted of 7 coding genes (GAS6, PSD3, PLCB4, DEXI, JMY, NRP1, C10orf55) and 3 long noncoding RNAs. In multivariable analysis, the 10-gene signature was strongly associated with relapse (P < .001), after adjustment for the FLT3-ITD, CEBPA, and NPM1 mutational status. Validation of the expression signature in an independent patient set from The Cancer Genome Atlas showed the signature’s strong predictive value, with AUC = 0.78. Implementation of the 10-gene signature into clinical prognostic stratification could be useful for identifying patients who are likely to relapse.
Introduction
A major obstacle to improved survival of patients with acute myeloid leukemia (AML) is disease relapse after achievement of complete remission (CR). Prognostic stratification using molecular and cytogenetic markers is useful for the early identification of patients who are likely to be refractory to standard induction chemotherapy regimens and/or have a higher risk for relapse; thus, it is being used for making informed clinical decisions. The 2017 European LeukemiaNet (ELN) genetic risk classification is widely accepted as the standard method for prognostic stratification of AML patients.1 However, the 2017 ELN classification includes only selected gene mutations and cytogenetic abnormalities and does not take into account gene expression data.1
Genomic alterations underlying disease in AML patients are heterogeneous, including diverse transcriptional profiles.2,3 Previous studies have demonstrated that differential expression of single genes and, more recently, gene expression signatures are effective tools for risk stratification of AML patients.2-10 Herein, we sought to explore the association between gene expression and disease relapse in first CR in adult patients younger than 60 years of age who were diagnosed with cytogenetically normal acute myeloid leukemia (CN-AML).
Methods
Total transcriptome RNA sequencing (RNAseq) was performed using pretreatment blood or bone marrow samples from 268 adult CN-AML patients younger than 60 years who were similarly treated with intensive chemotherapy on Cancer and Leukemia Group B (CALGB) (now part of Alliance for Clinical Trials in Oncology [Alliance]) therapeutic trials, including CALGB 10503 (ClinicalTrials.gov Identifier: NCT00416598), CALGB 10603 (NCT00651261), and CALGB 19808 (NCT00006363) (see supplemental Data) and achieved a CR. The patient cohort did not include patients with AML secondary to antecedent hematologic disorder or patients with therapy-related AML. Targeted sequencing of 80 cancer- and leukemia-associated genes, as well as detection of FLT3-internal tandem duplication (FLT3-ITD) and CEBPA mutations, were performed previously on all patients.11-13 Pretreatment cytogenetic analyses were performed in the CALGB/Alliance-approved institutional laboratories. The presence of a normal karyotype was determined by examination of ≥20 metaphase cells obtained from short-term (24- and/or 48-hour) unstimulated cultures of bone marrow samples and confirmed by central karyotype review in each case.14
RNAseq reads were aligned to hg38 using HISAT2,15 and gene counts were obtained using featureCounts.16 Normalization was performed with DESeq2,17 which divides counts by sample-specific size factors determined by the median ratio of gene counts relative to the geometric mean per gene. Hierarchical clustering was performed using the hclust function in the R (v4.0.1) stats package with Ward’s method, performed on a distance matrix computed using the ClassDiscovery R package with the absolute Pearson metric.18 Random forest models were generated with the randomForest R package, performing 100 iterations with n = 501 and default mtry.19 Expression between groups was assessed using a negative binomial model with DESeq2 or random forests, as indicated, after removing genes with low expression (normalized counts < 10) and low variability (standard deviation < 10). Predictive ability of the random forest model was optimized by first determining the importance of all 539 genes and then iterating through different numbers of genes (n = 2-20, 25, 30, 35, 40, 45, 50, 75, 100, 200, 500), starting with the most important, to determine the number that produces the highest area under the receiver operating characteristic curve (AUC). Multivariable logistic and proportional hazards regression models used a backward selection technique to build the final models for relapse and disease-free survival (DFS) that included relapse prediction score, clinical variables, mutation status, and indicated gene expressions associated with relapse at a level of P < .2 from univariable analyses.
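The median-of-ratios normalization described above can be illustrated with a minimal Python sketch. This is not the DESeq2 implementation itself, only a plain restatement of the size-factor idea; the function names and the toy count matrix are hypothetical.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of genes, each a list of raw counts (one per sample).
    Genes with a zero count in any sample are skipped, because their
    geometric mean is zero and the ratio is undefined.
    """
    usable = [row for row in counts if all(c > 0 for c in row)]
    # Log of the per-gene geometric mean across samples.
    log_geo_means = [sum(math.log(c) for c in row) / len(row) for row in usable]
    n_samples = len(counts[0])
    factors = []
    for j in range(n_samples):
        # Ratio of this sample's count to the gene's geometric mean, on the log scale.
        log_ratios = [math.log(row[j]) - lg for row, lg in zip(usable, log_geo_means)]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts):
    """Divide every count by the size factor of its sample."""
    factors = size_factors(counts)
    return [[c / f for c, f in zip(row, factors)] for row in counts]


# Toy data (hypothetical): sample 2 was sequenced twice as deeply as sample 1.
toy_counts = [
    [10, 20],
    [100, 200],
    [50, 100],
    [0, 3],  # contains a zero, so it does not contribute to the size factors
]
factors = size_factors(toy_counts)
normalized = normalize(toy_counts)
```

In this toy case the second size factor is twice the first, so the normalized counts of the first three genes become equal across the two samples.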
Genotyping of germline polymorphisms was performed previously on all patients, as described, using Infinium HumanOmni1-Quad BeadChip arrays (Illumina, San Diego, CA).20 Imputation was performed using the Haplotype Reference Consortium,21 and testing for associations between germline polymorphisms and gene expression was done with Matrix eQTL.22
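As an illustration of what an eQTL association test measures, the following Python sketch fits an additive linear model of expression on allele dosage (0, 1, or 2 copies of the minor allele) by ordinary least squares. The additive model without covariates is an assumption made for this example; the settings used with Matrix eQTL are not specified here, and the function name and toy values are hypothetical.

```python
from statistics import mean


def fit_additive_eqtl(dosages, expression):
    """Least-squares fit of expression = intercept + slope * dosage.

    Returns the slope, the intercept, and the t statistic of the slope.
    """
    x_bar, y_bar = mean(dosages), mean(expression)
    sxx = sum((x - x_bar) ** 2 for x in dosages)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(dosages, expression))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residual_ss = sum(
        (y - (intercept + slope * x)) ** 2 for x, y in zip(dosages, expression)
    )
    degrees_of_freedom = len(dosages) - 2
    std_err = (residual_ss / degrees_of_freedom / sxx) ** 0.5
    return slope, intercept, slope / std_err


# Toy data (hypothetical): expression rises by about 1 unit per minor allele.
toy_dosages = [0, 0, 1, 1, 2, 2]
toy_expression = [5.0, 5.2, 6.1, 5.9, 7.0, 7.2]
slope, intercept, t_stat = fit_additive_eqtl(toy_dosages, toy_expression)
```

Converting the t statistic into a P value requires the Student t distribution with n − 2 degrees of freedom, which is omitted from this sketch.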
Results
We performed RNAseq on 268 adult CN-AML patients younger than 60 years of age and then compared gene expressions between patients who relapsed (n = 164) and patients who remained in CR for ≥3 years (n = 104). The mutation status of 18 genes that were found to be mutated in ≥3% of patients and the patients’ pretreatment characteristics, including assignment to genetic-risk groups according to the 2017 ELN classification, are presented in supplemental Table 1. Differential expression analysis using a negative binomial model identified 255 genes that were significantly differentially expressed (adjusted P value < .001 and absolute fold change > 0.667; supplemental Table 2). Hierarchical clustering was performed using these genes, which separated patients into distinct groups (Figure 1). Although these patient clusters had different rates of relapse, they were strongly associated with mutations known to be associated with AML prognosis, specifically mutations in NPM1, biallelic CEBPA mutations, and FLT3-ITD (Figure 1).
Clustering of patients with CN-AML based on expression of 255 genes associated with relapse. Heatmap shows expressions of genes differentially expressed between patients who relapsed and those who did not relapse for ≥3 years after achieving a CR. Each row of the heatmap represents expression of a gene, and each column represents a patient. Differential expression analysis to determine the 255 genes included was performed using a negative binomial model with the DESeq2 R package. Shown above the heatmap is the relapse status for each patient, and the mutation statuses of genes mutated in ≥9 patients, as assessed by sequencing 81 genes. Six genes included in the 10-gene relapse signature that we derived in this study are indicated with arrows on the right side of the heatmap.
To find gene expressions associated with relapse that are independent from the aforementioned mutations, we filtered out genes that were significantly differentially expressed between patients with and without NPM1 mutations (2064 genes), biallelic CEBPA mutations (3923 genes), and FLT3-ITD (675 genes; adjusted P value < .01 and absolute fold change > 0.667; supplemental Tables 3-5). From the remaining 14 741 genes, we used a cutoff of an absolute fold-change difference > 0.3 and a P value < .1 to select 539 genes that were input into a random forest model to predict relapse (supplemental Table 6). Optimization iterations determined that the maximum predictive power was achieved using a model fit on the expression of the following 10 genes: NRP1, PLCB4, JMY, PSD3, DEXI, GAS6, C10orf55, AC139769.2, AC015712.2, and AL096865.1; these genes were assigned importances from the model based on their ability to predict relapse (Table 1). The AUC of this model was 0.81 (Figure 2A), and the 10-gene signature correctly classified 141 of 164 patients who relapsed and 65 of 104 patients who maintained a CR (Figure 2B). Classifying patients into genetic-risk groups according to the 2017 ELN criteria revealed that the 10-gene signature correctly predicted relapse in 94% of patients in the adverse-risk group, 86% of patients in the intermediate-risk group, and 71% of patients in the favorable-risk group (supplemental Table 7).
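Because the AUC is the quantity that was maximized when choosing the number of genes, a short Python sketch of how it can be computed may be helpful. The AUC equals the probability that a randomly chosen patient who relapsed receives a higher predicted score than a randomly chosen patient who maintained CR, with ties counted as one half. The function name and the toy scores are hypothetical and are not taken from the study data.

```python
def roc_auc(scores, relapsed):
    """AUC as the fraction of (relapse, no-relapse) pairs ranked correctly.

    scores: predicted relapse probabilities.
    relapsed: booleans, True when the patient actually relapsed.
    """
    positives = [s for s, r in zip(scores, relapsed) if r]
    negatives = [s for s, r in zip(scores, relapsed) if not r]
    correctly_ranked = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correctly_ranked += 1.0
            elif p == n:
                correctly_ranked += 0.5  # ties count as half
    return correctly_ranked / (len(positives) * len(negatives))


# Toy data (hypothetical): 3 patients who relapsed, 3 who maintained CR.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
toy_relapsed = [True, True, False, True, False, False]
auc = roc_auc(toy_scores, toy_relapsed)
```

Here 8 of the 9 possible pairs are ranked correctly, so the AUC of the toy example is 8/9. A value of 0.5 corresponds to random ranking and 1.0 to perfect separation.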
Gene expression signature is predictive of relapse in patients with CN-AML. (A) Receiver operating characteristic (ROC) curve shows the sensitivity and specificity of 10-gene expression signature for predicting relapse in 268 adult CN-AML patients younger than 60 years. (B) Predicted relapse probability for the 268 patients determined using the 10-gene signature. Each bar represents a patient, colored according to actual relapse status. (C) Predicted relapse probability for a validation set of 32 adult patients with CN-AML younger than 60 years included in the TCGA database,31 determined using the 10-gene signature. (D) ROC curve showing the sensitivity and specificity of the 10-gene expression signature for predicting relapse in the 32 TCGA patients with CN-AML. Maintain CR denotes CR maintained for ≥3 years.
The predictive relapse score for each patient generated by the 10-gene signature was input into a multivariable logistic regression model for relapse and a multivariable Cox proportional hazards regression model for DFS, which contained all available clinical and demographic variables, gene mutations present in ≥8 patients, and expression of ERG, BAALC, MN1, miR-155, and miR-3151, which were previously shown to be associated with outcome in adults with CN-AML (supplemental Data).23-30 The logistic multivariable regression model showed that the 10-gene predictive score was significantly associated with the risk of patient relapse (P < .001; odds ratio, 1.79; 95% confidence interval [CI], 1.52-2.13). Biallelic CEBPA mutations, mutation of NPM1, and FLT3-ITD also remained significant in the same model (Table 2). In the Cox proportional hazards regression model for DFS, the 10-gene predictive score was associated with DFS (P < .001; hazard ratio, 1.32; 95% CI, 1.22-1.43) after adjusting for biallelic CEBPA mutations, FLT3-ITD, and MN1 expression (Table 2). Together, these data indicate that the 10-gene signature is a strong predictor of relapse in younger CN-AML patients treated with intensive induction chemotherapy, and it adds predictive value to mutations that are already known to predict relapse.
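To make the reported odds ratio easier to interpret, the following Python sketch converts it to the log-odds scale on which a logistic regression coefficient is estimated. It assumes that the 95% CI is a Wald interval (symmetric on the log scale), which is an assumption of this example; the unit in which the predictive score was entered into the model is not stated here, so "unit" below means whatever increment the odds ratio refers to. The variable and function names are hypothetical.

```python
import math

# Values reported for the 10-gene predictive score in the logistic model.
odds_ratio = 1.79
ci_low, ci_high = 1.52, 2.13

# Logistic regression coefficient (change in log-odds per unit of score).
beta = math.log(odds_ratio)

# Standard error recovered from the CI width, ASSUMING a Wald interval:
# CI = exp(beta +/- 1.96 * SE).
std_err = (math.log(ci_high) - math.log(ci_low)) / (2 * 1.96)

# Wald z statistic implied by the coefficient and its standard error.
z_stat = beta / std_err


def odds_multiplier(units):
    """Factor by which the odds of relapse change for a given score increment."""
    return math.exp(beta * units)
```

Under these assumptions the coefficient is about 0.58 and the implied z statistic exceeds 3.29, the two-sided threshold for P < .001, which is consistent with the reported P value. A 2-unit increase in the score multiplies the odds of relapse by 1.79² ≈ 3.2.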
To independently validate the 10-gene signature in another patient set, we used expression data from The Cancer Genome Atlas (TCGA) for AML.31 TCGA data contained 32 CN-AML patients younger than 60 years of age who achieved a CR, 22 of whom relapsed in first CR.31 We calculated the 10-gene predictive relapse score for TCGA patients and found that the model correctly classified 20 of the 22 patients who relapsed and 7 of the 10 who did not, with AUC = 0.78 (Figure 2C-D).
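The classification counts reported for the TCGA validation set can be restated as sensitivity, specificity, and overall accuracy. The short Python sketch below performs that arithmetic; the function name is hypothetical, and the counts are those given in the preceding paragraph.

```python
def classification_summary(true_pos, total_pos, true_neg, total_neg):
    """Sensitivity, specificity, and accuracy from correct-classification counts."""
    sensitivity = true_pos / total_pos  # relapses correctly predicted
    specificity = true_neg / total_neg  # maintained CRs correctly predicted
    accuracy = (true_pos + true_neg) / (total_pos + total_neg)
    return sensitivity, specificity, accuracy


# TCGA validation set: 20 of 22 relapses and 7 of 10 non-relapses classified correctly.
tcga_sensitivity, tcga_specificity, tcga_accuracy = classification_summary(20, 22, 7, 10)
```

These counts correspond to a sensitivity of about 0.91, a specificity of 0.70, and an overall accuracy of 27/32 (about 0.84). With only 10 patients who did not relapse, the specificity estimate in particular is imprecise.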
Finally, we sought to examine the association between expression of the genes in the 10-gene relapse signature and germline polymorphisms to identify expression quantitative trait loci (eQTLs) for these genes in AML. Using genotyping data from these patients, we tested for expression associations with single nucleotide polymorphisms (SNPs) in the same regions. Indeed, we found evidence for eQTLs in the JMY gene and 5′ of DEXI (Figure 3). In the JMY eQTL, the sentinel SNP, rs6414979, was common (global minor allele frequency, 0.37) and was strongly associated with JMY expression (P = 9.05 × 10⁻⁶). Likewise, the strongest associated SNP in the DEXI eQTL, rs3087876, was also common (global minor allele frequency, 0.45) and was associated with DEXI expression (P = 4.10 × 10⁻⁹) (supplemental Table 8).
eQTL regional association plots. Plots show SNPs associated with expression of DEXI (A) and JMY (B). The top track indicates negative log10(P) values for associations between SNPs and expression of DEXI (A) or JMY (B). SNPs (represented by triangles) are colored according to linkage disequilibrium (LD) with the sentinel SNP (blue triangle), which showed the most significant association with expression. The middle track (green horizontal lines) shows the location and transcriptional direction of all coding genes in the displayed regions. The bottom track (blue lines) indicates genetic regions containing known regulatory elements annotated using the Ensembl database (microRNA target sites, promoters, enhancers, and ENCODE feature clusters that can be associated with transcription factor binding motifs). Plots were made using SNiPA, a SNP annotator.
Discussion
Our study identified a 10-gene expression signature present at the time of diagnosis that can predict relapse during first CR, independent from known prognostic markers in CN-AML. Although identification of molecular markers that predict outcome for adult patients with CN-AML treated with intensive chemotherapy is a relatively well-researched area, our study is unique in that we focused on gene expressions independent from known prognostic mutations. It was not surprising that, in our initial differential expression analysis comparing gene expressions between patients who relapsed and patients who maintained CR, clustering was driven by biallelic mutations in CEBPA and FLT3-ITDs, because these are known to be associated with outcome in CN-AML and have distinct expression profiles.5,32-34 Removing the genes differentially expressed between distinct CEBPA, FLT3-ITD, and NPM1 clusters allowed us to discover an expression signature that was independent from these known prognostic markers.
Early genome-wide investigations of gene expression in AML include work by Bullinger et al6 and Valk et al,5 who conducted seminal studies using microarrays that revealed the transcriptional heterogeneity between cytogenetic subsets of patients. These studies also offered the first insights into the relevance of transcriptional signatures for predicting patient outcome, by describing associations between expression-defined patient clusters and survival.
However, although gene-expression profiling is capable of providing prognostic information that is independent from other genetic risk factors,2-7 reproducibility issues have largely prevented its use in clinical practice. Limiting factors include lack of standardization of laboratory procedures and implementation of quality controls among various institutions, normalization and quantification of RNAseq data, and differences in probe content of microarrays. Recently, strides have been made to overcome these issues by implementing standard procedures for the use of commercially available tests suitable for clinical use in individual patients, which rely on highly reproducible multiplexed quantitative polymerase chain reaction assays or, less frequently, NanoString nCounter technology.35,36 Continued optimization and rigorous scrutiny of these methods may lead to routine use of RNA expression in some circumstances in the near future, similar to the currently accepted use of protein expression, as determined by immunohistochemistry, as diagnostic and predictive markers.
More recent work with RNAseq has been conducted to specifically identify coding and noncoding RNA signatures predictive of outcome in AML patients, including patients with CN-AML. The 10 genes that make up our predictive expression signature have not been included in any of the more notable gene expression signatures that are predictive of AML prognosis,5,6,8,9,37,38 including a long noncoding RNA signature described by our group.39 We speculate that this might be due to our exclusion of genes associated with biallelic CEBPA mutations, NPM1 mutations, and FLT3-ITDs.
Changes in the expression of 3 of the 10 genes constituting our gene expression signature (GAS6,40,41 PLCB4,42 and NRP1 43-45) have previously been shown to associate with outcomes of patients with AML in single-gene studies. The other coding genes in the 10-gene signature have been described to play roles in cancer as well. Although DEXI has not been studied in leukemogenesis, the calcium binding protein-encoding gene has been identified as a fusion partner of CIITA in CN-AML, suggesting DEXI is a particularly interesting candidate for future studies.46 JMY encodes a known cofactor of EP300, which serves as an activator of the tumor suppressor TP53.47 PSD3 expression is associated with breast cancer metastasis and glioma progression.48,49 The 3 noncoding genes in the signature have not been well characterized, but our results suggest that they merit further investigation.
Interestingly, our incorporation of genome-wide genotyping data revealed eQTLs regulating the expression, in AML cells, of 2 of the genes in the 10-gene signature: JMY and DEXI. These results imply that germline polymorphisms are at least one of the many factors that likely contribute to the expression of these genes, which are associated with an increased likelihood of disease relapse.
Our findings were validated using publicly available data from the TCGA,31 which, despite a relatively small number of patients, showed that the 10-gene signature was strongly predictive of relapse in adult CN-AML patients from this study. Although corroboration of our findings in another large set of patients with CN-AML is still desirable, we believe that addition of this signature to the current molecular prognostication guidelines, especially if expression of the genes constituting the novel signature we report herein can be assessed using a clinically suitable method, will allow more accurate prediction of relapse in CN-AML patients who have achieved a CR.
The data reported in this article have been deposited in the National Center for Biotechnology Information Gene Expression Omnibus database (accession GSE165430).
Data sharing requests should be sent to Christopher J. Walker (christopher.walker@osumc.edu).
Acknowledgments
The authors thank the patients who participated in clinical trials, Christopher Manring and the CALGB/Alliance Leukemia Tissue Bank at The Ohio State University Comprehensive Cancer Center for sample processing and storage services, and Lisa J. Sterling for data management.
This work was supported by National Cancer Institute, National Institutes of Health awards U10CA180821, U10CA180882, and U24CA196171 (Alliance for Clinical Trials in Oncology), U10CA180861, UG1CA233180, UG1CA233331, UG1CA233338, UG1CA233339, P30CA016058, and P50CA140158; the Leukemia Clinical Research Foundation; the Warren D. Brown Foundation; the Pelotonia Fellowship Program; and by an allocation of computing resources from The Ohio Supercomputer Center. It was also supported in part by funds from Novartis (CALGB 10603). Support to Alliance for Clinical Trials in Oncology and Alliance Foundation Trials programs is listed at https://acknowledgments.alliancefound.org.
The content of this article is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
This article is dedicated to celebrating the lives and accomplishments of Clara D. Bloomfield and Albert de la Chapelle.
Authorship
Contribution: C.J.W. and A.-K.E. conceived and designed the study; C.J.W., H.G.O., J.K., D.N., L.K.G., and M.B. performed bioinformatics and biostatistics analyses; D.P. assisted with RNAseq; K.M. assisted with manuscript writing and preparation and cytogenetics review; A.J.C. performed cytogenetics review; B.L.P., G.L.U., J.E.K., R.M.S., R.G., and J.C.B. treated patients and collected samples and clinical data; A.d.l.C. and C.D.B. supervised the project; and all authors read the manuscript and approved its final version.
Conflict-of-interest disclosure: C.J.W. has acted as a consultant for Vigeo Therapeutics, is employed by Karyopharm Therapeutics, and has ownership interests in Karyopharm Therapeutics and Bristol Myers Squibb. The remaining authors declare no competing financial interests.
Albert de la Chapelle died on 10 December 2020.
Clara D. Bloomfield died on 1 March 2020.
Correspondence: Christopher J. Walker, The Ohio State University Comprehensive Cancer Center, 460 West 12th Ave, Columbus, OH 43210-1228; e-mail: christopher.walker@osumc.edu.
References
Author notes
A.-K.E., A.d.l.C., and C.D.B. contributed equally to this work.
The full-text version of this article contains a data supplement.