TO THE EDITOR:
B-cell acute lymphocytic leukemia (B-ALL) is the most common childhood malignancy and is a rare leukemia in adults.1-4 B-ALL subtypes are distinguished by characteristic structural variants and mutations, which can correlate with responses to treatment.2-5 Cytogenetic and genomic analyses combined with expression profiling have identified up to 23 subtypes.4,6 Subtype assignment can extend and refine current risk stratification, and the current standard of care already incorporates some molecular classification to identify patients at higher risk.7,8 For instance, detection of BCR-ABL1 (the Philadelphia [Ph] chromosome) indicates high-risk disease, and treatment can be modified to include an ABL1-targeting tyrosine kinase inhibitor such as imatinib.3 Conversely, ETV6-RUNX1 fusions can indicate a lower risk of relapse.7-9 Next-generation sequencing of RNA (RNA-seq) has been used to identify fusion genes, quantify gene expression, and call variants to identify driver mutations.9,10 Although gene expression quantification is particularly useful for identifying molecular subtypes, there is currently no publicly available software for classifying B-ALL subtypes from RNA-seq data.
Here we present ALLSorts: a B-ALL gene expression classifier that attributes samples to 18 subtypes previously defined by Gu et al.4 ALLSorts has a novel hierarchical design that offers broader group classifications if more specific subtypes cannot be ascertained. Additionally, ALLSorts can attribute multiple subtypes to samples.11 When applied to both pediatric and adult cohorts, ALLSorts demonstrated high accuracy and was able to classify previously undefined samples. ALLSorts is open source and publicly available at https://github.com/Oshlack/ALLSorts.
ALLSorts is a pretrained machine learning classifier that uses RNA-seq data to attribute B-ALL samples to 18 known subtypes. We developed ALLSorts by training a logistic regression classifier on a B-ALL dataset of 1223 samples (supplemental Methods).4,6 ALLSorts classifies samples from a gene expression matrix but can also accept FASTQ/FASTA or BAM files for conversion into this form. After several preprocessing steps, the data are passed to a set of hierarchically organized logistic regression classifiers. Phenocopies are grouped into a meta-subtype with their mutational counterparts (Figure 1). The resulting 5 meta-subtypes are the ZNF384 group, KMT2A group, Ph group, ETV6-RUNX1 group, and high ploidy signature group (High Sig). The classifier first determines a sample's meta-subtype and then performs a more focused classification among the nested subtypes. The ZNF384-like and KMT2A-like subtypes had too few training samples to train a discriminator confidently, so these samples default to their meta-subtype. This study was approved by the Royal Children's Hospital (RCH) Human Research Ethics Committee and the Peter Mac (PM) Human Research Ethics Committee and was performed in accordance with the Declaration of Helsinki.
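The two-stage logic described above can be illustrated with a minimal Python sketch. It assumes one-vs-rest logistic regression at each level and uses hypothetical toy weights on hypothetical genes g1 to g4; the real ALLSorts preprocessing, feature set, and fitted parameters (supplemental Methods) are not reproduced, and the function names are illustrative rather than the ALLSorts API. Reporting a meta-subtype that has no nested model as the final call is a simplification of how the ZNF384-like and KMT2A-like subtypes default to their meta-subtype.

```python
import math

def sigmoid(z):
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def ovr_probs(features, models):
    """One-vs-rest logistic regression: P(label) = sigmoid(bias + sum_g w_g * x_g)."""
    probs = {}
    for label, (weights, bias) in models.items():
        score = bias + sum(w * features.get(gene, 0.0) for gene, w in weights.items())
        probs[label] = sigmoid(score)
    return probs

def hierarchical_predict(features, top_models, nested_models):
    """Two-stage prediction (hypothetical sketch).

    Stage 1 scores top-level labels (meta-subtypes and stand-alone subtypes).
    If the winner is a meta-subtype with a nested model, stage 2 discriminates
    among its subtypes; otherwise the top-level label is reported as is.
    """
    top = ovr_probs(features, top_models)
    best_top = max(top, key=top.get)
    if best_top not in nested_models:
        return best_top, top[best_top]
    nested = ovr_probs(features, nested_models[best_top])
    best_sub = max(nested, key=nested.get)
    return best_sub, nested[best_sub]

# Hypothetical toy weights on hypothetical genes g1..g4 (not ALLSorts parameters).
TOP = {
    "Ph group": ({"g1": 2.0}, -1.0),
    "KMT2A group": ({"g2": 2.0}, -1.0),
    "Stand-alone subtype": ({"g3": 2.0}, -1.0),
}
NESTED = {
    # Only the Ph group has a stage-2 model here; the KMT2A group stays at the
    # meta level, mimicking a subtype with too few samples to train a discriminator.
    "Ph group": {"Ph": ({"g4": 1.5}, 0.0), "Ph-like": ({"g4": -1.5}, 0.0)},
}

print(hierarchical_predict({"g1": 2.0, "g4": 1.0}, TOP, NESTED))
print(hierarchical_predict({"g2": 2.0}, TOP, NESTED))
```

With these toy weights, the first sample is routed to the Ph group and then called Ph, while the second stops at the KMT2A group. Splitting the decision this way means that an uncertain choice between a subtype and its phenocopy can still yield a confident meta-subtype call.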
For each sample, ALLSorts outputs the assigned subtype(s) together with the predicted subtype probabilities. It also provides 2 visualizations for validating calls and exploring unclassified samples. The first shows the sample's probability for each subtype relative to that subtype's predefined threshold (Figure 2A). The second, termed a waterfall plot, compares each sample's maximum subtype probability with the probabilities of samples known to belong to that subtype (Figure 2B).
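As a rough sketch of how thresholded outputs and the waterfall comparison could be computed, the Python below calls every subtype whose probability meets a per-subtype threshold (so a sample may receive more than 1 call, or none) and summarizes the waterfall comparison as a percentile rank. The thresholds, the 0.5 default, the probabilities, and the function names are hypothetical, and the percentile rank is only a numeric stand-in for the plot, not ALLSorts' plotting code.

```python
def call_subtypes(probs, thresholds, default_threshold=0.5):
    """Return every subtype whose probability meets its threshold, highest first.

    Returns ["Unclassified"] if no subtype passes; more than one call is possible.
    """
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    calls = [s for s, p in ranked if p >= thresholds.get(s, default_threshold)]
    return calls or ["Unclassified"]

def waterfall_rank(sample_prob, reference_probs):
    """Fraction of reference samples of the subtype whose probability is <= sample_prob."""
    if not reference_probs:
        raise ValueError("reference_probs must be non-empty")
    return sum(p <= sample_prob for p in reference_probs) / len(reference_probs)

# Hypothetical probabilities and thresholds, for illustration only.
probs = {"Ph": 0.91, "ETV6-RUNX1": 0.62, "KMT2A": 0.20}
thresholds = {"Ph": 0.5, "ETV6-RUNX1": 0.7, "KMT2A": 0.4}

print(call_subtypes(probs, thresholds))                         # ['Ph']
print(call_subtypes(probs, {**thresholds, "ETV6-RUNX1": 0.6}))  # ['Ph', 'ETV6-RUNX1']
print(call_subtypes({"Ph": 0.1}, thresholds))                   # ['Unclassified']
print(waterfall_rank(0.62, [0.55, 0.70, 0.80, 0.95]))           # 0.25
```

A low percentile rank flags a sample whose best probability sits below most known members of that subtype, which is the kind of call worth checking with orthogonal assays.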
The trained classifier was first applied to held-out test sets from the training cohorts (supplemental Table 5). ALLSorts had an overall accuracy of 92% (Figure 2C). However, classification performance was unbalanced between subtypes. Performance was best for subtypes defined by a small number of clearly distinguishing features, which were often the partner genes of fusions. Misclassification was highest for subtypes defined by larger collections of features, especially those in the High Sig group. Falling back to meta-subtypes in these cases, however, yields high accuracy; for example, the High Sig meta-subtype is classified with an accuracy of 93%. In addition, both Ph/Ph-like and ETV6-RUNX1/ETV6-RUNX1–like samples showed some misclassification as their phenotypic counterparts (Figure 2C). These observations highlight the utility of the novel hierarchical architecture, which still provides informative classifications that can be explored and validated with complementary analyses or assays.
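The effect of falling back to meta-subtypes can be seen in a small Python sketch that scores the same predictions at both levels. The subtype-to-meta-subtype map covers only the Ph and ETV6-RUNX1 groups named in the text, and the label lists are toy data, not results from this study.

```python
# Hypothetical subtype -> meta-subtype map covering two of the groups in the text.
META = {
    "Ph": "Ph group", "Ph-like": "Ph group",
    "ETV6-RUNX1": "ETV6-RUNX1 group", "ETV6-RUNX1-like": "ETV6-RUNX1 group",
}

def lift(label, level, meta=META):
    """Map a label to its meta-subtype when level == "meta"; unmapped labels stay as is."""
    return meta.get(label, label) if level == "meta" else label

def accuracy(y_true, y_pred, level="subtype"):
    """Fraction of samples whose true and predicted labels agree at the given level."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    hits = sum(lift(t, level) == lift(p, level) for t, p in zip(y_true, y_pred))
    return hits / len(y_true)

# Toy predictions: every error is a confusion between a subtype and its phenocopy.
y_true = ["Ph", "Ph-like", "ETV6-RUNX1", "ETV6-RUNX1-like", "KMT2A"]
y_pred = ["Ph", "Ph", "ETV6-RUNX1-like", "ETV6-RUNX1-like", "KMT2A"]

print(accuracy(y_true, y_pred))                # 0.6
print(accuracy(y_true, y_pred, level="meta"))  # 1.0
```

Because both errors fall within a meta-subtype, accuracy rises from 0.6 to 1.0 when scored at the meta level, which mirrors the pattern described for Ph/Ph-like and ETV6-RUNX1/ETV6-RUNX1–like.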
To validate ALLSorts on independent data, we applied it to 195 samples from 2 cohorts of pediatric and adult B-ALL from the RCH and PM, which displayed clear batch effects (supplemental Figure 5). Some samples in these datasets had previously defined subtype classifications, derived from various combinations of fusion calling, karyotyping, genomic sequencing, or gene expression classification with an earlier machine learning approach.9
Assuming that all previous subtype labels were correct, and excluding the 74 (38%) previously unclassified samples, the initial accuracy of the classifier was 79%. ALLSorts newly classified 61 (82%) of the previously unclassified samples (Figure 2D). Forty-six of these new classifications were judged plausible on the basis of fusion calling,12-14 karyotyping, and variant calling from genomic sequencing. Ten samples were reclassified to a new subtype, 8 of which matched the previous meta-subtype label. Fifteen (7.7%) previously labeled samples were assigned as unclassified by ALLSorts; 6 of these had tumor purities of less than 10%.
Supplemental Table 7 lists all 86 samples with new classifications (61 newly classified, 10 reclassified, and 15 newly unclassified), together with any causative variants found. Of these 86 samples, 63% had a plausible explanation supporting the ALLSorts classification at least to the meta-subtype level, 8% were incorrectly classified, 20% remained ambiguous, with evidence neither clearly supporting nor dismissing the call, and 9% had low tumor purity (less than 10%). Classification accuracy was high for tumor purities above 20% (supplemental Figures 6 and 7).
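The validation tallies above are internally consistent, which a few lines of Python can check. The check assumes that the 79% initial accuracy counts a previously labeled sample as correct unless it was reclassified (10) or became unclassified (15); that interpretation is ours and is not stated explicitly in the text. The function name is hypothetical.

```python
def validation_tallies(total=195, prev_unclassified=74, newly_classified=61,
                       reclassified=10, now_unclassified=15):
    """Recompute the reported validation proportions from the raw counts in the text."""
    labeled = total - prev_unclassified
    # Assumption: a previously labeled sample counts as correct unless it was
    # reclassified to a new subtype or became unclassified.
    agreeing = labeled - reclassified - now_unclassified
    return {
        "labeled": labeled,
        "initial_accuracy": agreeing / labeled,
        "prev_unclassified_frac": prev_unclassified / total,
        "newly_classified_frac": newly_classified / prev_unclassified,
        "now_unclassified_frac": now_unclassified / total,
        "changed_samples": newly_classified + reclassified + now_unclassified,
    }

for key, value in validation_tallies().items():
    # Proportions are printed as percentages; counts are printed as integers.
    print(key, f"{value:.1%}" if isinstance(value, float) else value)
```

Under that assumption, 96 of the 121 previously labeled samples agree (79.3%), and the proportions 37.9%, 82.4%, and 7.7% and the total of 86 changed samples match the reported figures after rounding.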
A distinctive feature of ALLSorts is its ability to classify samples into more than 1 subtype. The training cohorts included 117 samples previously described as having multiple subtypes on the basis of both gene expression analysis and cytogenetics. Although ALLSorts was not specifically trained to recognize samples with multiple subtypes, we used these samples to investigate its capacity for multilabel classification.
At least 1 subtype was predicted correctly for 86.3% of these samples, rising to 90.5% when meta-subtypes were included. However, both subtypes were predicted in only 26% of cases (supplemental Table 4). This suggests that multilabel classification adds value to ALLSorts at little cost in performance. As more multilabel samples are manually labeled, the classifier could be trained explicitly on these multilabel subtypes.
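The 3 multilabel figures above correspond to different set comparisons between true and predicted labels. The Python sketch below computes them for hypothetical label sets: any overlap (at least 1 subtype correct), any overlap after mapping to meta-subtypes, and full containment (all true subtypes predicted). The meta-subtype map, label sets, and function name are illustrative only and imply nothing about which subtypes co-occur.

```python
# Hypothetical subtype -> meta-subtype map; unmapped labels stand for themselves.
META = {"Ph": "Ph group", "Ph-like": "Ph group",
        "KMT2A": "KMT2A group", "ZNF384": "ZNF384 group"}

def multilabel_summary(true_sets, pred_sets, meta=META):
    """Per-sample set comparisons averaged over samples."""
    if len(true_sets) != len(pred_sets) or not true_sets:
        raise ValueError("need two non-empty lists of label sets of equal length")

    def to_meta(labels):
        return {meta.get(label, label) for label in labels}

    pairs = list(zip(true_sets, pred_sets))
    n = len(pairs)
    return {
        # At least one true subtype appears among the predictions.
        "at_least_one": sum(bool(t & p) for t, p in pairs) / n,
        # Same, but crediting a match at the meta-subtype level.
        "at_least_one_meta": sum(bool(to_meta(t) & to_meta(p)) for t, p in pairs) / n,
        # Every true subtype was predicted (true set is a subset of predictions).
        "all_labels": sum(t <= p for t, p in pairs) / n,
    }

# Hypothetical label sets for 4 samples.
y_true = [{"Ph", "KMT2A"}, {"ETV6-RUNX1", "ZNF384"}, {"Ph-like", "BCL2/MYC"}, {"KMT2A", "ZNF384"}]
y_pred = [{"Ph", "KMT2A"}, {"ETV6-RUNX1"}, {"Ph"}, {"Unclassified"}]

print(multilabel_summary(y_true, y_pred))
```

In this toy example the three scores are 0.5, 0.75, and 0.25, showing how crediting meta-subtypes raises the "at least one" score while requiring every label is the strictest measure.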
In this study, we present ALLSorts, a B-ALL subtype classification tool that accurately attributes samples to 18 subtypes and 5 meta-subtypes on the basis of their RNA-seq measurements. This tool has been trained and validated on a combined cohort of more than 2300 samples and is freely available on GitHub. One novel contribution of ALLSorts is a hierarchical architecture that represents subtypes and their phenocopies within a meta-subtype. ALLSorts can also classify samples into more than 1 subtype.
A key component of this study was testing the software's predictions across validation cohorts to verify the robustness of the classifier. The overall accuracy in the combined independent cohort was between 84% and 92% (supplemental Table 3). ALLSorts supports retraining the classifier as more samples become available, which will allow classification of subtypes that currently have relatively few samples, such as BCL2/MYC. Although gene-level counts are clearly useful for determining subtype, a more refined method that uses finer-grained aspects of the data, such as transcript-level quantification, could improve performance. Complementary analyses such as fusion detection should be used alongside ALLSorts to obtain a broader picture. Nevertheless, we demonstrate that ALLSorts achieves high classification accuracy across an extensive set of subtypes.
In summary, ALLSorts is an accurate, comprehensive, and freely available classification tool for determining subtypes of B-ALL.
Acknowledgments: Tumor samples and coded data were supplied by the Children’s Cancer Centre Tissue Bank at the Murdoch Children’s Research Institute and The Royal Children’s Hospital (www.mcri.edu.au/childrenscancercentretissuebank). Establishment and running of the Children’s Cancer Centre Tissue Bank is made possible through generous support by Cancer In Kids @ RCH, The Royal Children’s Hospital Foundation, and the Murdoch Children’s Research Institute. Raw gene expression counts for B-ALL tumor samples used for analysis in this study were obtained from St. Jude Cloud (https://www.stjude.cloud), which is a publicly accessible pediatric genomic data resource requiring approval for controlled data access.
The authors acknowledge the support of the SCOR Grant (7015-18) from the Leukemia & Lymphoma Society and of Perpetual Trustees and the Samuel Nissen Foundation.
This work was supported by grants from the Wilson Centre for Lymphoma Genomics and the Snowdome Foundation and by National Health and Medical Research Council project grant GNT1140626.
Contribution: B.S. conceptualized the study, performed the formal analysis, created the methodology, provided software, visualized the study, wrote the original draft, and reviewed and edited the manuscript; A.O. conceptualized the study, supervised the study, created the methodology, visualized the study, wrote the original draft, and reviewed and edited the manuscript; N.M.D. conceptualized the study, supervised the study, created the methodology, wrote the original draft, and reviewed and edited the manuscript; L.M.B. provided clinical expertise and reviewed and edited the manuscript; G.L.R. provided sample and clinical expertise and reviewed and edited the manuscript; A.L. provided bioinformatics support and reviewed and edited the manuscript; H.J.K. provided biological expertise and reviewed and edited the manuscript; L.E.L. provided orthogonal clinical information and reviewed and edited the manuscript; I.J.M. provided biological expertise and reviewed and edited the manuscript; P.B. provided clinical expertise and reviewed and edited the manuscript; and P.G.E. provided clinical expertise and reviewed and edited the manuscript.
Conflict-of-interest disclosure: The authors declare no competing financial interests.
Correspondence: Alicia Oshlack, Computational Biology Program, Peter MacCallum Cancer Centre, Parkville, VIC 3000, Australia; e-mail: firstname.lastname@example.org.
Raw counts for 1988 samples from a recent St. Jude Children's Research Hospital study are available for public download through the St. Jude Cloud visualization website (https://viz.stjude.cloud/st-jude-childrens-research-hospital/visualization/pax5-driven-subtypes-of-b-progenitor-acute-lymphoblastic-leukemia-genomepaint). Raw sequencing reads were obtained for 195 samples from Lilljebjörn et al6 (Lund, accession no. EGAD00001002112), for 127 pediatric samples from the Children's Cancer Centre Tissue Bank at The Royal Children's Hospital (RCH), Melbourne, Australia (Brown et al9), and for 68 adult samples from the Molecular Haematology Laboratory, Peter MacCallum Cancer Centre, Melbourne, Australia (PM). Counts data are available at https://github.com/Oshlack/ALLSorts/blob/master/counts/combined_raw-counts.csv.zip. For additional data sharing, please contact the corresponding author at email@example.com.
The full-text version of this article contains a data supplement.