Diagnostic utility of whole genome sequencing in adults with B-other acute lymphoblastic leukemia

Key Points
• WGS stratifies 88% of B-other ALL into an established genetic subtype that could not be detected via cytogenetics.
• Complex karyotype B-ALL emerges as a heterogeneous group of genetic subtypes, including MEF2D-r, DUX4-r, and IGK::BCL2.


Introduction
At diagnosis, the genetic classification of adult B-cell precursor acute lymphoblastic leukemia (BCP-ALL) is based on the detection of structural chromosome alterations and altered ploidy states, typically assessed via cytogenetic analysis, fluorescence in situ hybridization (FISH), or reverse transcriptase polymerase chain reaction. These diagnostic strategies inform treatment decisions through the identification of targetable lesions, such as BCR::ABL1, 1 or by assigning patients with high-risk genetic abnormalities, such as KMT2A fusion, low hypodiploidy, or complex karyotype, to more intensive therapies, such as allogeneic stem cell transplant. 2,3 UKALL14 is a UK National Cancer Research Institute Adult ALL group study in which patients were stratified based on the Moorman cytogenetic risk classification. 2 The outcome for the primary, clinical randomized question has been published recently. 4 The study also aimed to better understand the relationship between the B-ALL cancer genome, clinical phenotype, and therapeutic response. Genetic profiling of 652 participants with BCP-ALL from the UKALL14 study using cytogenetics, FISH, and multiplex ligation-dependent probe amplification (MLPA) helped confirm the high-risk nature of KMT2A-r, low hypodiploid, and complex karyotype and identified JAK-STAT abnormalities as a new high-risk genetic subgroup. 4 Together, these profiling approaches assigned 70% of the cases to an established genetic subgroup, leaving 30% of the cases in the undefined and clinically heterogeneous B-other subgroup.
Recent comprehensive profiling approaches using whole transcriptome sequencing have been instrumental in the discovery of disease-defining alterations in ALL. [5][6][7][8][9] In this study, we deploy retrospective whole genome sequencing (WGS) and whole transcriptome sequencing to characterize the genetic landscape of B-other adult ALL. We demonstrate that comprehensive genome profiling allows for the detection of all lesions reported by standard-of-care (SoC) profiling and, critically, resolves genetic subtypes for the majority of patients with BCP-ALL for whom SoC failed to deliver a definitive diagnosis.

Patients and sample selection
The patients were treated in a randomized trial for adults (age, 25-65 years) with newly diagnosed ALL (UKALL14; ISRCTN66541317; #NCT01085617). Trial participation and correlative research were supported with written informed consent. The study received institutional review board approval. To identify patients with B-other ALL, we performed central review of genetic data collected through SoC testing (cytogenetics, FISH, and reverse transcriptase polymerase chain reaction) as well as research profiling, including FISH (CRLF2, JAK2, ABL1, ABL2, and PDGFRB) and MLPA (SALSA MLPA Probemix P335, MRC Holland). 10 A total of 58 cases satisfied the criteria for WGS: (a) absence of genomic drivers (supplemental Figure 1; supplemental Table 1); (b) availability of a diagnostic, pretreatment DNA; and (c) availability of germ line control DNA (minimal residual disease [MRD] <1% or buccal swab). Diagnostic RNA was available for 33 of the 58 cases (supplemental Table 2).

Sequencing and bioinformatic analysis
WGS was performed (2 × 150 bp) to a target read depth of 60× and 30× for the tumor and healthy samples, respectively. RNA sequencing (RNA-seq) was performed using oligo deoxy-thymidine pulldown for a target coverage of 50 million reads (2 × 75 bp).
The WGS and RNA-seq data were aligned to GRCh37d5 using BWA-MEM (supplemental Table 3) and STAR version 2.5.0c (supplemental Table 4), respectively. WGS variant calling proceeded with 57 of 58 sequence-complete pairs to determine somatic single nucleotide variants (SNVs), insertions/deletions, copy number aberrations, and structural variants (SVs). [11][12][13][14][15] Variants were annotated with OncoKB to determine their putative role in cancer pathogenesis. 16 Data from 33 RNA-seq cases were analyzed to classify ALL subtypes using the consensus of any 2 of the ALLSorts, 17 ALLSpice, 18 and ALLCatchR (https://github.com/ThomasBeder/ALLCatchR) classifiers. Sample clustering with the City of Hope B-ALL reference gene expression data set from the MD-ALL R package (https://github.com/gu-lab20/MD-ALL) was used to finalize subtype classification. RNA fusions were detected using FusionCatcher 19 and CICERO. 20 RNA mutations in PAX5 (R38H/C, P80R, and R140L) and ZEB2 (H1038R) were manually checked using Integrative Genomics Viewer. 21 Data integration, analysis execution, and visualization were conducted using the Isabl platform. 22
To detect enhancer hijacking events associated with the IGH locus, we used gGnome (https://github.com/mskilab/gGnome) to construct a graphical representation of the SVs detected in the tumor sample. For each graph, the gGnome "proximity" function was used to "walk" from the Eμ IGH superenhancer locus to genes in GENCODE version 29. 23,24 Candidate IGH walks <100 kbp were annotated using the Atlas of Genetics and Cytogenetics in Oncology and Haematology and the Mitelman Database of Chromosome Aberrations and Gene Fusions in Cancer. 25,26
For DUX4-r detection, we used the GRIDSS 27 SV caller, which allows for the identification of single breakends (SVs unambiguously anchored at only 1 locus). For each single breakend identified, BLAT was used to map the unplaced sequence to hg19. 28 Single breakend sequences mapped to chrUn_gl000228 (an unplaced hg19 contig with D4Z4 repeats containing DUX4) were manually reviewed using Integrative Genomics Viewer. 21 For further details, please refer to supplemental Methods.
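The "consensus of any 2" rule used for RNA-based subtype classification can be illustrated with a minimal sketch. The function name, the dictionary layout, and the example labels below are assumptions made for illustration; they are not the output formats of ALLSorts, ALLSpice, or ALLCatchR, and this is not the code used in the study.

```python
from collections import Counter
from typing import Dict, Optional


def consensus_subtype(calls: Dict[str, Optional[str]], min_votes: int = 2) -> Optional[str]:
    """Return the subtype named by at least `min_votes` classifiers, else None.

    `calls` maps a classifier name to its predicted subtype label
    (None when the classifier made no confident call).
    """
    votes = Counter(label for label in calls.values() if label is not None)
    if not votes:
        return None
    subtype, count = votes.most_common(1)[0]
    return subtype if count >= min_votes else None


# Illustrative (made-up) calls for a single sample.
example_calls = {"ALLSorts": "DUX4", "ALLSpice": "DUX4", "ALLCatchR": "Ph-like"}
print(consensus_subtype(example_calls))
```

With three classifiers and a threshold of 2, at most one label can reach consensus, so no tie-breaking is needed; samples without consensus return `None` and would need the additional reference-based clustering described above.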

Mutational signatures
For single base substitutions (SBSs), mutation signature analysis was performed using the R/Bioconductor MutationalPatterns (version 3.5.6) package 29 using a 7-signature SBS mutational catalog as reference (supplemental Figure 3). 30 RAG-mediated deletions were annotated via motif analysis using Multiple Em for Motif Elicitation (MEME; https://meme-suite.org/meme/tools/meme). 31,32

Statistical analysis
R version 3.6.3 (29 February 2020) was used; the Mann-Whitney U test was used for between-group comparison of continuous variables using the wilcox.test function. 33 A robust linear model was used to calculate the per-year SNV burden using the rlm function from the MASS R package, 34 and 95% confidence intervals were calculated using the confint.default function. 33
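For readers unfamiliar with the rank-based comparison, the statistic underlying the Mann-Whitney U test can be sketched as follows. This is an illustrative pure-Python version, not the R `wilcox.test` call used in the study; it returns only the U statistics (no P value), and the function name is an assumption.

```python
def mann_whitney_u(x, y):
    """Return (U_x, U_y) for two samples, using midranks for tied values."""
    # Pool both samples, remembering which group each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        # Extend j over the run of values tied with pooled[i].
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = midrank
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    n_x, n_y = len(x), len(y)
    u_x = rank_sum_x - n_x * (n_x + 1) / 2
    return u_x, n_x * n_y - u_x


# Made-up values: every x is smaller than every y, so U_x is 0.
print(mann_whitney_u([1, 2, 3], [4, 5, 6]))
```

The first returned value corresponds to the rank sum of the first sample minus its minimum possible value, which is the form of the statistic reported as W by `wilcox.test`.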

Results
Comprehensive profiling via WGS characterized arm-level and focal copy number alterations (CNAs), SVs, and acquired somatic and germ line mutations, mapping at least 1 aberrant somatic lesion in 52 of 57 B-other ALL samples in our cohort (Figure 1A). The remaining 5 samples were excluded from subsequent analysis because of a low mutation burden (<100; supplemental Figures 4 and 5), suggesting a low tumor burden. Notably, all 5 low-purity samples were erroneously considered, via cytogenetics, to have normal karyotypes (supplemental Table 6). This resulted in a set of 52 cases, of which 31 had RNA-seq data as well (supplemental Table 7).

Comparison of WGS and RNA-seq subtype allocation
To evaluate the diagnostic utility of WGS compared with that of RNA-seq, we compared the diagnostic findings in 31 cases using both WGS and RNA-seq data (supplemental Table 7). WGS analysis identified a disease-defining genetic alteration in all 31 cases (Figure 1C; supplemental Table 8). Of the 31 cases with RNA-seq data, 5 were classified as low RNA sample quality: 3 cases (PD37187, PD43260, and PD43262) with evidence of cross contamination (supplemental Table 7), and 2 cases (PD37188 and PD40803) identified as low purity, which impaired gene expression-based classification (supplemental Figure 7A; supplemental Table 11). Estimation of the percent blast count via RNA and WGS correlated well, with the expected exception of the high hyperdiploid case (supplemental Figure 7B).
In 19 of the 31 cases, the underlying lesion was a genomic rearrangement. Evaluation of RNA-seq classification based on gene expression and fusion detection (supplemental Tables 11 and 12) correctly classified 19 cases (16 with both, 1 fusion only, and 2 classifiers only; supplemental Table 13). Fusion detection missed 2 IGH::DUX4 rearrangements. In a single case with P2RY8::CRLF2 (PD40808), RNA classification was split between Ph-like (ALLCatchR and ALLSorts) and iAMP21 (City of Hope reference data set). In this case, WGS analysis only identified a single chr21 gain rather than a high-level amplification, 37 which was consistent with the diagnostic karyotype (50,XY,+21,+22 inc/46,XY).
In 3 cases (PD37187, PD43260, and PD43262), additional RNA fusions were not concordant with the consensus RNA classifier subtype (supplemental Table 12); these were the same samples in which Somalier identified cross contamination (supplemental Table 7).

Demographic, clinical, and genetic features of B-other genomic subtypes
Most DUX4-r cases harbored the canonical IGH::DUX4 rearrangement (7/8), and 1 case had ERG::DUX4 (PD40815; supplemental Table 8). Only a single concurrent intragenic ERG deletion was identified (PD40800a; 21:39773785-39875948), although this lesion has been reported in up to two-thirds of cases with pediatric DUX4-r ALL 38 (supplemental Figure 9). Interestingly, 6 of 8 (75%) patients with DUX4-r were female compared with 17 of 44 (39%) in the rest of the study cohort. Prior observations in both pediatric and adult ALL suggested that DUX4-r is associated with a favorable outcome. 9,39 In our cohort, however, although only 1 of 8 patients was MRD-positive at the end of induction, 5 of 8 relapsed (supplemental Figure 10).
Five cases had a ZNF384 fusion, including EP300::ZNF384 (n = 4) and AKAP8::ZNF384 (n = 1). We recently reported that patients with ZNF384 fusions are typically younger and have good outcomes. 10 As expected, the same is true for patients in this subset. Four of 5 (80%) were aged ≤40 years compared with 18 of 47 (40%) for the rest of the cohort; only 1 of 5 was MRD-positive at the end of induction, and 2 of 5 relapsed (supplemental Figure 10). 10
WGS analysis identified 4 cases belonging to the ZEB2/CEBP group: 2 cases with ZEB2 H1038R and concurrent IGH::CEBPB rearrangements and 2 cases with ZEB2 H1038R alone. A fifth ZEB2 H1038R mutation was detected in an IGH::DUX4 case (PD40800; VAF ≈ 0.20); however, RNA-seq classified this sample as DUX4-r, whereas the remaining 4 cases with ZEB2 H1038R were all classified as ZEB2/CEBP (supplemental Tables 11 and 13).
We previously screened patients in this cohort for ABL-class fusions and JAK-STAT abnormalities using FISH and MLPA (supplemental Figure 1). However, WGS revealed 2 additional cases of Ph-like JAK-STAT abnormalities. The first had IGH::EPOR, which is not detectable via FISH or MLPA, whereas the second had a P2RY8::CRLF2 fusion, which was validated via FISH and MLPA. 10
Four cases harbored newly described subtypes. Two cases (PD40820 and PD40824) had a UBTF::ATXN7L3 fusion resulting from a submicroscopic interstitial deletion at 17q21.31. Both cases also harbored the 13q12.2 deletion, which hijacks the PAN3 enhancer to drive overexpression of CDX2 (supplemental Figure 15). 40,41 Two more cases (PD40809 and PD40814) harbored clonal IDH2 p.R140Q mutations (supplemental Figure 16), a subtype recently described by Yasuda et al. 41
Lastly, 13 cases involved the hijacking of an immunoglobulin gene enhancer (IGH ×12 and IGK ×1). In addition to the aforementioned DUX4-r (n = 7) and ZEB2/CEBP (n = 2) cases, WGS analysis revealed 4 additional cases: IGK::BCL2, IGH::CEBPA, IGH::ID4, and IGH::MIR125B1. The patient with IGK::BCL2 (PD40797) was classified as high risk (age, >40 years; complex karyotype) and, although achieving remission after induction, responded poorly to therapy, dying shortly after diagnosis (<200 days). The detection of this abnormality at the initial diagnosis might have prompted the re-evaluation of the diagnosis.

Detection of DUX4 rearrangements
Using our custom workflow, 8 rearrangements targeting the DUX4 locus were identified (supplemental Table 14). RNA-seq data were available for 7 of the 8 cases, allowing for the comparison of WGS with RNA fusion and RNA classification analyses (Figure 4A). RNA classification identified all 7 cases, but RNA fusion gene analysis failed to detect 2 IGH::DUX4 cases (PD37188a and PD40804a). WGS showed evidence that in both cases the 5′ IGH locus translocation mapped to telomeric repeats, whereas the 3′ IGH translocation mapped to the DUX4 loci. Therefore, we can infer that, in these 2 cases, the DUX4 locus along with proximal telomeric repeats was inserted into the IGH locus in an inverted orientation, which confounds detection by RNA fusion calling (Figure 4B). The ERG::DUX4 (PD40815a) case also showed expression of an alternative exon 6 (supplemental Figure 17), as previously described. 38 WGS analysis and DUX4 gene expression analysis, but not RNA fusion analysis, showed high specificity for the detection of DUX4-r. The implementation of WGS enables the detection of all DUX4 rearrangements and provides resolution on the structure of both the rearrangement and partner genes.
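The single-breakend triage step of the DUX4-r workflow (keeping breakends whose unplaced sequence maps to the chrUn_gl000228 contig) can be sketched as below. The record layout, field names, and example coordinates are hypothetical and chosen only for illustration; they do not reflect the GRIDSS or BLAT output formats, and the manual review step is outside the scope of the sketch.

```python
from dataclasses import dataclass
from typing import List, Optional

# Unplaced hg19 contig named in the Methods as carrying D4Z4 repeats with DUX4.
DUX4_CONTIG = "chrUn_gl000228"


@dataclass(frozen=True)
class SingleBreakend:
    """Hypothetical summary of one single breakend after BLAT remapping."""
    sample: str
    anchor_chrom: str                 # locus where the breakend is anchored
    anchor_pos: int
    blat_hit_contig: Optional[str]    # best hit of the unplaced sequence, None if unmapped


def dux4_candidates(breakends: List[SingleBreakend]) -> List[SingleBreakend]:
    """Keep breakends whose unplaced sequence maps to the DUX4-containing contig."""
    return [b for b in breakends if b.blat_hit_contig == DUX4_CONTIG]


# Made-up records; sample names and positions are placeholders.
records = [
    SingleBreakend("S1", "chr14", 106_000_000, "chrUn_gl000228"),
    SingleBreakend("S2", "chr14", 106_100_000, "chr4"),
    SingleBreakend("S3", "chr21", 39_800_000, None),
]
for hit in dux4_candidates(records):
    print(hit.sample, hit.anchor_chrom, hit.anchor_pos)
```

Candidates returned by such a filter would still require the manual inspection described in the Methods before being called as DUX4-r.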

Detection of IGH enhancer rearrangements
Using the graph-based IGH enhancer hijack calling workflow, we reclassified 8 SV calls as candidate IGH events (supplemental Table 15). Partner genes included recurrent events in DUX4, EPOR, CEBPA, and CEBPB and the less common MIR125B1 and ID4. Seven IGH rearrangements were directly validated via FISH or RNA-seq fusion analysis, and 1 case (PD43259) had indirect evidence from FISH and karyotype, showing a 14q32 deletion correlating to the loss incurred due to an unbalanced translocation to form IGH::ID4 (supplemental Table 15; Figure 4C). Finally, in terms of the sensitivity of this approach, this workflow identified an IGH rearrangement that was validated via both RNA-seq and FISH in a case that failed to meet the purity criteria (20%) for WGS analysis (PD40837, IGH::CEBPA; supplemental Table 15).
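The distance cutoff applied to candidate IGH walks (<100 kbp, as stated in the Methods) can be illustrated with a short sketch. It assumes that the graph walk has already been computed and summarized as (gene, distance in bp) pairs; this summary format, the function name, and the example distances are assumptions, not the gGnome "proximity" output.

```python
from typing import Dict, Iterable, Tuple

MAX_WALK_BP = 100_000  # "<100 kbp" threshold from the Methods


def candidate_igh_walks(
    walks: Iterable[Tuple[str, int]], max_bp: int = MAX_WALK_BP
) -> Dict[str, int]:
    """Return the shortest walk per gene among walks strictly shorter than `max_bp`."""
    best: Dict[str, int] = {}
    for gene, distance_bp in walks:
        if distance_bp < max_bp and distance_bp < best.get(gene, max_bp):
            best[gene] = distance_bp
    return best


# Made-up walk distances from the IGH enhancer to nearby genes.
example_walks = [
    ("CEBPA", 42_000),
    ("CEBPA", 15_000),
    ("GENE_FAR", 250_000),
    ("ID4", 99_999),
    ("GENE_EDGE", 100_000),
]
print(candidate_igh_walks(example_walks))
```

Keeping only the shortest walk per gene is a design choice of this sketch; genes that pass the cutoff would then be annotated against the cytogenetic databases cited in the Methods.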

Patterns of genomic instability in BCP-ALL
Assessment of the genome-wide patterns of mutations present in each leukemia genome allows for the characterization of putative biological processes or environmental exposures that result in genomic instability and contribute to leukemic transformation. 31,42 SV analysis identified B-cell-specific processes, such as RAG-mediated deletions, 31 and complex SV events, such as chromothripsis. 43 RAG-mediated deletions were identified in all cases, with a median burden of 7 events. A high burden of RAG-mediated deletions was observed in subtypes DUX4-r and ZNF384-r (Figure 6A). The highest RAG-mediated deletion burden was identified in a single IGH::MIR125B1 case (PD37197), with 49 of 69 deletions attributable to RAG activity (supplemental Figure 18). The subtype with the lowest contribution of RAG-mediated deletions was MEF2D-r, which is consistent with the previous observation that RAG1 is downregulated in MEF2D-r. 44 We also observed outlier high telomere length in MEF2D-r (Figure 6A).
Analysis of SBS patterns for the established mutation signatures revealed that the majority of SNVs were mapped to the SBS blood signature (Figure 6B-C). SBS blood is a clock-like signature, enriched in C>T mutations, that is operative in hematopoietic stem cells. 30 Across the cohort, the SNV burden corresponded to ~16 mutations per year of life (supplemental Figure 19). The only exception was an IGH::DUX4 case with homozygous loss of MSH6 that was defined by hypermutation (PD40812 with 77 183 SNVs) enriched in SBS1, a phenotype previously observed in hypermutator cases. 46,47 Although the sample mutational profile was atypical for SBS1 (cosine similarity = 0.941), a much better fit was the thio-mismatch repair deficient signature (cosine similarity = 0.996; supplemental Figure 20A-B). This signature, observed in cases of relapsed pediatric hypermutator ALL, has recently been attributed to thiopurine treatment of mismatch repair-deficient ALL. 48,49 The patient had a history of Crohn disease, for which thiopurine administration is a common therapy. SBS9 is a signature dominated by T>G and T>C mutations attributed to replication errors by polymerase η during somatic hypermutation in lymphoid cells. 50 In our cohort, SBS9 was observed in 4 cases: 2 low hypodiploid cases, 1 IGH::CEBPB case, and 1 IGK::BCL2 case with the highest mutation burden (42%), which we had expected, given the association of this lesion with mature B-cell neoplasms. Lastly, SBS18, a signature dominated by C>A mutations and attributed to reactive oxygen species, was enriched in the MEF2D-r, PAX5, and IDH1/2 subtypes.
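The cosine similarity values quoted above compare a sample's mutational profile with a reference signature, each expressed as a vector of counts or proportions over the same substitution channels. A minimal sketch of that calculation follows; it is a generic implementation rather than the MutationalPatterns routine used in the study, and the short vectors are made up for illustration (real SBS profiles typically have 96 trinucleotide channels).

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same number of channels")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("profiles must not be all zeros")
    return dot / (norm_a * norm_b)


# Made-up 4-channel profiles; the sample resembles reference_b more closely.
sample = [10.0, 40.0, 5.0, 45.0]
reference_a = [0.40, 0.10, 0.40, 0.10]
reference_b = [0.10, 0.40, 0.05, 0.45]
print(cosine_similarity(sample, reference_a) < cosine_similarity(sample, reference_b))
```

Because the measure depends only on the direction of the vectors, a profile given as raw counts can be compared directly with a reference given as proportions.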
WGS assigned 88% (46/52) of the cases called B-other to an established genetic subtype of ALL, with ~20% (10/46) of subtypes being assigned solely via the novel WGS workflows developed in this study. This included 5 cases with WHO 2016 subtypes; 3 of the 5 cases harbored high-risk genetic events that would have changed their UKALL14 risk group and postinduction treatment in the absence of other risk factors. Among the remaining B-other cases, 87% (41/47) were assigned to 1 of the newly described genetic subtypes of ALL. 9,39,41 Our recent studies have proposed that several of these subtypes are linked to a good (eg, ZNF384-r) or poor (eg, JAK-STAT) outcome. 10,39 Our findings demonstrate that unless cytogenetics reveals an established genetic rearrangement or ploidy subtype, a designation of failed, normal, or complex karyotype frequently misses subtype-defining events that can readily be picked up by WGS, as was also demonstrated in our related studies on childhood ALL. 51,52 For example, the presence of a complex karyotype, defined as ≥5 chromosomal abnormalities, has been associated with a poor prognosis. 2,10 WGS identified a driver event in all 8 cases with a complex karyotype, revealing a heterogeneous spectrum of drivers linked both to favorable (DUX4-r) and poor (MEF2D-r and UBTF::ATXN7L3) outcomes. This suggests that a complex karyotype is not a robust classification.
WGS also identified events that could alter clinical management. One IGH::DUX4 case showed hypermutation caused by homozygous loss of MSH6, making the patient a candidate for checkpoint inhibition. A second case involved homozygous loss of CD58 and concurrent LOH of HLA-B, which would likely confer immune escape 53,54 and reduce the efficacy of chimeric antigen receptor T-cell therapy. 55,56 Lastly, a near triploidy case harbored a germ line TP53 mutation, which has implications for carrier screening.
Integration of RNA-seq for fusion gene detection and gene expression classification enables concomitant validation of WGS-based findings using an orthogonal assay. This is of particular importance for transcriptomically defined phenocopy subtypes, such as Ph-like or PAX5alt, for which WGS alone would only be able to evidence previously defined recurrent DNA lesions. RNA-seq can also directly identify fusions caused by complex SVs that are missed in standard WGS fusion calling.
Because this was a retrospective study, we used remission samples to source germ line DNA. To avoid tumor contamination, we selected samples with MRD-negative results or MRD <1%. Therefore, the observed genomic subtype frequencies may not reflect the true distribution in adult B-ALL. In this study, we focused on cases that did not harbor informative clinical biomarkers and were classified as B-other. We did not formally assess the performance of WGS and RNA-seq against SoC in patients for whom the ALL subtype was previously determined.
For clinical implementation, detailed laboratory analytical and clinical validity studies are warranted to delineate standardization metrics for diagnostic assay deployment and the optimal source of normal DNA to determine assay performance against SoC molecular diagnostic assays across the spectrum of B-ALL subtypes.