The megakaryocytic transcription factor ARID3A suppresses leukemia pathogenesis

Up to 30% of patients with trisomy 21 develop transient abnormal myelopoiesis (TAM), and many progress to acute megakaryoblastic leukemia (AMKL), both in association with mutations in the GATA-1 transcription factor. In a Plenary Paper, Alejo-Valle et al demonstrate that miR-125b on chromosome 21 synergizes with mutant GATA-1 to promote leukemogenesis through the repression of ARID3A. Restoring ARID3A expression re-establishes megakaryocytic differentiation and blocks the expansion of GATA-1-mutant stem cells.

cDNAs were similarly cloned into the same SIN40C.SFFV backbone encoding GFP or dTomato as a fluorescent reporter. For inducible expression, a doxycycline-inducible SIN40C.TRE vector was used.
All plasmids generated in this study have been deposited at Addgene and are listed in Supplemental Table 1.

shRNA positive selection screening
2 million Gata1s-FLCs were transduced with the miR-125b-mimicking shRNA library (MOI=0.3) to achieve sufficient representation (<500-fold of X shRNAs). Samples were harvested after 4 and 30 days in culture, and gDNA was extracted using the QIAamp DNA Blood Mini Kit (Qiagen). The shRNA amplicon was PCR-amplified using primers containing the p5 and p7 adaptor sequences, gel-purified and sequenced (single-end) on an Illumina HiSeq 2500. We used model-based analysis of genome-wide CRISPR/Cas9 knockout (MAGeCK) 8 to identify hits from the shRNA screen. Custom R scripts were used for demultiplexing double barcoded reads; guides with fewer than 20 reads in ≥75% of all samples were excluded. Raw read counts were passed to the mageck test command using default parameters.
Day 30 samples were compared to day 4 (n=12) to determine enrichment. Genes with a p-value <0.05 as determined by MAGeCK were deemed significant.
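The read-count filter described above (guides with fewer than 20 reads in ≥75% of all samples are excluded) can be illustrated with a minimal Python sketch. This is not the authors' custom R script; the function name, the dictionary layout and the example guide names are assumptions made only for illustration.

```python
def filter_low_count_guides(counts, min_reads=20, max_low_fraction=0.75):
    """Drop guides that are poorly covered in most samples.

    counts: dict mapping a guide/shRNA name to a list of raw read counts,
            one entry per sample (hypothetical layout, for illustration).
    A guide is excluded when its count is below `min_reads` in at least
    `max_low_fraction` of all samples.
    """
    kept = {}
    for guide, per_sample in counts.items():
        # Number of samples in which this guide falls below the read threshold
        n_low = sum(1 for c in per_sample if c < min_reads)
        if n_low / len(per_sample) < max_low_fraction:
            kept[guide] = per_sample
    return kept


# Hypothetical example with three guides and four samples
example_counts = {
    "sh_A": [100, 50, 30, 25],  # never below 20 reads: kept
    "sh_B": [5, 3, 10, 40],     # below 20 in 3 of 4 samples (75%): excluded
    "sh_C": [5, 5, 30, 40],     # below 20 in 2 of 4 samples (50%): kept
}
filtered = filter_low_count_guides(example_counts)
```

The surviving raw counts would then be written to a count table and passed to the mageck test command, as stated in the text.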

Proteomic analysis
Total cell lysis and Western blotting were performed as previously described 4. Blots were developed using Amersham™ ECL Prime Western Blotting Detection Reagent (Thermo Fisher Scientific). Co-immunoprecipitation (CoIP) of endogenous ARID3A was performed on CMK cells using anti-ARID3A antibodies coupled to Novex™ DYNAL™ Dynabeads™ Protein G (Thermo Fisher Scientific). After cell lysis and affinity pulldown, proteins were eluted and subjected to proteolysis with trypsin (Promega) according to the filter-aided sample preparation (FASP) protocol 9. Samples were analyzed by LC/MS/MS using a U3000 nano-HPLC system coupled to a Q-Exactive Plus mass spectrometer (Thermo Fisher Scientific). Raw data were processed using Proteome Discoverer 2.4 (Thermo Fisher Scientific). MS/MS data were searched against the UniProt database (version Nov. 2019, tax. Homo sapiens, 73801 entries) 10 using Sequest-HT 11. Label-free quantification of proteins was based on extracted peak areas of corresponding peptide precursor ions.

3'UTR binding assay
To evaluate binding of miR-125b to the 3'UTR of the ARID3A/Arid3a mRNA, we cloned the 3'UTRs containing the miR-125b binding sites into a SIN40C.EFS.eGFP.pre lentiviral backbone downstream of the eGFP cassette. A 3'UTR with 3 exchanged nucleotides in the predicted miR-125b binding sites was used as a mutated control. After lentiviral transduction, HEL cells were FACS-sorted and transduced with miR-125b or a miR-control. Knockdown efficiency was defined as previously described 12.

Flow cytometry and sorting
Flow cytometry was performed on a CytoFLEX flow cytometer (Beckman Coulter) and the data were analyzed in Kaluza 1.5 (Beckman Coulter). Antibodies are listed in Supplemental Table 1. Cell sorting was performed on a BD FACSAria™ II Flow Cytometer (BD Biosciences). Apoptosis was measured using the Annexin V Apoptosis Detection Kit II with APC-Annexin V (BD Biosciences) or the PE-Cy7-Annexin V kit (Thermo Fisher Scientific). Cell cycle analysis was performed using the BrdU Flow Kit (BD Biosciences) and Alexa Fluor 647 anti-BrdU (BD Biosciences) or PE-Cy7 anti-BrdU (BioLegend). Both assays were performed according to the manufacturers' instructions.

Gene expression profiling
sgLuc-transduced FLCs were FACS-sorted three days after transduction; transduced CMK or Gata1s-FLCs were FACS-sorted 2, 10 and 13 days after doxycycline induction (0.5 µg/mL). RNA was prepared using the Quick-RNA™ Miniprep Kit (Zymo Research). RNA sequencing was performed by Novogene Company, Ltd. A minimum of 150 ng RNA was used as input material for the RNA sample preparations. Sequencing libraries were generated using the NEBNext® Ultra™ RNA Library Prep Kit for Illumina® (New England Biolabs) and sequenced on an Illumina NovaSeq using a paired-end 150 bp system. Raw FASTQ data (raw reads) were first pre-processed using fastp 13 and then further processed as previously described 14. Differential expression analysis was performed using the DESeq2 package in R 15. The resulting P values were adjusted using the Benjamini-Hochberg approach for controlling the false discovery rate (FDR) 16. Genes with an adjusted P value <0.05 were considered differentially expressed. Functional enrichments were calculated via gene set enrichment analysis (GSEA; v4.0) 17 using previously described gene sets and curated ML-DS signatures 7. Human gene symbols were mapped to murine gene symbols using orthologue annotations provided by Ensembl 18, considering only one-to-one orthologue relationships.
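For readers unfamiliar with the Benjamini-Hochberg adjustment used for the differential expression P values, the following minimal Python sketch shows the step-up procedure. It is illustrative only: the analysis itself was run with DESeq2 in R, and the function name and the example P values are assumptions.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order.

    Each raw P value p with ascending rank r (1-based) among m tests is
    scaled to p * m / r; a running minimum taken from the largest rank
    downwards enforces monotonicity of the adjusted values.
    """
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw P values for four genes
raw_p = [0.01, 0.04, 0.03, 0.005]
adj_p = benjamini_hochberg(raw_p)
# Genes passing the adjusted P value <0.05 threshold used in the text
significant = [i for i, p in enumerate(adj_p) if p < 0.05]
```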
To quantify miR-125b expression levels, Gata1s-FLCs were FACS-sorted 72 hours after transduction or 48 hours after doxycycline induction. RNA was prepared using the Quick-RNA™ Miniprep Kit (Zymo Research), and TaqMan Advanced miRNA Assays were performed using the TaqMan™ MicroRNA Reverse Transcription Kit and TaqMan™ Universal PCR-Mastermix (Thermo Fisher Scientific) with primers specific for miR-125b (Assay ID #000449) and U6 (Assay ID #000435).

ATAC-Seq
We performed the assay for transposase-accessible chromatin using sequencing (ATAC-seq) as previously described 30,31. A total of 50,000 miR-125b-Gata1s-FLCs expressing doxycycline-inducible Arid3a-FLAG or LUC cDNA were FACS-purified after 48 hours of doxycycline induction and processed using the Illumina Tagment DNA Enzyme and Buffer Kit (Illumina). The resulting libraries were sequenced by Novogene Company, Ltd. on an Illumina NovaSeq 6000 (150 bp paired-end reads). The data processing was also performed by Novogene: in brief, raw reads were trimmed and filtered using Skewer 32 and clean reads were aligned to mm10 with BWA 33. Mitochondrial reads were removed prior to subsequent analysis.
Normalized pileups were generated using deepTools 25 and viewed in the Integrative Genomics Viewer (IGV) 26.

Patient survival analysis
Event-free survival (EFS) was defined as time from diagnosis to the first event or last follow-up. Events were death from any cause, failure to achieve remission, relapse, and secondary malignancy. Failure to achieve remission was considered an event on day 0. Overall survival was defined as the time between diagnosis and death from any cause or last follow-up. The Kaplan-Meier method was used to estimate survival rates 34. Differences were compared using the 2-sided log-rank test 35, and standard errors were obtained using the Greenwood formula. The DESeq2 package was used to normalize and variance-stabilize RNA-sequencing read count data 15. The pediatric AML data set further required batch correction, for which we used the sva package 36. Normalized (and batch-corrected) expression of ARID3A was taken as a continuous variable in the survival model. For patient stratification, the optimal cutoff point was determined using maximally selected log-rank statistics as implemented in the maxstat R package (http://cran.r-project.org/web/packages/maxstat/index.html). The calculated cutoff for EFS was used for both overall survival and EFS analyses. We relied on R Version 3.6.1 (http://www.r-project.org/) for all of the above computations. Multivariate analysis was performed using the Cox proportional hazards model 37, and SAS 9.4 was used to compute hazard ratios and 95% CIs of the relative risk for the respective prognostic factors (ARID3A expression, dataset, cytogenetic risk group, gender, age, white blood cell count [WBC]).
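The Kaplan-Meier estimate and its Greenwood standard error mentioned above can be written out in a few lines. The Python sketch below is illustrative only and is not the code used in the study, which relied on R; the function name, argument layout and toy follow-up data are assumptions. At each distinct event time with d events among n patients at risk, survival is multiplied by (1 - d/n), and the Greenwood sum accumulates d / (n (n - d)).

```python
import math


def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate with Greenwood standard errors.

    times:  follow-up time per patient (for example, days from diagnosis)
    events: 1 if the event was observed at that time, 0 if censored
    Returns a list of (time, survival, standard_error) tuples, one per
    distinct time at which at least one event occurred.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    greenwood_sum = 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0   # events observed exactly at time t
        n_leaving = 0  # events plus censored observations at time t
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1 - n_events / n_at_risk
            if n_at_risk > n_events:
                greenwood_sum += n_events / (n_at_risk * (n_at_risk - n_events))
            # Greenwood formula: SE(S) = S * sqrt(sum d / (n * (n - d)))
            curve.append((t, survival, survival * math.sqrt(greenwood_sum)))
        n_at_risk -= n_leaving
    return curve


# Hypothetical toy cohort: five patients, two censored (event flag 0)
km_curve = kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 1, 0])
```

A patient who fails to achieve remission would enter such a calculation with time 0 and event flag 1, consistent with the definition of EFS given in the text.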

Supplemental Tables
Supplemental Table 1: List of reagents and resources
Supplemental Table 3: Targets and sequences of the miR-125b-mimic shRNA library
Supplemental Table 4: Enrichment scores from the shRNA-based positive selection screen
Supplemental Table 5: GSEA results of global gene expression profiling after overexpression of miR-125b in Gata1s-FLCs
Supplemental Table 6: List of shRNA targeting ARID3A/Arid3a
Supplemental Table 7: GSEA results of global gene expression profiling after modulation of Arid3a in Gata1s-FLCs
Supplemental Table 8: Pairwise analysis of LC-MS/MS
Supplemental Table 9: GSEA results of global gene expression profiling after overexpression of ARID3A in CMK
Supplemental

Supplemental Figures
Supplemental Figure 1