Introduction: Elevated levels of fetal hemoglobin (HbF) are known to ameliorate both the morbidity and mortality of sickle cell anemia (SCA). Sustained post-natal HbF expression is heritable and regulated by multiple quantitative trait loci. Previous genomic studies have identified three major gene loci (BCL11A, HBS1L-MYB, and HBG2) that account for ~40% of HbF variation in SCA, but additional genetic modifiers remain to be discovered. We performed a genome-wide association study (GWAS) using DNA collected from multiple cohorts of children with SCA to identify novel genes and variants involved in HbF expression.
Methods: We analyzed genomic DNA from 1009 children with SCA and pre-treatment steady-state HbF levels who enrolled in prospective research trials in the United States (HUSTLE, SWiTCH, TWiTCH), the Caribbean (SACRED), or sub-Saharan Africa (REACH, NOHARM). Whole blood DNA was first genotyped using the H3Africa SNP array (Illumina), which identifies over 2.2 million single nucleotide variants (SNVs) across the genome. Most samples also underwent whole exome sequencing (WES) using NimbleGen VCRome 2.1 capture reagents on the Illumina HiSeq 2500 platform, which identifies coding variants in all known exons. Square-root-transformed HbF values served as the continuous trait for association testing using a single-locus mixed model (EMMAX) adjusted for population stratification, with age and sex as covariates. The GWAS approach included three distinct steps. First, we performed two independent GWAS discovery steps using distinct African populations, designated Discovery I (N=211) and Discovery II (N=223). Second, only SNVs that were significant (p<0.05) in both discovery datasets were selected for two independent replication steps, in cohorts designated African-American (N=157) and African (N=269). Third, SNVs that were significant in both discovery datasets and in at least one replication cohort were verified in an additional Caribbean cohort (N=149), using TaqMan assays to genotype the specific variants. Through this multistep process, we searched for genomic loci with consistent HbF associations across multiple cohorts.
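The selection logic of the first two steps can be sketched in a few lines of Python. This is a minimal illustration only, not the study's analysis code: the function names, variant IDs, and p-values below are hypothetical, and the p<0.05 threshold for the replication step is taken from the Results; the actual association tests were run with EMMAX, which is not reproduced here.

```python
from math import sqrt

# Threshold stated in the Methods for carrying variants forward.
P_THRESHOLD = 0.05


def significant(pvals, threshold=P_THRESHOLD):
    """Return the set of variant IDs whose p-value is below the threshold."""
    return {variant for variant, p in pvals.items() if p < threshold}


def staged_filter(discovery_1, discovery_2, replication_a, replication_b):
    """Keep variants significant in both discovery cohorts and in at
    least one replication cohort (hypothetical helper for illustration)."""
    # Step 1: dual discovery, intersection of the two significant sets.
    dual = significant(discovery_1) & significant(discovery_2)
    # Step 2: replication, union of the two replication cohorts.
    replicated = significant(replication_a) | significant(replication_b)
    return sorted(dual & replicated)


def transform_hbf(hbf_percent):
    """Square-root transform HbF values before association testing."""
    return [sqrt(x) for x in hbf_percent]


# Invented p-values per cohort, keyed by placeholder variant IDs.
discovery_1 = {"var_A": 0.001, "var_B": 0.03, "var_C": 0.20, "var_D": 0.01}
discovery_2 = {"var_A": 0.004, "var_B": 0.04, "var_C": 0.01, "var_D": 0.02}
replication_a = {"var_A": 0.02, "var_B": 0.50, "var_D": 0.30}
replication_b = {"var_A": 0.60, "var_B": 0.70, "var_D": 0.04}

candidates = staged_filter(discovery_1, discovery_2, replication_a, replication_b)
print(candidates)
```

In this toy input, var_C fails the dual-discovery requirement and var_B fails both replication cohorts, so only var_A and var_D would proceed to the verification step.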
Results: From the combined SNP and WES dataset, 8 BCL11A variants passed genome-wide significance (p<10⁻⁸) in the discovery analysis, and 1,048 additional variants were identified with nominal HbF association (p<0.001). We found that 173 of these novel variants had sustained association in at least one of the replication cohorts (p<0.05). We selected 20 variants with the strongest and most consistent associations with HbF from the discovery and replication analyses for further verification (Table 1). Expected HbF associations with BCL11A (rs1427407) and HBS1L-MYB (rs4895441) were identified. Among the other 18 candidate variants, all novel, the rs77737207 variant (allele frequency ~0.10) near the RUNX1T1 locus was strongly associated with lower HbF levels, while coding variant rs2279587 (allele frequency ~0.03) in the ITGA1 gene approached statistical significance (p<0.08) in the final verification cohort and was associated with higher HbF levels.
Conclusions: Our large GWAS of HbF in diverse global cohorts of children with SCA from Africa, the United States, and the Caribbean validated the strong associations of HbF with common genetic variants near the BCL11A and HBS1L-MYB gene loci. We also identified two novel gene loci, ITGA1 and RUNX1T1, that have statistical associations with HbF expression. RUNX1T1 is a broad transcriptional corepressor known to impact myeloid differentiation in hematopoiesis, while ITGA1 encodes the integrin alpha-1 subunit of a cell-surface receptor involved in cell-cell adhesion and inflammation. Both genes may be involved in the regulation of HbF expression in children with SCA and should be investigated further using cellular and animal models.
No relevant conflicts of interest to declare.