Fetal hemoglobin (HbF) is the major genetic modifier of the clinical course of sickle cell anemia (homozygosity for HBB glu6val). HbF level is also an important predictor of mortality. If it were possible to know at birth the HbF level likely to be present once this measurement stabilizes at about age 5 years, then an improved prognosis might be given and HbF-inducing treatments better informed. Levels of HbF in adults are highly heritable, and the production of HbF is genetically regulated by several quantitative trait loci and by genetic elements linked to the HBB gene cluster. One of the most popular approaches to genetic risk prediction summarizes the risk alleles in a genetic risk score (GRS) that is used as a covariate in the genetic prediction model. We developed a GRS for HbF in 841 patients from the Cooperative Study of Sickle Cell Disease (CSSCD) cohort and assessed its ability to predict HbF values in three independent cohorts: PUSH (N=77), Walk-PHaSST (N=181), and C-Data from the Comprehensive Sickle Cell Centers program (N=127).
We used the results of a genome-wide association study (GWAS) of HbF in sickle cell anemia, in which patients were genotyped using the 610K Illumina array and the association of each of the ∼550K SNPs with HbF was tested using a linear regression model with gender-adjusted additive genetic effects. To build the GRS, we sorted SNPs by increasing p-value, starting from the SNP most significantly associated with HbF (rs766432, p-value = 2.61×10⁻²¹), and pruned the list by removing SNPs in high LD (r² > 0.8). We then used this list of SNPs to generate a sequence of nested GRS. We started with the GRS that included only the most significant SNP and generated the second GRS by adding the second SNP from the list. The third GRS was generated by adding the third SNP from the list to the second GRS, and so on. We repeated this procedure for up to 10,000 SNPs (p-value < 0.02185) and hence generated 10,000 GRS for each subject in the CSSCD. Each of these GRS was included as a covariate in a linear regression model, and the regression coefficients of the resulting 10,000 linear regression models were estimated by least squares in the CSSCD data. The predictive value of these GRS models was then evaluated in three independent cohorts. In this evaluation, we computed the 10,000 GRS for each subject in each data set and then used the 10,000 regression models estimated in the CSSCD data set to compute the expected HbF value of each patient, given their GRS. We then assessed predictive accuracy by computing the correlation between the observed and predicted values of HbF. To produce more stable predictions, we also created ensembles of predictive models. An ensemble of the first 14 GRS models, including 14 SNPs, had the best predictive value in all 3 data sets and explained 23.4% of the variability in HbF; the correlation between the predicted and observed HbF was 0.44, 0.28 and 0.39 in the three cohorts.
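The nested-GRS procedure described above can be illustrated with a minimal Python sketch on simulated data. Several details here are assumptions and not taken from the study: the GRS is an unweighted count of HbF-increasing alleles, each model is a simple regression of HbF on one GRS, the ensemble is an unweighted average of the first m model predictions, and the function names (nested_grs, fit_models, predict_ensemble) are hypothetical. LD pruning and p-value ranking are taken as already done.

```python
import numpy as np

def nested_grs(genotypes, order):
    """Column j holds the GRS built from the first j+1 SNPs in `order`.

    genotypes: (n_subjects, n_snps) allele counts coded 0/1/2 (assumed coding).
    order: SNP column indices sorted by increasing p-value, after LD pruning.
    """
    return np.cumsum(genotypes[:, order], axis=1)

def fit_models(grs, y):
    """Fit one least-squares regression y ~ intercept + slope * GRS_j per column."""
    coefs = []
    for j in range(grs.shape[1]):
        X = np.column_stack([np.ones(len(y)), grs[:, j]])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        coefs.append(beta)
    return np.array(coefs)  # shape (n_models, 2): [intercept, slope]

def predict_ensemble(grs, coefs, m):
    """Average the predictions of the first m nested-GRS models (assumed rule)."""
    preds = coefs[:m, 0] + grs[:, :m] * coefs[:m, 1]
    return preds.mean(axis=1)

# Simulated discovery and validation cohorts (illustrative values only).
rng = np.random.default_rng(0)
n_train, n_test, n_snps = 400, 150, 50
G_train = rng.integers(0, 3, size=(n_train, n_snps))
G_test = rng.integers(0, 3, size=(n_test, n_snps))
effects = np.zeros(n_snps)
effects[:5] = 1.0  # only the first five SNPs influence the simulated trait
y_train = G_train @ effects + rng.normal(0, 2, n_train)
y_test = G_test @ effects + rng.normal(0, 2, n_test)

order = np.arange(n_snps)  # stand-in for the p-value-sorted, LD-pruned list
grs_train = nested_grs(G_train, order)
grs_test = nested_grs(G_test, order)

# Estimate coefficients in the discovery cohort, then predict in the other.
coefs = fit_models(grs_train, y_train)
pred = predict_ensemble(grs_test, coefs, m=5)
r = np.corrcoef(pred, y_test)[0, 1]
print(f"correlation between predicted and observed trait: {r:.2f}")
```

In this sketch the coefficients are estimated only in the simulated discovery cohort and applied unchanged to the validation cohort, mirroring the evaluation design described above; the ensemble size m would in practice be chosen by comparing correlations across candidate values.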
Of these 14 SNPs, 6 were located in BCL11A; the other SNPs were located in the olfactory receptor region and in chromosome 11p15, the site of the HBB gene cluster, and were previously found to be associated with HbF.
We next compared these results to predictive models that used gender, coincident alpha thalassemia, and HBB haplotypes for prediction. The model including gender and alpha thalassemia explained only 2.6% of the variability of HbF in the discovery cohort, and the model including HBB haplotypes explained 2.35%. Neither model showed a significant correlation between the predicted and observed HbF in the three other cohorts. In addition, combining the non-genetic information with the GRS did not explain more of the variability in HbF.
With as few as 14 SNPs we can explain more of the variability in HbF and predict it better than with other non-genetic risk factors or genome-wide significant SNPs; however, we still cannot explain all of the variability in HbF that is due to heritability. These results suggest that knowing the genotype of a few SNPs can help to predict HbF levels after they have stabilized. Prediction of HbF at an early age has the potential to help foretell some features of the severity of the clinical course of the disease and to help optimize the clinical management of patients.
No relevant conflicts of interest to declare.