Stroke is a major vascular complication of sickle cell anemia and is more frequent in patients under the age of 20 years. Transcranial Doppler (TCD) flow studies can predict the likelihood of stroke in children with sickle cell anemia, but only 10% of individuals with abnormal TCD values will have a stroke, and stroke will occur in some individuals with normal TCD values. Therefore, more precise means of prognosis would be welcome so that prophylactic treatments like transfusions or hydroxyurea can be targeted to individuals at highest risk. Using Bayesian networks (BNs), we analyzed 235 single nucleotide polymorphisms (SNPs) in 80 candidate genes in 1398 unrelated subjects with sickle cell anemia enrolled in the Cooperative Study of Sickle Cell Disease. Bayesian networks are a novel generation of multivariate models that represent the complex structure of interactions between many variables by a network of interrelated modules. These modules can be learned from data and then used to describe how changes in some variables affect other variables and, ultimately, the risk for the phenotype of interest. Bayesian networks provide a coherent framework within which genotypes, phenotypes and environmental factors can be seamlessly integrated into a comprehensive genomic landscape. We found that 25 SNPs in 11 genes and 4 clinical variables, including α thalassemia and fetal hemoglobin, interact in a complex network of dependencies to modulate the risk of stroke. This network of interactions includes three genes (TGFBR2, TGFBR3, BMP6) with a functional role in the TGF-β pathway and one gene (SELP) already associated with stroke in the general population. We validated our results in a different population by predicting the occurrence of stroke in a set of 114 subjects not included in the original study: 7 stroke patients and 107 control patients, a proportion consistent with the phenotype distribution in the original cohort study.
Our model predicted the correct outcome for all 7 stroke patients and for 105 of 107 non-stroke patients, giving a 100% true positive rate, a 98.1% true negative rate, and an overall predictive accuracy of 98.2%. Our results support the hypothesis that stroke in sickle cell anemia patients is a complex trait caused by the interaction of multiple genes, and the predictive accuracy of our model is a step toward the development of additional prognostic tests that identify more precisely the sickle cell anemia patients at risk for stroke. The presence in our model of genes already associated with stroke, such as SELP, suggests that some genetic factors predisposing to stroke are shared by sickle cell anemia patients and stroke victims in the general population, and that our model may offer some insights into the genetic basis of stroke, the third leading cause of death in the United States.
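
The validation rates follow directly from the reported counts (7 of 7 stroke patients and 105 of 107 non-stroke patients correctly predicted). The short sketch below reproduces that arithmetic; the function name `validation_rates` and its parameter names are illustrative assumptions, not part of the original analysis.

```python
def validation_rates(tp, fn, tn, fp):
    """Return (true positive rate, true negative rate, accuracy).

    tp/fn: stroke patients predicted correctly / incorrectly.
    tn/fp: non-stroke patients predicted correctly / incorrectly.
    Hypothetical helper for illustration only.
    """
    true_positive_rate = tp / (tp + fn)
    true_negative_rate = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return true_positive_rate, true_negative_rate, accuracy


# Counts taken from the validation set described above:
# 7 stroke patients (all predicted correctly) and
# 107 non-stroke patients (105 predicted correctly, 2 not).
tpr, tnr, acc = validation_rates(tp=7, fn=0, tn=105, fp=2)
print(f"TPR={tpr:.1%}  TNR={tnr:.1%}  accuracy={acc:.1%}")
# prints: TPR=100.0%  TNR=98.1%  accuracy=98.2%
```

With only 7 stroke cases in the validation set, the true positive rate rests on a small denominator, so each misclassified case would shift it by roughly 14 percentage points.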
