Sickle Cell Disease (SCD) is a genetic disorder caused by a single amino acid substitution in the ß-hemoglobin chain. Clinical manifestations of SCD are multisystemic and heterogeneous, with a wide range of organ damage between patients despite identical genetic mutations. The sub-phenotypes of SCD reflect damage to different organs arising from the pathophysiology of the disease. Biomarkers, such as N-terminal-pro-Brain natriuretic peptide (NT-proBNP), tricuspid regurgitant velocity (TRV), and hemolytic parameters provide readouts of the degree of different organ damage and are known risk markers. Dissecting its pathophysiology along known biological pathways has led to development of therapeutic targets, but we still do not fully understand the pathogenesis for much of the organ damage. Given the complexity of the pathophysiology of SCD that is likely to involve multiple overlapping and interacting biological pathways, an agnostic, network-based approach using gene expression data with annotated risk markers provides an attractive alternative. We assessed the plausibility of using such an approach with bilirubin levels as a phenotype. Bilirubin is a breakdown product of red blood cells (RBCs) that is measured in routine labs and a validated hemolytic and disease severity marker of SCD.


Using RNA-Seq data from 224 patients with SCD (80% HbSS and HbSBeta0 thalassemia combined, 15% HbSC, 5% HbSBeta+ thalassemia, and 0.4% HbSD), we performed a differential gene expression (DGE) analysis and weighted gene co-expression network analysis (WGCNA) of 12,450 genes. For the DGE analysis we used total bilirubin levels as a continuous variable for the main phenotype of interest, adjusting for age, sex, and hemoglobin genotype. We then conducted WGCNA to identify modules of genes with similar coexpression patterns, which we functionally annotated (via the ClusterProfiler R package) for interesting biological themes.


Of the 12,450 genes, 10,604 were protein coding, and 1,560 were lncRNAs, and 286 were other non-coding transcripts. 3,509 (2,975 coding, 447 lncRNA, and 87 other non-coding) were considered differentially expressed (unadjusted p-value < 0.05) according to bilirubin level. Through the WCGNA, the genes clustered into 11 co-expression modules, representing a range of biological pathways. Four of these modules had representative eigengenes that were significantly correlated with bilirubin (p < 0.05), and we focused on 2 of these that had biologically interesting functions.

The first module of interest (turquoise) consisted of 3,636 genes (3,371 protein-coding and 210 lncRNA) (25% differentially expressed, and 62% upregulated). Genes in this module have been associated with cellular functioning of younger RBCs, validating the presence of the significant Gene Ontology (GO) biological pathways listed in the bar graph (mitochondria gene expression, ribonucleotide processes, etc.), and aligning with expected processes as the young RBCs replace the degraded RBCs that had caused the high bilirubin. Notably, this module contained the biliverdin reductase A (BLVRA) gene (differentially expressed at p < 0.008), which reduces biliverdin to bilirubin.

The other highlighted module (red) consisted of 914 genes (846 protein-coding and 57 lncRNA) (42% differentially expressed, and 97% upregulated). Multiple pathways in the red module were related to catabolic processes. As previously stated, bilirubin is the byproduct of hemoprotein catabolism, suggesting an interesting correlation. Additionally, this module contained 2 other pathways associated with porphyrin, which is a component of RBCs that contains the heme group and is involved in heme biosynthesis.


Taken together, these results provide interesting insights into the biological pathways driving one facet of sickle cell disease's many clinical manifestations.


No relevant conflicts of interest to declare.

Author notes


Asterisk with author names denotes non-ASH members.