Erythroid transcription factors (TFs) control gene expression programs, lineage decisions, and disease outcomes. How transcription factors contact DNA has been studied extensively in vitro, but in vivo binding characteristics are less well understood as they are influenced in a reciprocal manner by chromatin accessibility and neighboring transcription factors. Here, we present a comparative analysis approach that takes advantage of non-coding sequence variation between functionally equivalent erythroid cell lines to conduct an in-depth analysis of erythroid TF binding profiles and chromatin features.

Specifically, we analyzed ChIP-seq datasets to identify millions of non-coding genetic variants between the mouse erythroleukemia cell line (MEL), a GATA1-inducible erythroid progenitor cell line (G1E-ER4), and primary murine erythroblasts. We found that while chromatin features are highly correlated across these cell types, larger differences in TF binding intensity are correlated with higher degrees of genetic variation between them.
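
The relationship between genetic variation and differential binding can be illustrated with a minimal sketch. It is not the pipeline used in this study: the function names, the pseudocount, and the toy signal values are assumptions made for illustration. Each peak carries a binding signal in two cell types and a count of discriminatory variants, and the mean absolute log2 fold change in signal is reported for each variant count.

```python
import math
from collections import defaultdict


def abs_log2_fold_change(signal_a, signal_b, pseudocount=1.0):
    """Absolute log2 ratio of two binding signals (pseudocount is an assumed choice)."""
    return abs(math.log2((signal_a + pseudocount) / (signal_b + pseudocount)))


def mean_difference_by_variant_count(peaks):
    """Group peaks by variant count and average the absolute log2 fold change.

    peaks: iterable of (signal_a, signal_b, n_variants) tuples (hypothetical layout).
    """
    grouped = defaultdict(list)
    for signal_a, signal_b, n_variants in peaks:
        grouped[n_variants].append(abs_log2_fold_change(signal_a, signal_b))
    return {n: sum(values) / len(values) for n, values in sorted(grouped.items())}


# Toy values only; these are not data from the study.
toy_peaks = [(100, 100, 0), (50, 50, 0), (63, 31, 1), (15, 127, 2)]
result = mean_difference_by_variant_count(toy_peaks)
print(result)
```

In this toy input the mean difference rises with the number of variants per peak, which is the kind of trend described above; real data would call for normalization between experiments and a formal correlation test.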

We next examined discriminatory genetic variants between the cell lines that are located in ChIP-seq peaks of the erythroid transcription factor GATA1. Hundreds of such variants fall within GATA1 motifs. Differential GATA1 binding intensities associated with the variants revealed nucleotide positions that contribute most to in vivo GATA1 chromatin occupancy and identified which alternative nucleotides are most likely to disrupt binding. Notably, this additional information about GATA1's in vivo nucleotide binding preferences improved prediction of GATA1 binding sites genome-wide. We applied similar approaches to determine the bp-resolution in vivo binding preferences of TAL1/SCL and CTCF.
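
As a rough sketch of how variants within a motif could be related to binding differences, the example below averages the log2 fold change in binding for each combination of motif position and alternative nucleotide, then reports the position whose substitutions reduce binding the most. The motif instance, the input layout, the function names, and the numbers are all hypothetical, and the actual analysis may differ substantially.

```python
from collections import defaultdict

# One instance of the WGATAR consensus, used here only to index positions (0-based).
MOTIF = "AGATAA"


def positional_effects(observations):
    """Mean log2 fold change per (motif position, alternative base).

    observations: iterable of (position, alt_base, log2_fold_change), where the
    fold change compares the variant-carrying cell type with the reference one.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for position, alt_base, log2_fc in observations:
        entry = sums[(position, alt_base)]
        entry[0] += log2_fc
        entry[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


def most_sensitive_position(effects):
    """Position whose substitutions show the lowest average fold change."""
    per_position = defaultdict(list)
    for (position, _alt_base), mean_fc in effects.items():
        per_position[position].append(mean_fc)
    return min(per_position, key=lambda p: sum(per_position[p]) / len(per_position[p]))


# Toy observations only; these are not results from the study.
toy_observations = [(1, "C", -2.0), (1, "C", -3.0), (2, "G", -1.0), (5, "G", 0.0)]
effects = positional_effects(toy_observations)
sensitive = most_sensitive_position(effects)
print(effects, MOTIF[sensitive])
```

A table of this kind could in principle be used to reweight a position weight matrix when scanning for candidate binding sites, although the sketch stops short of that step.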

We additionally identified thousands of discriminatory genetic variants within GATA1 sites that fall outside canonical GATA elements but within binding sites of other known TFs. Association of these variants with differential GATA1 binding intensities revealed that the hematopoietic transcription factors TAL1/SCL and KLF1 positively regulate GATA1 chromatin occupancy. Strikingly, we identified a number of motifs not previously implicated in cooperating with GATA1 that positively impact GATA1 chromatin binding. Notably, we also defined motifs associated with negative regulation of GATA1 chromatin occupancy. Applying a similar analysis to TAL1/SCL and CTCF revealed additional motifs involved in regulating the chromatin occupancy of these TFs.

Finally, we associated discriminatory genetic variation between erythroid cell lines with large changes in sub-kb-scale DNase hypersensitivity. We found that single base pair substitutions within or near a number of erythroid TF motifs, including that for the RUNX family of nuclear factors, are strongly associated with changes in chromatin accessibility. Our approach uses novel methods in comparative ChIP-seq and DNase-seq analysis to reveal new insights into the genetic basis of erythroid TF chromatin occupancy and chromatin accessibility.


No relevant conflicts of interest to declare.
