Integration of gamma-retroviral (RV) and lentiviral (LV) vectors follows non-random patterns in mammalian genomes, with a common preference for active chromatin regions. The molecular basis of the interaction between retroviral pre-integration complexes (PICs) and the human genome is poorly understood, particularly for gamma-retroviruses. We have mapped a large number of RV and LV integration sites in human CD34+ hematopoietic cells that were transduced in vitro and analyzed without selection. Recurrent insertion sites (hot spots) account for >20% of RV integration events but are significantly less frequent for LV vectors. RV, but not LV, hot spots are highly enriched in proto-oncogenes, cancer-associated common insertion sites (CIS), and growth-controlling genes. Genes involved in hematopoietic and immune system development are targeted at high frequency and enriched in hot spots, suggesting that the CD34+ gene expression program is instrumental in directing RV integration. To identify the genetic determinants of retroviral integration preferences, we investigated the role of the transcriptional regulatory elements within the vector, as well as the chromatin determinants of the target sites, in directing RV and LV integration. Comparative analysis of the “integromes” of vectors carrying different transcriptional regulatory elements provides information on the role of transcriptional complexes bound to genomic and viral sequences in targeting retroviral PICs. Analysis of the chromatin characteristics around integration sites provides information about specific structures, protein components or epigenetic modifications that may favor the binding of retroviral PICs. Genomic sequences flanking the integration sites (±1 kb) were annotated and analyzed using the TRANSFAC and JASPAR matrix databases and UCSC Genome Browser tracks.
This analysis shows that the content and arrangement of putative transcription factor binding sites around the insertion points differ between RV and LV vectors, and between vectors carrying different LTR modifications. We also analyzed the frequency, coverage and position of the major classes of repeats in our datasets. Hypergeometric distribution analysis revealed a significant enrichment of SINE and LINE repeats around RV integration sites. These data indicate that vector design and the target cell gene expression program have a significant impact on the integration characteristics of retroviral vectors.
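The ±1 kb annotation step can be illustrated with a minimal sketch: build a window around each integration site and list the annotated features (for example, rows from a UCSC repeat track) that overlap it. The BED-style 0-based, half-open coordinates, the function names (`flanking_window`, `annotate_sites`) and the input layout are all hypothetical and are not taken from the study's pipeline; windows are clamped at position 0 but not at the chromosome end.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int  # 0-based, inclusive (BED-style convention, assumed)
    end: int    # 0-based, exclusive


def flanking_window(chrom: str, site: int, flank: int = 1000) -> Interval:
    """Return the half-open +/- flank bp window around a site, clamped at 0.

    Clamping at the chromosome end is not handled in this sketch.
    """
    return Interval(chrom, max(0, site - flank), site + flank)


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share >= 1 bp."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def annotate_sites(sites, features, flank=1000):
    """Map each (chrom, position) site to the names of overlapping features.

    features: iterable of (name, Interval) pairs, e.g. parsed from a track
    (hypothetical input; a genome-wide analysis would use an interval index
    instead of this linear scan).
    """
    result = {}
    for chrom, pos in sites:
        window = flanking_window(chrom, pos, flank)
        result[(chrom, pos)] = [name for name, iv in features if overlaps(window, iv)]
    return result
```

The linear scan keeps the example short; with hundreds of thousands of repeat annotations, an interval tree or sorted-sweep join would be the usual choice.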
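A hypergeometric enrichment test of the kind mentioned for SINE and LINE repeats can be sketched as an upper-tail probability. This sketch assumes the comparison is between windows around integration sites and a pooled set of windows that also includes matched random control sites. The abstract does not state the background actually used, and the function name `hypergeom_enrichment_p` and any counts passed to it are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(total: int, total_with_feature: int,
                           drawn: int, drawn_with_feature: int) -> float:
    """Upper-tail hypergeometric p-value P(X >= drawn_with_feature).

    total              -- N: all windows in the (assumed) pooled background
    total_with_feature -- K: pooled windows containing the repeat class
    drawn              -- n: windows around integration sites
    drawn_with_feature -- k: integration-site windows containing the repeat class
    """
    if not (0 <= total_with_feature <= total and 0 <= drawn <= total):
        raise ValueError("inconsistent counts")
    denom = comb(total, drawn)
    start = max(drawn_with_feature, 0)
    upper = min(drawn, total_with_feature)
    # Sum the probability mass of every outcome at least as extreme as observed;
    # math.comb returns 0 for impossible draws, so no extra lower bound is needed.
    tail = sum(
        comb(total_with_feature, i) * comb(total - total_with_feature, drawn - i)
        for i in range(start, upper + 1)
    )
    # Exact integer arithmetic until this single division.
    return tail / denom


# Hypothetical counts, for illustration only.
p = hypergeom_enrichment_p(total=2000, total_with_feature=600,
                           drawn=1000, drawn_with_feature=350)
```

Exact integer binomials avoid underflow in the tail sum. For very large genome-wide pools, a library routine such as SciPy's `hypergeom.sf` would be faster.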

Author notes

Disclosure: No relevant conflicts of interest to declare