The increasing use of hematopoietic stem cells for retroviral vector-mediated gene therapy, together with recent reports of leukemogenesis in mice and humans, has created intense interest in characterizing vector integrations at the genomic level. As techniques to determine insertion sites become more commonly applied in gene therapy laboratories, there is a need to systematically collect and analyze the data arising from such studies in a vector insertion database. Such a database will make it possible to determine the factors responsible for preferential integration of various vector types in specific chromosomal regions, genes, or gene sections. The information derived from it will help to identify more “dangerous” vector types and may inform vector design.

We have set up an automatic sequence analysis tool that enforces quality criteria (e.g. verification of LTR and adapter sequences, score >40, e-value >10e-40, hit RefSeq, next RefSeq, etc.) and greatly simplifies data input while maintaining high quality standards. Our group is establishing the "collaborative RISC (retroviral insertion estimation into chromosome) Score Database (CRSD)" assessment project, based on the M-CHIPS (Multi-Conditional Hybridisation Intensity Processing System) microarray data warehouse and analysis software (K. Fellenberg et al. 2001, 2002). The data obtained from the sequence analysis tool were fed automatically into the database. A total of 287 retroviral vector integration sites were isolated and analyzed with the tool described above. In human bone marrow repopulating cells (n=189), integrations occurred with significantly increased frequency in chromosomes 17 and 19. Analysis of targeted RefSeq genes showed favored integration (48%) within the first intron. In comparison, retroviral vector integrations in T-cells (n=98) showed an entirely different chromosomal distribution pattern, while the percentage for the targeted RefSeq genes was similar (46%). In addition, more than 1200 sequences were submitted to the database, originating from different vectors (SF-MDR-, MoLV-based TK/neoR-Mo3TIN-, Moloney-MGMT-, Harvey-based Neo-, Harvey-based MDR-, and lentiviral GFP-SIN-vectors) and different transduced cells (mouse hematopoietic cells, mouse fibroblasts, rhesus hematopoietic cells, human hematopoietic cells, human T-cells). The set-up and internal structure of the database will be presented.
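The quality criteria named above can be illustrated with a minimal Python sketch. It is not the authors' tool. The `Hit` record, its field names, the `passes_quality` helper, and the example reads are all hypothetical. The e-value cutoff is treated here as an upper bound, the usual convention for alignment searches, which is an assumption about the intended criterion. The RefSeq lookups ("hit RefSeq", "next RefSeq") are not modeled.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical record for one aligned insertion-site read."""
    read_id: str
    has_ltr: bool      # LTR sequence was verified in the read
    has_adapter: bool  # adapter sequence was verified in the read
    score: float       # alignment score
    evalue: float      # alignment e-value


def passes_quality(hit, min_score=40.0, max_evalue=10e-40):
    """Return True if the read meets all assumed quality criteria.

    Assumption: the e-value threshold is an upper bound, because
    smaller e-values indicate more significant alignments.
    """
    return (
        hit.has_ltr
        and hit.has_adapter
        and hit.score > min_score
        and hit.evalue < max_evalue
    )


# Invented example reads, for illustration only.
hits = [
    Hit("r1", True, True, 120.0, 1e-60),   # meets every criterion
    Hit("r2", True, False, 120.0, 1e-60),  # adapter not verified
    Hit("r3", True, True, 35.0, 1e-60),    # score too low
    Hit("r4", True, True, 90.0, 1e-10),    # e-value too large
]

accepted = [h.read_id for h in hits if passes_quality(h)]
print(accepted)
```

Keeping the thresholds as function parameters lets the same filter be rerun with different cutoffs without changing the record format.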

Collaborations have been forged to include further groups and vector types.

Bioinformatic analysis will allow even complex vector integration patterns to be recognized and will broaden our understanding of the determinants of vector integration into the genome. This in turn can lead to the construction of "favorable" vectors and help to reduce the genotoxicity of retroviral or lentiviral vector-mediated gene transfer.
