Therapy-related myelodysplasia (t-MDS) is a lethal complication of autologous hematopoietic cell transplant (HCT) for Hodgkin lymphoma (HL) and non-Hodgkin lymphoma (NHL). The development of t-MDS after HCT appears to be related to pre-HCT genotoxic exposures. Here we investigated whether alterations in gene expression patterns in hematopoietic stem cells (HSC) from peripheral blood stem cell (PBSC) autografts were associated with subsequent development of t-MDS after HCT. We analyzed pre-HCT PBSC samples from 18 patients who developed t-MDS and 38 controls (matched for primary diagnosis, age at HCT, race/ethnicity, and length of follow-up) who did not develop t-MDS after HCT for HL/NHL. CD34+ cells were selected using flow cytometry. RNA was extracted from 1000 cells, processed using the Affymetrix 2-Cycle Target labeling kit, and hybridized to Affymetrix U133 Plus 2.0 microarrays. Following QC assessment, 16 t-MDS cases and 29 matched controls were selected for further analysis. Raw data were normalized using the RMA algorithm. A total of 35,042 transcripts were selected for analysis based on ≥4 arrays having intensity >16. The limma package was used to identify genes differentially expressed between t-MDS and control samples (FDR ≤ 0.01 and > 1.5-fold up- or down-regulation) while controlling for matched groups. A total of 877 differentially expressed transcripts representing 781 unique genes were identified. Of several classification algorithms tested, k-nearest neighbor (KNN) and Naïve Bayes (NB) were found to best predict t-MDS. To select the best genes for classification, a step-forward method was used for prediction analysis. First, redundant transcripts were removed, keeping only the most significant transcript of each redundant set, and genes were ranked by p-value. Prediction analyses were then performed starting with the top 2 genes and adding one gene at a time.
Using this approach, the top 5 genes best predicted the class labels under leave-one-out (LOO) cross-validation, with a prediction error of 22%. To determine the statistical significance of this error rate, a permutation-based approach was used to establish its null distribution, that is, the probability of obtaining a cross-validated classification error as small as the one observed if there were no difference between t-MDS and control. Sample labels were randomly permuted 10,000 times and the error rate was recalculated each time using LOO cross-validation. Based on the null distribution of the permuted error rates, the original error rate (ER0=0.22) had a P value of 0.005, indicating that such a small prediction error for a cross-validated classifier is very unlikely to arise by chance. We are currently analyzing the differentially expressed genes to investigate altered responses to genotoxic stress and altered stem cell regulation in HSC from patients who subsequently develop t-MDS. In conclusion, we have shown that gene expression profiles of HSC from PBSC autografts from HL/NHL patients can differentiate patients who develop t-MDS after autologous HCT from those who do not. The predictive power of these gene sets will be further verified using independent sets of t-MDS case and control samples.
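
The permutation test on the cross-validated error rate can be illustrated with a minimal sketch. It is not the study's code: it assumes a majority-vote k-nearest-neighbor classifier with k=3, Euclidean distance, a fixed gene set, and 0/1 class labels, and the function names are hypothetical. The "+1" correction in the P value is a common convention and is not stated in the abstract.

```python
import random

def knn_neighbors(X, k):
    """For each sample, indices of its k nearest other samples (squared Euclidean)."""
    n = len(X)
    neighbors = []
    for i in range(n):
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(X[i], X[j])), j)
            for j in range(n) if j != i
        )
        neighbors.append([j for _, j in dists[:k]])
    return neighbors

def loo_error(neighbors, labels):
    """Leave-one-out error of a majority-vote k-NN classifier with 0/1 labels."""
    wrong = 0
    for i, nbrs in enumerate(neighbors):
        votes = sum(labels[j] for j in nbrs)
        predicted = 1 if 2 * votes > len(nbrs) else 0
        wrong += predicted != labels[i]
    return wrong / len(labels)

def permutation_p_value(X, labels, k=3, n_perm=10000, seed=0):
    """Observed LOO error and the fraction of label permutations doing as well."""
    rng = random.Random(seed)
    # Neighbors depend only on expression values, so they are computed once
    # and reused for every permutation of the labels.
    neighbors = knn_neighbors(X, k)
    observed = loo_error(neighbors, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if loo_error(neighbors, shuffled) <= observed:
            hits += 1
    # Assumed convention: add 1 to numerator and denominator to avoid P = 0.
    return observed, (hits + 1) / (n_perm + 1)
```

Here `X` would be a list of per-sample expression vectors for the selected genes and `labels` the case/control indicator. If gene selection were repeated inside each permutation, the null distribution would also account for selection bias; this sketch does not do that.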

Author notes

Disclosure: No relevant conflicts of interest to declare.