Background: Umbilical cord blood transplantation (UCBT) is a potentially curative therapy for patients with acute leukemia (AL). The benefit of transplantation must be balanced against its risks, such as transplant-related mortality and relapse. The complex nature of hematopoietic stem cell transplantation (HCT) data, rich in interactions and possibly nonlinear associations, motivated us to apply machine learning (ML) for predictive modeling. ML is a field of artificial intelligence and is part of the data mining approach to data analysis.

Our group has recently reported an ML-based prediction model for short-term HCT outcomes (Shouval R et al; JCO 2015). The aim of the current study was to predict leukemia-free survival (LFS) at 2 years after UCBT using an ML algorithm, while exploring variable importance and interactions.

Patients & Methods: A cohort of 3,149 UCBTs was analyzed. Inclusion criteria encompassed patients of all ages undergoing UCBT (single or double unit) for AL in EBMT centers from 2004 to 2014, in any disease status. All conditioning and graft-versus-host disease prophylaxis regimens were included. A total of 24 variables were considered, including total nucleated cell (TNC) dose, donor and recipient HLA typing, as well as recipient, disease and transplant characteristics.

The Random Survival Forest (RSF) ML algorithm was applied for model construction and data exploration. RSF adapts to the data, automatically recovers nonlinear effects and complex interactions among variables, and yields nonparametric predictions on test data. The analysis pipeline consisted of prediction model development, assessment of variable importance by minimal depth from the tree trunk, and exploration of the top-ranking variables with dependence plots. The latter promote understanding of non-trivial associations between variables and outcomes.
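The minimal depth ranking mentioned above can be illustrated with a small sketch. It is not the study's code: the three toy trees, the variable names, and the rule that a variable never used in a tree is assigned that tree's depth are all assumptions made for illustration. The idea is that a variable whose first split lies close to the trunk partitions a large share of the cohort, so a smaller average minimal depth suggests a more predictive variable.

```python
# Toy illustration of minimal depth ranking (hypothetical trees, not study data).

def node(var, left=None, right=None):
    """Build a split node; a terminal node is represented by None."""
    return {"var": var, "left": left, "right": right}

def tree_depth(tree):
    """Number of split levels in the tree (0 for a terminal node)."""
    if tree is None:
        return 0
    return 1 + max(tree_depth(tree["left"]), tree_depth(tree["right"]))

def minimal_depths(tree, depth=0, found=None):
    """Map each variable to the depth of its shallowest split (root = 0)."""
    if found is None:
        found = {}
    if tree is None:
        return found
    var = tree["var"]
    if var not in found or depth < found[var]:
        found[var] = depth
    minimal_depths(tree["left"], depth + 1, found)
    minimal_depths(tree["right"], depth + 1, found)
    return found

def rank_by_minimal_depth(forest, variables):
    """Average minimal depth over all trees, sorted from most to least predictive.

    Assumption: a variable that never splits in a tree gets that tree's depth.
    """
    totals = {v: 0.0 for v in variables}
    for tree in forest:
        depths = minimal_depths(tree)
        fallback = tree_depth(tree)
        for v in variables:
            totals[v] += depths.get(v, fallback)
    means = {v: totals[v] / len(forest) for v in variables}
    return sorted(means.items(), key=lambda item: item[1])

# Hypothetical three-tree forest over three illustrative variables.
forest = [
    node("disease_status", node("age"), node("tnc")),
    node("disease_status", node("tnc")),
    node("age", node("disease_status")),
]
ranking = rank_by_minimal_depth(forest, ["disease_status", "age", "tnc"])
for name, mean_depth in ranking:
    print(f"{name}: {mean_depth:.2f}")
```

In a real forest the trees are far deeper and the average is taken over all trees (1000 in this study), but the ranking principle is the same.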

Results: The 2-year LFS was 49%, with a median follow-up of 30 months. An RSF model of 1000 trees was developed, with each tree constructed on a bootstrap sample from the original cohort. The prediction error was 36.0%. The 10 most predictive variables (in ascending order) were disease status, age, TNC harvested and infused, recipient CMV serostatus, interval from diagnosis to UCBT, transplant year, previous autologous transplant, and use of anti-thymocyte globulin (ATG).
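The abstract does not state how the prediction error was computed. RSF software commonly reports it as 1 minus Harrell's concordance index (C-index), evaluated on the cases left out of each bootstrap sample. Assuming that definition, the sketch below computes the C-index for right-censored data; the five cases and their risk scores are invented for illustration, and tied event times are simply skipped.

```python
# Minimal sketch of Harrell's concordance index for right-censored data.
# The toy cohort below is hypothetical and is not derived from the study.

def concordance_index(times, events, risk):
    """Fraction of usable pairs in which the higher risk score belongs
    to the case that experienced the event earlier.

    times  : follow-up time of each case
    events : 1 if the event (relapse or death) was observed, 0 if censored
    risk   : predicted risk score (higher means worse predicted outcome)
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored case cannot anchor a comparison
        for j in range(n):
            if times[i] < times[j]:  # case i failed while case j was still event-free
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied predictions count as half
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable

times = [5, 8, 12, 20, 24]          # months of follow-up
events = [1, 1, 0, 1, 0]            # 1 = event, 0 = censored
risk = [0.9, 0.4, 0.5, 0.6, 0.1]    # hypothetical model output

c_index = concordance_index(times, events, risk)
prediction_error = 1.0 - c_index
print(f"C-index = {c_index:.2f}, prediction error = {prediction_error:.2f}")
```

Under this definition an error of 50% corresponds to random guessing and 0% to perfect ranking, so the reported 36.0% would correspond to a C-index of 0.64.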

Selected findings from exploration of variable-outcome relationships with dependence plots included a varying effect of TNC dose in specific subpopulations. Increasing the number of infused TNCs had a positive effect on predicted LFS in patients receiving HLA-mismatched grafts (2 or more HLA mismatches) (Figure 1) or single-unit CB grafts, and in patients in earlier disease status or of older age. ATG administration was associated with worse LFS, whether unadjusted or adjusted for all other variables. Moreover, there was an additional negative effect in patients with advanced disease status, recipients of HLA-mismatched or single-unit CB grafts, and older patients. Patients in 1st complete remission (CR1) had higher predicted LFS than those in 2nd complete remission (CR2). However, in patients receiving an HLA-mismatched or a double CB graft, the difference in LFS between CR1 and CR2 was attenuated. Younger age had a favorable impact in early disease status, but lost its positive effect in advanced disease.

Conclusions: A prediction model for LFS 2 years post UCBT was developed using the RSF ML algorithm. Variables were ranked according to their predictive contribution. Disease status, age, and TNC count were found to be the most important factors. Dependence plots revealed interactions and nonlinear associations between variables and the outcome, such as the dependence of the cell dose effect on HLA disparity. Apart from its clinical findings, the study carries methodological significance. A novel ML approach for prediction, variable selection and data exploration, accounting for long-term time-to-event outcomes, has proved useful in the field of HCT.

Figure 1.

Variable marginal dependence coplot of predicted LFS at 2 years against TNC, conditional on HLA matching. Individual cases are marked with blue circles (alive or censored) and red 'x's (event). Linear smooth (a linear extrapolation of the prediction function), with shaded 95% confidence band, indicates trends of variable dependence.

Disclosures

Mohty:Janssen: Honoraria; Celgene: Honoraria. Sanz:JANSSEN CILAG: Honoraria, Research Funding, Speakers Bureau. Bader:Neovii: Other: Institutional grants; Medac: Other: Institutional grants; Riemser: Other: Institutional grants; Amgen: Consultancy; Novartis: Consultancy; Jazz Pharmaceuticals: Consultancy.

Author notes

*

Asterisk with author names denotes non-ASH members.