Key Points

  • Prognostic capacity varied across 8 allogeneic transplantation scores, with rPAM showing modest benefit across several outcomes.

  • EASIx, a biomarker-based prediction model, is among the strongest predictive scores of NRM.

Abstract

Clinical decisions in allogeneic hematopoietic stem cell transplantation (allo-HSCT) are supported by the use of prognostic scores for outcome prediction. Scores vary in their features and in the composition of development cohorts. We sought to externally validate and compare the performance of 8 commonly applied scoring systems on a cohort of allo-HSCT recipients. Among 528 patients studied, acute myeloid leukemia was the leading transplant indication (44%) and 46% of patients had a matched sibling donor. Most models successfully grouped patients into higher and lower risk strata, supporting their use for risk classification. However, discrimination varied (2-year overall survival area under the receiver operating characteristic curve [AUC]: revised Pretransplantation Assessment of Mortality [rPAM], 0.64; PAM, 0.63; revised Disease Risk Index [rDRI], 0.62; Endothelial Activation and Stress Index [EASIx], 0.60; combined European Society for Blood and Marrow Transplantation [EBMT]/Hematopoietic Cell Transplantation-specific Comorbidity Index [HCT-CI], 0.58; EBMT, 0.58; Comorbidity-Age, 0.58; HCT-CI, 0.55); AUC ranges from 0.5 (random) to 1.0 (perfect prediction). rPAM and PAM, which had the greatest predictive capacity across all outcomes, are comprehensive models including patient, disease, and transplantation information. Interestingly, EASIx, a biomarker-driven model, had comparable performance for nonrelapse mortality (NRM; 2-year AUC, 0.65) but no predictive value for relapse (2-year AUC, 0.53). Overall, allo-HSCT prognostic systems may be useful for risk stratification, but individual prediction remains a challenge, as reflected by the scores’ limited discriminative capacity.

Introduction

Given the potential benefits and perils associated with allogeneic hematopoietic stem cell transplantation (HSCT), informed risk estimation is an integral part of candidate evaluation. The past 20 years have seen the proliferation of risk indices for the prediction of HSCT outcomes. These models can be useful for patient counseling, treatment strategy optimization, and statistical analysis across cohorts.1,2  Scores are based on different sets of parameters. The Hematopoietic Cell Transplantation–specific Comorbidity Index (HCT-CI)3  and its derivative, the Comorbidity-Age Index,4  are based on the patient’s comorbidity profile, with age additionally incorporated in the latter. The score of the European Society for Blood and Marrow Transplantation (EBMT)5  includes characteristics of the patient (age), disease (status, time from diagnosis to transplantation), and donor (relation, donor-recipient HLA match, and sex match). The Comorbidity-EBMT index, proposed by Barba et al in 2014, combines the comorbidity-specific information from the HCT-CI with the broader range of variables included in the EBMT score.6  In 2006, the Pretransplantation Assessment of Mortality (PAM) score7  was published; some of its features overlap with the EBMT score, and it additionally includes information on conditioning and laboratory markers of comorbidities. This score was simplified in 2015 (revised PAM [rPAM]),8  leaving only age, donor type, disease status, and pulmonary function, while adding donor and recipient cytomegalovirus (CMV) serostatus. 
Most recently, the Endothelial Activation and Stress Index (EASIx),9  a laboratory biomarker-based formula including serum creatinine, lactate dehydrogenase, and platelet count, was developed for the prediction of survival in patients developing acute graft-versus-host disease; this score has since been extended to the general prediction of mortality when measured pretransplantation.10  Another prognostic tool commonly used for risk stratification is the revised Disease Risk Index (rDRI), which incorporates disease type and status at time of transplantation.11  Supplemental Table 1 describes each score’s components. Data regarding the comparative performance of these scores in a single population are lacking. We aimed to externally validate and compare the performance of these 8 systems in a contemporary cohort of transplantation patients across several outcomes.
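The EASIx formula itself is simple arithmetic. The sketch below assumes the original formulation, LDH (U/L) × creatinine (mg/dL) / platelet count (×10⁹/L), using the units reported in Table 1; the function name is hypothetical, and the example only plugs in the cohort medians rather than reproducing any patient's score.

```python
def easix(ldh_u_per_l: float, creatinine_mg_per_dl: float, platelets_1e9_per_l: float) -> float:
    """Endothelial Activation and Stress Index: LDH x creatinine / platelets.

    Assumed units: LDH in U/L, creatinine in mg/dL, platelets in 10^9/L.
    """
    if platelets_1e9_per_l <= 0:
        raise ValueError("platelet count must be positive")
    return ldh_u_per_l * creatinine_mg_per_dl / platelets_1e9_per_l


# Illustration only: the cohort medians from Table 1 (LDH 208, creatinine 0.85,
# platelets 132). This is NOT the cohort's median EASIx, because the median of a
# ratio is not the ratio of the medians.
example = easix(208, 0.85, 132)
print(round(example, 2))  # 1.34
```

Because platelets sit in the denominator, a very low platelet count inflates the score sharply, which is consistent with the long right tail of EASIx values described in the Results.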

Methods

Data collection

Clinical and laboratory data prior to transplantation were obtained from the electronic medical record for allogeneic transplantations performed between 2011 and 2015 at Chaim Sheba Medical Center at Tel HaShomer, Ramat Gan, Israel. Survival of patients lost to follow-up was ascertained by cross-referencing the national social security registry. We included adult patients (age ≥18 years) who underwent transplantation for any indication and who received grafts from matched sibling (MSD), matched unrelated (MUD; 10 of 10 HLA alleles), or mismatched unrelated (9 of 10 HLA alleles) donors. Patients missing any parameter necessary for calculation of the studied scores were excluded from the analysis, with the following exceptions: 38 patients with a missing donor-recipient CMV serostatus pair were included in the overall study but excluded from the analysis of the rPAM score; for 17 patients missing time from diagnosis to transplantation, a component of the EBMT score, this value was imputed using the median value of all patients with the same diagnosis and disease stage. The rDRI was inapplicable to 18 patients treated for nonmalignant conditions, who were excluded from the assessment of that system. Conditioning regimens were classified as myeloablative or reduced-intensity based on the definitions of Bacigalupo et al,12  with treosulfan-based regimens also considered myeloablative.13  Comorbidities were assessed by a qualified transplant physician using the definitions provided by the HCT-CI.3  Human subject research was approved by the Chaim Sheba Medical Center Institutional Review Board, and all research was performed in accordance with the Declaration of Helsinki.
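The group-median imputation of time from diagnosis to transplantation can be sketched as follows. This is a minimal illustration, not the authors' code (the analyses were run in R); the record layout and the field names `dx`, `stage`, and `days_dx_to_hsct` are hypothetical.

```python
from collections import defaultdict
from statistics import median


def impute_group_median(records, value_key, group_keys):
    """Fill missing `value_key` entries with the median of patients sharing `group_keys`.

    records: list of dicts in which a missing value is represented as None.
    Returns a new list of dicts; the input records are left unchanged.
    """
    observed = defaultdict(list)
    for rec in records:
        if rec[value_key] is not None:
            observed[tuple(rec[k] for k in group_keys)].append(rec[value_key])

    imputed = []
    for rec in records:
        new = dict(rec)
        if new[value_key] is None:
            donors = observed.get(tuple(rec[k] for k in group_keys))
            # Leave the value missing if nobody in the same group has an observed value.
            if donors:
                new[value_key] = median(donors)
        imputed.append(new)
    return imputed


# Hypothetical toy records (diagnosis, disease stage, days from diagnosis to HSCT).
patients = [
    {"dx": "AML", "stage": "CR1", "days_dx_to_hsct": 150},
    {"dx": "AML", "stage": "CR1", "days_dx_to_hsct": 170},
    {"dx": "AML", "stage": "CR1", "days_dx_to_hsct": None},
    {"dx": "MDS", "stage": "untreated", "days_dx_to_hsct": 400},
    {"dx": "ALL", "stage": "CR2", "days_dx_to_hsct": None},
]
out = impute_group_median(patients, "days_dx_to_hsct", ("dx", "stage"))
print(out[2]["days_dx_to_hsct"])  # 160.0
```

Grouping on both diagnosis and stage keeps the imputed interval clinically comparable; a group with no observed values is deliberately left missing rather than borrowing from other diagnoses.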

Statistical analysis

All outcomes were measured from the time of HSCT. Nonrelapse mortality (NRM) was defined as death without prior relapse following HSCT, with relapse treated as a competing event. Time of relapse was determined by the clinical finding of recurrent disease.
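Because relapse and NRM compete (a patient who relapses can no longer experience NRM, and a patient who dies without relapse can no longer relapse), their incidence is estimated with a cumulative incidence function rather than 1 minus a Kaplan-Meier curve, which would overestimate each event. A minimal sketch of the Aalen-Johansen estimator follows, assuming the hypothetical event coding 0 = censored, 1 = relapse, 2 = NRM; it illustrates the method and is not the R `cmprsk` implementation used in the study.

```python
def cumulative_incidence(times, events, cause):
    """Aalen-Johansen cumulative incidence for one cause under competing risks.

    times: follow-up times from HSCT.
    events: 0 = censored, 1 = relapse, 2 = death without prior relapse (NRM).
    Returns (time, CIF) steps at each time an event of `cause` occurs.
    """
    data = sorted(zip(times, events))
    event_times = sorted({t for t, e in data if e != 0})
    surv = 1.0  # all-cause event-free survival just before the current time
    cif = 0.0
    steps = []
    for t in event_times:
        at_risk = sum(1 for ti, _ in data if ti >= t)
        d_any = sum(1 for ti, e in data if ti == t and e != 0)
        d_cause = sum(1 for ti, e in data if ti == t and e == cause)
        if d_cause:
            # Probability of reaching t event-free, times the cause-specific hazard at t.
            cif += surv * d_cause / at_risk
            steps.append((t, cif))
        surv *= 1 - d_any / at_risk
    return steps


# Toy data: relapse at t=1, NRM at t=2, censored at t=3, relapse at t=4.
relapse = cumulative_incidence([1, 2, 3, 4], [1, 2, 0, 1], cause=1)
nrm = cumulative_incidence([1, 2, 3, 4], [1, 2, 0, 1], cause=2)
print([(t, round(c, 3)) for t, c in relapse])  # [(1, 0.25), (4, 0.75)]
print([(t, round(c, 3)) for t, c in nrm])  # [(2, 0.25)]
```

At the end of follow-up the two cumulative incidences sum to 1 minus the all-cause event-free survival, which is the property a naive Kaplan-Meier approach would violate.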

Prognostic scores were calculated for each patient using the definitions provided in the publications of these prognostic indices: the HCT-CI,3  the Comorbidity-Age Index (Comorbidity-Age),4  the combined HCT-CI and EBMT (Comorbidity-EBMT),6  the EASIx,9  the PAM7  and rPAM8  scores, and the EBMT5  score (supplemental Table 1). For determination of disease status in calculation of the PAM score, we used the definition provided by the EBMT score, following the approach of Barba et al.14  Disease status in the rPAM score was determined using the categories of the rDRI,11  with nonmalignant diagnoses assigned an intermediate disease stage, as specified in the rPAM’s initial publication by Au et al.8  Scores were tested for normality using the Shapiro-Wilk test, and Pearson product-moment correlation coefficients were calculated between each pair of scores. For a subanalysis including only patients who received transplants for acute leukemia, an additional score, the Acute Leukemia–EBMT (AL-EBMT; developed using a machine learning technique), was also calculated, following the schema outlined by Shouval et al.15 

The scores were grouped into 3 to 6 levels each for estimating overall survival (OS) using the Kaplan-Meier method and NRM and relapse incidence using the cumulative incidence method; strata were compared using the log-rank and Gray tests, respectively. Multivariable regression, adjusted for age, donor type, conditioning intensity, disease risk, and year of transplantation, was performed only for the HCT-CI and EASIx scores, because these covariates are themselves components of the remaining scores. Additionally, a multivariable model was built separately for the rDRI, adjusting for the same covariates other than disease risk.
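The log-rank test compares observed with expected events across strata at every event time. A two-group version is sketched below; the study compared 3 to 6 strata per score, which requires the k-group generalization with a covariance matrix (as in R's `survival::survdiff`), so this sketch, with a hypothetical function name and toy data, only illustrates the observed-minus-expected logic. Gray's test for cumulative incidence is not shown.

```python
import math


def logrank_two_groups(times_a, events_a, times_b, events_b):
    """Two-sample log-rank test; events are 1 = event, 0 = censored.

    Returns (chi-square statistic with 1 df, P value).
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    o_minus_e = 0.0  # observed minus expected events in group A
    var = 0.0
    for t in event_times:
        n = sum(1 for ti, _, _ in data if ti >= t)
        n_a = sum(1 for ti, _, g in data if ti >= t and g == 0)
        d = sum(1 for ti, e, _ in data if ti == t and e)
        d_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        o_minus_e += d_a - d * n_a / n
        if n > 1:
            # Hypergeometric variance of the number of events in group A at time t.
            var += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / var if var > 0 else 0.0
    # Survival function of a 1-df chi-square: P(X > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Toy example: all events in group A occur before any event in group B.
print(logrank_two_groups([1, 2, 3], [1, 1, 1], [4, 5, 6], [1, 1, 1]))
```

Identical groups give a statistic of 0 and a P value of 1; fully separated groups, as in the toy call, give a statistic above the 3.84 critical value.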

Score discrimination was measured using the area under the receiver operating characteristic curve (AUC). Discrimination reflects the ability of a prediction model to differentiate between those who do and do not experience the studied outcome. Perfect discrimination corresponds to an AUC of 1.0, meaning that the predicted risk for every individual who developed the outcome is higher than that for every individual who did not. An AUC of 0.5 indicates a random predictor, that is, a coin toss.16  AUCs for the prediction of OS, NRM, and relapse incidence were calculated for each score independently across the entire cohort at 100 days and at 1, 2, and 3 years. AUCs were further validated by bootstrapping with 100 samples, with the median and interquartile range (IQR) reported in the supplemental Appendix. Calibration, the agreement between predicted and observed outcomes, was assessed graphically by plotting predicted vs observed outcomes for score quartiles at 100 days and at 1 and 2 years. A model is considered well calibrated if, for example, among a group of 100 patients with a mean predicted risk of 20%, ∼20 patients develop the outcome.17  Finally, discrimination was assessed within a variety of subpopulations to determine whether each score performed better or worse under a given set of conditions.
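The AUC at a fixed time point can be illustrated with a simplified cumulative/dynamic definition: cases are patients with the event by the horizon, controls are patients still event-free after it, and the AUC is the fraction of case-control pairs in which the case has the higher score (ties count as one half). This is a sketch under stated assumptions, not the study's implementation: patients censored before the horizon are simply dropped, whereas estimators designed for censored data typically apply inverse-probability-of-censoring weights, and only a single event type (eg, death for OS) is handled. All function names are hypothetical.

```python
import math
import random
from statistics import median, quantiles


def auc_at_horizon(scores, times, events, horizon):
    """Cumulative/dynamic AUC at `horizon` (complete-case simplification).

    Cases: event (events == 1) at or before `horizon`.
    Controls: event-free and still under follow-up after `horizon`.
    Patients censored before `horizon` are dropped (no censoring weights).
    A higher score means higher predicted risk; ties count as 0.5.
    """
    cases = [s for s, t, e in zip(scores, times, events) if t <= horizon and e == 1]
    controls = [s for s, t in zip(scores, times) if t > horizon]
    if not cases or not controls:
        return float("nan")
    concordant = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                concordant += 1.0
            elif c == k:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))


def bootstrap_auc(scores, times, events, horizon, n_boot=100, seed=1):
    """Median and IQR of the AUC over bootstrap resamples of patients."""
    rng = random.Random(seed)
    n = len(scores)
    aucs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        auc = auc_at_horizon([scores[i] for i in idx], [times[i] for i in idx],
                             [events[i] for i in idx], horizon)
        if not math.isnan(auc):  # skip resamples lacking cases or controls
            aucs.append(auc)
    if len(aucs) < 2:
        raise ValueError("too few informative bootstrap resamples")
    q1, _, q3 = quantiles(aucs, n=4)
    return median(aucs), (q1, q3)
```

With this definition an AUC of 0.64, for example, means that in 64% of case-control pairs the patient who died by the horizon had the higher score, which is why values in the 0.6 range support stratification but not confident individual prediction.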

Statistical analyses were performed using R version 3.4.3 (R Foundation for Statistical Computing) and the packages “survival,” “cmprsk,” “prodlim,” “pec,” “rms,” and “ggplot2.”

Results

Population characteristics

Population characteristics are provided in Table 1. A total of 528 patients was included. The median age was 55 years (IQR, 40-64 years). Patients were treated for a variety of malignant and benign conditions. Acute myelogenous leukemia was most prevalent, at 44% of patients, followed by myelodysplastic syndrome (15%) and non-Hodgkin lymphoma (12%). Fifty-six percent of patients had intermediate-risk disease per the rDRI at time of transplantation. The majority of patients received pretransplantation conditioning with a myeloablative regimen (74%), and 10% of patients were conditioned with a regimen including total body irradiation at any dose. HLA-matched sibling donors were used for 46% of patients. The median follow-up was 2.5 years (IQR, 1.7-3.9 years). Supplemental Table 2 compares our cohort with the derivation cohorts of each of the original scores.

Table 1.

Population characteristics

Characteristic | n (%) or median [IQR] | Missing, n (%) | Included in*
Age, y | 55 [40, 64] | 0 (0) | C-A, EBMT, PAM, rPAM, C-E
Days from diagnosis to HSCT | 189 [104, 596] | 17 (3.2) | EBMT, C-E
Diagnosis | | 0 (0) | EBMT, PAM, rPAM, C-E, rDRI
  AML | 233 (44.1) | |
  ALL | 52 (9.8) | |
  CLL | 11 (2.1) | |
  CML | 14 (2.7) | |
  HL | 19 (3.6) | |
  MDS | 78 (14.8) | |
  MM | 20 (3.8) | |
  MF | 20 (3.8) | |
  NHL | 65 (12.3) | |
  AA/nonmalignant | 16 (3.0) | |
Serum ALT, U/L | 32 [18, 53] | 0 (0) | HCT-CI, C-A, PAM, C-E
Serum creatinine, mg/dL | 0.85 [0.73, 1.03] | 0 (0) | HCT-CI, C-A, PAM, EASIx, C-E
Serum LDH, U/L | 208 [173, 276] | 0 (0) | EASIx
Platelets, ×10⁹/L | 132 [64, 190] | 0 (0) | EASIx
FEV1, % expected | 95 [84, 104] | 0 (0) | HCT-CI, C-A, PAM, rPAM, C-E
DLCo, adjusted for Hb | 92.9 [80.4, 109.1] | 0 (0) | HCT-CI, C-A, C-E
Disease risk | | 0 (0) | rPAM, rDRI
  Low risk | 29 (5.5) | |
  Intermediate risk | 298 (56.4) | |
  High risk | 144 (27.3) | |
  Very high risk | 39 (7.4) | |
  Not applicable | 18 (3.4) | |
Regimen intensity | | 0 (0) |
  Myeloablative | 391 (74.1) | |
  Reduced intensity | 137 (25.9) | |
TBI-containing regimen | 54 (10.2) | 0 (0) | PAM
Donor | | 0 (0) | EBMT, PAM, rPAM, C-E
  Matched sibling | 241 (45.6) | |
  Matched unrelated, 10/10 | 207 (39.2) | |
  Mismatched unrelated, 9/10 | 80 (15.2) | |
Female donor to male recipient | 126 (23.9) | 0 (0) | EBMT
CMV serostatus pair | | 38 (7.2) | rPAM
  Donor − Recipient − | 52 (9.8) | |
  Donor − Recipient + | 96 (18.2) | |
  Donor + Recipient − | 30 (5.7) | |
  Donor + Recipient + | 312 (59.1) | |

AA, aplastic anemia; ALL, acute lymphoid leukemia; ALT, alanine aminotransferase; AML, acute myeloid leukemia; C-A, comorbidity-age; C-E, comorbidity-EBMT; CLL, chronic lymphocytic leukemia; CML, chronic myelogenous leukemia; DLCo, diffusing capacity for carbon monoxide; FEV1, forced expiratory volume in 1 second; Hb, hemoglobin; HL, Hodgkin lymphoma; LDH, lactate dehydrogenase; MDS, myelodysplastic syndrome; MF, myelofibrosis; MM, multiple myeloma; NHL, non-Hodgkin lymphoma; TBI, total body irradiation.

*Additional comorbidity variables are included in the HCT-CI and C-A scores.

Disease risk categories are as defined by the rDRI.11  Alternative disease-staging schemes are included in the EBMT and PAM scores.

Score distributions

Each score was calculated for all patients in the cohort, with the exception of the rPAM, which could not be calculated for 38 patients (7%) due to missing donor-recipient CMV serostatus, and the rDRI, which was inapplicable to 18 patients (3%). The distribution of each score is shown in Figure 1 and supplemental Figure 1. Scores were positively correlated (supplemental Figure 2). Pearson correlation coefficients between scores were generally below 0.50, except for scores whose components substantially overlap: Comorbidity-EBMT, Comorbidity-Age, and HCT-CI; rPAM and rDRI, because the rPAM incorporates the rDRI; and EBMT and PAM, which share a definition of disease risk. Scores were nonnormally distributed (P < .001 in all cases), with more patients having low (favorable) than high (adverse) values. EASIx was notable for its distant outliers: 75% of values lay between 0 and 3.76, while the remaining quartile extended to 212 (Figure 1F inset).
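Quartile-based strata like those used for EASIx can be derived from the empirical distribution. In the sketch below (hypothetical function name; placing a value that equals a cut point in the upper stratum is an assumption, since published strata may treat boundaries differently), the cut points are order statistics, so an extreme value such as an EASIx of 212 only lands in the top stratum without shifting the cut points themselves.

```python
from bisect import bisect_right
from statistics import quantiles


def quartile_strata(values):
    """Split a continuous score into quartile strata (1 = lowest, 4 = highest).

    Cut points come from statistics.quantiles(method="inclusive").
    A value equal to a cut point is assigned to the upper stratum.
    Returns (cut points, stratum for each value in input order).
    """
    cuts = quantiles(values, n=4, method="inclusive")
    return cuts, [bisect_right(cuts, v) + 1 for v in values]


cuts, strata = quartile_strata(list(range(1, 10)))
print(cuts)    # [3.0, 5.0, 7.0]
print(strata)  # [1, 1, 2, 2, 3, 3, 4, 4, 4]

# Replacing the largest value with an extreme outlier leaves the cut points unchanged.
print(quartile_strata(list(range(1, 9)) + [212])[0])  # [3.0, 5.0, 7.0]
```

This robustness to the long right tail is one reason quantile-based strata suit a skewed score like EASIx better than strata based on the mean and standard deviation.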

Figure 1.

Distribution of the individual scores across the population. (A-F) Median (IQR) and number of patients per grouped stratum are provided. Score distributions are skewed toward low (ie, favorable) values. (F) Outliers in the EASIx score are shown in the inset. The Comorbidity-EBMT score is shown in the supplemental Appendix (supplemental Figure 1), and the distribution of the rDRI is provided in Table 1.


Outcomes

The highest-risk stratum was associated with increased risk of overall mortality for the Comorbidity-Age, Comorbidity-EBMT, EBMT, PAM, rPAM, and EASIx scores and the rDRI in the univariable setting (Table 2; supplemental Table 3; Figure 2). However, a monotonic increase (ie, increasing risk with each successive score stratum) was best observed for the rPAM (hazard ratios, 1.5, 2.5, and 3.3 for successive strata), corresponding to decreasing OS probability. Similar results were observed for NRM, with hazard ratios ≥3.0 for the highest-risk strata of the Comorbidity-Age, PAM, rPAM, and EASIx scores. Relapse was not predicted by the HCT-CI or EASIx scores. A multivariable analysis, adjusted for age, donor type, conditioning intensity, disease status, and year, was conducted only for the HCT-CI and EASIx scores, as these indices do not include disease- and transplant-related features (supplemental Table 4); the highest-risk stratum of EASIx remained an independent predictor of overall mortality and NRM. A similar multivariable analysis for the rDRI, incorporating age, donor type, conditioning intensity, and year of transplantation, demonstrated that the rDRI remained a predictor of overall mortality, NRM, and relapse. Additionally, higher rDRI levels were associated with increasing risk in the multivariable models for both HCT-CI and EASIx, further supporting these indices’ potentially additive role.

Table 2.

Two-year OS, NRM, and relapse incidence by score

Score | Level | 2-y OS, % (range) | Log-rank P | 2-y NRM, % (range) | Gray P | 2-y relapse incidence, % (range) | Gray P
HCT-CI | 0 | 50.1 (40.7-61.8) | | 15.3 (9.5-24.6) | | 34.4 (26.1-45.3) |
  | 1-2 | 60.2 (53.4-67.9) | | 18.6 (13.8-25.0) | | 24.8 (19.2-31.9) |
  | 3+ | 46.5 (40.2-53.7) | .037 | 24.7 (19.6-31.1) | .203 | 30.7 (25.2-37.3) | .174
Comorbidity-Age | 0 | 69.6 (55.3-87.6) | | 11.1 (4.4-28.0) | | 22.4 (11.5-43.7) |
  | 1-2 | 55.8 (48.8-63.7) | | 17.0 (12.3-23.3) | | 28.3 (22.6-35.5) |
  | 3-4 | 55.7 (48.5-64.0) | | 18.9 (13.7-26.0) | | 27.6 (21.5-35.3) |
  | 5+ | 36.6 (28.6-46.8) | <.001 | 31.3 (24.0-40.9) | .006 | 34.8 (27.3-44.3) | .389
Comorbidity-EBMT | 0/<4 | 49.9 (38.0-65.7) | | 14.6 (8.0-26.7) | | 34.3 (23.9-49.3) |
  | 0/≥4 | 50.1 (36.1-69.7) | | 16.4 (7.7-34.9) | | 34.6 (22.6-52.9) |
  | I-II/<4 | 66.4 (57.4-76.8) | | 16.1 (10.1-25.5) | | 23.0 (15.7-33.6) |
  | I-II/≥4 | 53.9 (44.3-65.6) | | 21.1 (14.3-31.1) | | 26.5 (18.9-37.1) |
  | III+/<4 | 56.3 (47.1-67.3) | | 16.3 (10.4-25.7) | | 28.8 (21.3-38.9) |
  | III+/≥4 | 38.1 (30.2-48.0) | .001 | 31.7 (24.4-41.1) | .019 | 32.3 (25.0-41.7) | .531
EBMT | 0-2 | 63.0 (54.7-72.6) | | 15.9 (10.5-24.1) | | 25.3 (18.5-34.5) |
  | 3 | 54.4 (46.3-64.0) | | 16.0 (10.7-23.8) | | 30.1 (18.5-34.5) |
  | 4 | 46.5 (38.0-56.9) | | 24.3 (17.5-33.6) | | 30.1 (22.9-39.6) |
  | 5 | 46.5 (36.5-59.2) | | 22.0 (14.6-33.1) | | 34.6 (25.7-46.7) |
  | 6-7 | 43.9 (32.6-59.2) | .007 | 32.5 (22.4-47.1) | .046 | 25.9 (16.7-40.1) | .346
rDRI | Low | 72.7 (57.1-92.6) | | 15.9 (6.4-39.5) | | 15.0 (6.0-37.4) |
  | Intermediate | 61.4 (55.8-67.5) | | 16.2 (12.4-21.2) | | 23.6 (19.1-29.1) |
  | High | 32.1 (24.9-41.3) | | 26.5 (20.1-35.0) | | 44.9 (37.4-54.0) |
  | Very high | 31.5 (19.5-50.7) | <.001 | 33.5 (21.5-52.3) | .024 | 35.9 (23.6-54.6) | <.001
PAM | <15 | 64.6 (56.8-73.5) | | 12.5 (8.0-19.5) | | 24.7 (18.3-33.3) |
  | 15-20 | 59.7 (51.4-69.3) | | 16.6 (11.0-25.0) | | 24.4 (18.0-33.1) |
  | 20-25 | 47.2 (38.8-57.6) | | 22.4 (16.0-31.2) | | 35.6 (28.0-45.3) |
  | >25 | 35.0 (27.5-44.7) | <.001 | 32.4 (25.2-41.7) | .001 | 33.2 (25.9-42.5) | .036
rPAM | <12.3 | 71.2 (63.6-79.7) | | 12.1 (7.5-19.5) | | 17.7 (12.2-25.7) |
  | 12.3-16.5 | 61.0 (52.2-71.2) | | 15.5 (10.0-24.0) | | 24.0 (17.2-33.6) |
  | 16.6-21.9 | 42.5 (34.0-53.0) | | 23.9 (17.3-33.1) | | 36.0 (28.2-45.9) |
  | >21.9 | 35.7 (27.8-46.0) | <.001 | 34.4 (26.8-44.2) | <.001 | 33.7 (26.1-43.4) | .003
EASIx | <0.89 | 66.1 (58.2-75.1) | | 11.1 (6.7-18.2) | | 26.0 (19.3-34.9) |
  | 0.89-1.40 | 54.8 (46.4-64.7) | | 11.3 (6.9-18.6) | | 35.3 (27.8-44.8) |
  | 1.40-3.76 | 49.0 (40.9-58.7) | | 27.8 (21.0-36.9) | | 26.2 (19.7-34.9) |
  | >3.76 | 38.8 (30.9-48.6) | <.001 | 32.0 (24.8-41.2) | <.001 | 29.6 (22.6-38.6) | .377

Figure 2.

Kaplan-Meier estimates of OS for each of the studied scores. (A-H) Higher strata generally correspond to poorer outcome in each of the scores.


Discrimination and calibration

Discrimination for OS, NRM, and relapse at 100 days, 1 year, 2 years, and 3 years posttransplantation is described in Figure 3, with further validation using 100 bootstrap samples presented in supplemental Table 5. For OS, AUCs ranged from 0.55 to 0.67. Values were highest for the PAM and rPAM scores across all time points (PAM, 0.62-0.66; rPAM, 0.63-0.67). The EASIx score showed comparable discrimination at day 100 (0.64), subsequently decreasing to as low as 0.58 at 3 years. The EBMT, HCT-CI, and Comorbidity-Age scores had AUCs ranging from 0.56 to 0.60 across all time points. Through the 2-year time point, PAM, rPAM, and EASIx had closely aligned AUCs for NRM (range, 0.63-0.67), although the AUC of EASIx decreased at 3 years while those of PAM and rPAM remained stable. AUCs were lower overall for the prediction of relapse; the highest AUCs were associated with the rPAM score and rDRI at 100 days and 1 year (range, 0.63-0.65), whereas other AUCs for relapse were mostly in the 0.5 to 0.6 range. EASIx had the lowest AUCs for relapse at all time points. Overall, the scores were well calibrated for OS (supplemental Figure 3A-C).

Figure 3.

Time-dependent AUCs for each score at 100 days, 1 year, 2 years, and 3 years posttransplantation. AUCs for prediction of OS (A), NRM (B), and relapse (C) are depicted.


Subpopulations

Score performance was further studied by age (<55 years, ≥55 years), donor type (MSD, MUD), and conditioning intensity (myeloablative conditioning [MAC], reduced-intensity conditioning [RIC]). PAM and rPAM had higher AUCs in the younger age group (PAM, 0.65 vs 0.59; rPAM, 0.69 vs 0.61; supplemental Figure 4). In contrast, EASIx performed better among older patients (0.61 vs 0.56). Most prognostic indices had similar discrimination irrespective of donor, with 2 exceptions: the PAM score demonstrated greater discrimination in the MSD setting (MSD, 0.68 vs MUD, 0.59), whereas the EASIx score had greater discrimination in the MUD setting (MSD, 0.56 vs MUD, 0.62). Higher AUCs were observed in the myeloablative subgroup for the rPAM, HCT-CI, Comorbidity-Age, and EASIx scores (MAC vs RIC: rPAM, 0.70 vs 0.61; HCT-CI, 0.57 vs 0.51; Comorbidity-Age, 0.59 vs 0.53; EASIx, 0.63 vs 0.53).

Additionally, the acute leukemias, representing the most common indication for allogeneic transplantation, were studied separately. The rDRI, rPAM, EBMT, PAM, Comorbidity-EBMT, and EASIx scores all had AUCs in the low 0.6 range for 2-year OS. An additional score, the AL-EBMT, which is applicable only to the acute leukemias, was also included and had a similar AUC (0.63).

Discussion

In this retrospective analysis, we compared 8 prognostic models in a cohort of allogeneic transplant recipients. Score prediction performance, in terms of risk stratification and discrimination, varied considerably, both across outcomes and subgroups. The majority of models, most notably rPAM, successfully grouped patients into lower- and higher-risk strata, supporting their use for risk classification. However, accurate individualized prediction remains suboptimal. Similar to previous studies, the best score performances approached an AUC of 0.70 on a scale of 0.50 to 1.00, necessitating caution when making individual clinical decisions based on these tools. Score performance varied based on the outcome being measured, an effect observed most strikingly in the EASIx score, which was among the strongest predictors of NRM but had little or no information regarding relapse. Intriguingly, rPAM was roughly consistent in its prognostic capacity across all 3 outcomes studied.

Death following transplantation is typically understood as the tension between 2 competing events: transplantation-related mortality and relapse. Naturally, pretransplantation disease features tend to predict relapse, whereas patient-specific characteristics are more indicative of transplantation-related mortality. Prognostic models in HSCT can be viewed either as global scores (eg, rPAM, PAM, EBMT, and Comorbidity-EBMT), which incorporate variables from several domains to estimate expected OS, or as domain-specific scores that include specific patient-related or disease-related features (eg, HCT-CI, Comorbidity-Age, EASIx, and rDRI). Depending on their components, the latter group may be informative of the risk of relapse or of NRM. Indeed, rDRI was predictive of relapse, whereas EASIx was among the top predictors of NRM (Figure 3B). A clinician may use information from domain-based scores, but it is not clear how these should be combined when weighing the benefit-risk ratio of transplantation. This argues in favor of global scores that integrate components predictive of both relapse and NRM. The poor correlation between the EASIx or HCT-CI scores and all other scores (supplemental Figure 2) suggests that they may be additive. However, combining the HCT-CI and EBMT (Comorbidity-EBMT) did not meaningfully improve prediction. Each of the global scores incorporates a variable for the risk inherent to the diagnosis and stage. The higher discrimination of rPAM may be attributed to the incorporation of the rDRI,11,18  which is a more contemporary disease-risk scheme than the EBMT score’s embedded disease-risk criteria. Overall, the disease-risk variable is perhaps the single greatest predictor of transplantation outcomes.15,18 

Determining the generalizability of prognostic scores and avoiding overoptimistic performance assessment requires external validation. Validation studies of each of these scores have been published;19-23  however, direct comparisons in the same population have rarely been performed and mostly include 2 or 3 scores (HCT-CI and EBMT or PAM).6,14,24-27  Furthermore, methodologies have varied; the Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD) guidelines,17,28  which outline best practices for model development and validation, are inconsistently followed. In accordance with the TRIPOD recommendations, we report both the calibration and the discrimination of the models. Overall, the models studied are well calibrated, indicating that predicted outcomes are aligned with observations. However, the HCT-CI is poorly calibrated at lower scores, overestimating survival for patients in the lowest score quartile. This miscalibration is corrected in the Comorbidity-Age index, implying that accounting for advanced age reduces overoptimism in the lower HCT-CI risk groups.
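Calibration by quartile can be sketched as follows: patients are sorted by their predicted probability of surviving to the horizon, split into 4 groups, and the mean predicted survival in each group is compared with the Kaplan-Meier survival observed in that group. This is a minimal illustration with hypothetical names, assuming each model supplies a predicted survival probability per patient (the study's plots were produced in R). Overestimation of survival, as described for the lowest HCT-CI quartile, would appear as a group whose mean prediction sits well above its observed survival.

```python
from statistics import mean


def km_survival_at(times, events, horizon):
    """Kaplan-Meier survival probability at `horizon` (events: 1 = death, 0 = censored)."""
    surv = 1.0
    death_times = sorted({ti for ti, e in zip(times, events) if e == 1 and ti <= horizon})
    for t in death_times:
        at_risk = sum(1 for ti in times if ti >= t)
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        surv *= 1 - deaths / at_risk
    return surv


def calibration_by_quartile(pred_surv, times, events, horizon):
    """Mean predicted vs Kaplan-Meier observed survival within quartiles of prediction.

    pred_surv: each patient's predicted probability of surviving to `horizon`.
    Requires at least 4 patients. Returns (mean predicted, observed) pairs,
    ordered from the lowest to the highest predicted survival.
    """
    order = sorted(range(len(pred_surv)), key=lambda i: pred_surv[i])
    size = len(order)
    groups = [order[q * size // 4:(q + 1) * size // 4] for q in range(4)]
    return [
        (
            mean(pred_surv[i] for i in g),
            km_survival_at([times[i] for i in g], [events[i] for i in g], horizon),
        )
        for g in groups
    ]
```

Plotting these pairs against the identity line reproduces the kind of graphical calibration check described in the Methods; points below the line indicate that survival was overestimated in that quartile.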

Score performance varied across subpopulations (supplemental Figure 4). For the rPAM, HCT-CI, and Comorbidity-Age scores, discriminative capacity was higher in the MAC setting than in the RIC setting. This may reflect a predominance of MAC among development cohorts (supplemental Table 2). EASIx also demonstrated higher AUCs in the myeloablative cohort; although EASIx was developed in a population including a large number of RIC patients, it could be argued that MAC patients are more susceptible to the endothelial dysfunction that the score was initially developed to predict. Because acute leukemia is the leading indication for allogeneic transplantation, we performed a subanalysis restricted to this population (supplemental Figure 4D). The AL-EBMT score, an acute leukemia–specific score and the first machine-learning–based predictive model developed in allogeneic transplantation, was also incorporated.15  The consistency of AUCs (range, 0.60-0.64) across all of the scores (except the comorbidity indices, which were lower), irrespective of modeling approach, suggests that the prognostic information available from traditional clinical parameters may be largely exhausted. Improvement in prediction will likely require the incorporation of novel biomarkers.

Models relying on robust and objective biomarkers may improve our ability to predict outcomes. We have previously shown that pretransplantation hypoalbuminemia and renal function abnormalities are among the strongest risk factors for poor outcomes in HSCT recipients.29  EASIx stands out for incorporating only laboratory-based markers while performing similarly to the clinically oriented scores (rPAM, PAM) at the early time points. Furthermore, when EASIx was studied in a multivariable analysis adjusting for key clinical features, the highest score stratum maintained a strong association with increased mortality. Consistent with its authors’ underlying hypothesis, the score’s accuracy is driven by its ability to predict NRM, a toxicity-based measure, whereas relapse is not predicted. The continuing identification of genetic and microbiome markers of both relapse and nonrelapse risk should motivate the field to pursue further biologically driven risk-prediction schemes.30-34 

Differences in the scores’ predictive performance between our cohort and the derivation cohorts may stem from differences in the populations (supplemental Table 2). The current cohort represents a more recent transplant era than the original studies, which may partially account for discrepancies in the models’ performance. Also, previous studies have suggested that the utility of risk indices may be center-specific.35  Despite being derived from a single-center cohort, these results recapitulate findings of independent validation studies. Alternative donors are not represented in our cohort and remain underrepresented across all validation studies. The generalizability of transplantation prognostic indices to the alternative donor setting has been studied in small cohorts, with varying performance.19,21  The development of donor-specific systems will likely contribute to more accurate predictions.36  One must keep in mind that the scores are limited to patients who received transplants and do not capture alternative treatments; therefore, they cannot support a truly informed decision that contemplates all potential therapeutic paths. In clinical practice, the scores’ greatest utility may be in identifying the subset of patients who are least likely to benefit from HSCT. In all scores, the highest stratum was associated with a substantially increased risk of poor outcome. Given the limited correlation between scores driven by meaningfully different feature sets, each system may identify a different subset of these highest-risk patients. This approach simulates, and may augment, the clinical intuition that integrates a patient’s physiologic status (age, comorbidities) and procedural characteristics (donor, diagnosis, and stage). The highest-risk stratum represents a population with extremely limited alternative treatment options; however, these patients may be ideal candidates for clinical trials.

As novel therapeutic approaches emerge in hemato-oncology, the risk-benefit analysis for allogeneic transplantation becomes ever more important. In this retrospective comparison of the leading prognostic indices in HSCT, we show that most models can be used to stratify patients, but not to make individualized predictions. Barriers to improvement include, first and foremost, the quantity and quality of source data, as well as selection biases in development cohorts. Oversimplifying the relationship between predictor and response, for example by using categorized instead of continuous measures or by imposing parametric assumptions about data behavior, may lead to a loss of prognostic information.37  Moreover, beyond an inherent stochastic component, the risk of detrimental outcomes following transplantation evolves over time, and patients remain susceptible to events that cannot be anticipated (eg, infection, graft-versus-host disease, depression).16  The advent of electronic medical records and large international registries now permits a more granular exploration of transplantation outcomes. Personalization of the transplantation procedure may become possible by developing new, specific prediction schemes based on large, homogeneous cohorts and by integrating novel modeling techniques.15,36,38  Furthermore, big data analysis may identify modifiable features that predict clinical trajectories and could therefore be acted upon. We have previously shown that the risk of conditioning toxicity depends on the patient’s individual comorbidities rather than on their cumulative burden, suggesting the potential for both prediction and treatment optimization that such a granular approach allows.39  A new generation of prediction models, integrating the newfound wealth of data and biological knowledge, is needed to truly inform individual decision-making in allogeneic transplantation.
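
The information lost by categorizing a continuous predictor can be illustrated with a small simulation. The sketch below is entirely synthetic and is not the analysis performed in this study: it draws a standard-normal marker, generates a binary outcome from a logistic model with an arbitrarily chosen slope of 1.5, and compares the AUC of the continuous marker with that of the same marker split at its median. AUC is computed here as the Mann-Whitney concordance for a binary outcome, which ignores censoring, unlike the time-dependent AUC used for 2-year outcomes. All names, parameters, and the seed are illustrative assumptions.

```python
import math
import random


def auc(scores, outcomes):
    """ROC AUC as pairwise concordance between events and non-events.

    Ties in the score count as half-concordant.
    """
    pos = [s for s, y in zip(scores, outcomes) if y == 1]
    neg = [s for s, y in zip(scores, outcomes) if y == 0]
    if not pos or not neg:
        raise ValueError("need both events and non-events")
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


def simulate(n=1000, slope=1.5, seed=7):
    """Synthetic cohort: risk rises smoothly (logistically) with the marker."""
    rng = random.Random(seed)
    marker, outcome = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        p_event = 1.0 / (1.0 + math.exp(-slope * x))
        marker.append(x)
        outcome.append(1 if rng.random() < p_event else 0)
    return marker, outcome


marker, outcome = simulate()
# Dichotomize at the median: "high" vs "low" risk, discarding within-group ranking.
cutoff = sorted(marker)[len(marker) // 2]
dichotomized = [1 if x >= cutoff else 0 for x in marker]

auc_continuous = auc(marker, outcome)
auc_dichotomized = auc(dichotomized, outcome)
print(f"continuous AUC:   {auc_continuous:.3f}")
print(f"dichotomized AUC: {auc_dichotomized:.3f}")
```

Collapsing the marker into two groups keeps a single point of its ROC curve, so patients within each group can no longer be ranked against one another; with a smoothly increasing risk, the dichotomized AUC is expected to fall below the continuous one.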

The full-text version of this article contains a data supplement.

Acknowledgments

This work was supported by The Varda and Boaz Dotan Research Center in Hemato-Oncology affiliated with the Cancer Biology Research Center of Tel Aviv University and The Shalvi Foundation for the Support of Medical Research.

Authorship

Contribution: J.A.F., A. Shouval, A.N., and R.S. designed the study; J.A.F. and R.S. wrote the initial draft of the manuscript; and all authors were involved in the collection and interpretation of the data, edited the initial draft of the manuscript, and agreed to the final manuscript.

Conflict-of-interest disclosure: The authors declare no competing financial interests.

Correspondence: Joshua A. Fein, Sackler School of Medicine, Tel Aviv University, Ramat Aviv 69978, Israel; e-mail: joshuafein@gmail.com.

References

1. Carreras E, Dufour C, Mohty M, Kröger N, eds. The EBMT Handbook: Hematopoietic Stem Cell Transplantation and Cellular Therapies. 7th ed. Cham, Switzerland: Springer Open; 2019.
2. Potdar R, Varadi G, Fein J, Labopin M, Nagler A, Shouval R. Prognostic scoring systems in allogeneic hematopoietic stem cell transplantation: where do we stand? Biol Blood Marrow Transplant. 2017;23(11):1839-1846.
3. Sorror ML, Maris MB, Storb R, et al. Hematopoietic cell transplantation (HCT)-specific comorbidity index: a new tool for risk assessment before allogeneic HCT. Blood. 2005;106(8):2912-2919.
4. Sorror ML, Storb RF, Sandmaier BM, et al. Comorbidity-age index: a clinical measure of biologic age before allogeneic hematopoietic cell transplantation. J Clin Oncol. 2014;32(29):3249-3256.
5. Gratwohl A. The EBMT risk score. Bone Marrow Transplant. 2012;47(6):749-756.
6. Barba P, Martino R, Pérez-Simón JA, et al. Combination of the Hematopoietic Cell Transplantation Comorbidity Index and the European Group for Blood and Marrow Transplantation score allows a better stratification of high-risk patients undergoing reduced-toxicity allogeneic hematopoietic cell transplantation. Biol Blood Marrow Transplant. 2014;20(1):66-72.
7. Parimon T, Au DH, Martin PJ, Chien JW. A risk score for mortality after allogeneic hematopoietic cell transplantation. Ann Intern Med. 2006;144(6):407-414.
8. Au BK, Gooley TA, Armand P, et al. Reevaluation of the pretransplant assessment of mortality score after allogeneic hematopoietic transplantation. Biol Blood Marrow Transplant. 2015;21(5):848-854.
9. Luft T, Benner A, Jodele S, et al. EASIX in patients with acute graft-versus-host disease: a retrospective cohort analysis. Lancet Haematol. 2017;4(9):e414-e423.
10. Luft T, Benner A, Jodele S, et al. It is Easix to predict non-relapse mortality (NRM) of allogeneic stem cell transplantation (alloSCT) [abstract]. Blood. 2016;128(22). Abstract 519.
11. Armand P, Kim HT, Logan BR, et al. Validation and refinement of the Disease Risk Index for allogeneic stem cell transplantation. Blood. 2014;123(23):3664-3671.
12. Bacigalupo A, Ballen K, Rizzo D, et al. Defining the intensity of conditioning regimens: working definitions. Biol Blood Marrow Transplant. 2009;15(12):1628-1633.
13. Shimoni A, Hardan I, Shem-Tov N, Rand A, Yerushalmi R, Nagler A. Fludarabine and treosulfan: a novel modified myeloablative regimen for allogeneic hematopoietic stem-cell transplantation with effective antileukemia activity in patients with acute myeloid leukemia and myelodysplastic syndromes. Leuk Lymphoma. 2007;48(12):2352-2359.
14. Barba P, Piñana JL, Martino R, et al. Comparison of two pretransplant predictive models and a flexible HCT-CI using different cut off points to determine low-, intermediate-, and high-risk groups: the flexible HCT-CI is the best predictor of NRM and OS in a population of patients undergoing allo-RIC. Biol Blood Marrow Transplant. 2010;16(3):413-420.
15. Shouval R, Labopin M, Bondi O, et al. Prediction of allogeneic hematopoietic stem-cell transplantation mortality 100 days after transplantation using a machine learning algorithm: a European Group for Blood and Marrow Transplantation Acute Leukemia Working Party retrospective data mining study. J Clin Oncol. 2015;33(28):3144-3151.
16. Estey E, Gale RP. How good are we at predicting the fate of someone with acute myeloid leukaemia? Leukemia. 2017;31(6):1255-1258.
17. Moons KG, Altman DG, Reitsma JB, et al. Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): explanation and elaboration. Ann Intern Med. 2015;162(1):W1-W73.
18. Armand P, Gibson CJ, Cutler C, et al. A disease risk index for patients undergoing allogeneic stem cell transplantation. Blood. 2012;120(4):905-913.
19. Zhu X, Huang L, Zheng C, et al. European Group for Blood and Marrow Transplantation risk score predicts the outcome of patients with acute leukemia receiving single umbilical cord blood transplantation. Biol Blood Marrow Transplant. 2017;23(12):2118-2126.
20. Middeke JM, Kollinger F, Baldauf H, et al. Validation of the revised pretransplant assessment of mortality score in patients with acute myelogenous leukemia undergoing allogeneic hematopoietic stem cell transplantation. Biol Blood Marrow Transplant. 2018;24(9):1947-1951.
21. Elsawy M, Storer BE, Milano F, et al. Prognostic performance of the Augmented Hematopoietic Cell Transplantation-Specific Comorbidity/Age Index in recipients of allogeneic hematopoietic stem cell transplantation from alternative graft sources. Biol Blood Marrow Transplant. 2019;25(5):1045-1052.
22. Sanchez-Escamilla M, Hilden P, Maloy M, et al. The prognostic calculator Easix predicts acute Gvhd, non-relapse mortality and overall survival in adult patients undergoing reduced intensity conditioning allogeneic HCT [abstract]. Blood. 2018;132(suppl 1). Abstract 2069.
23. Sorror ML, Logan BR, Zhu X, et al. Prospective validation of the predictive power of the Hematopoietic Cell Transplantation Comorbidity Index: a Center for International Blood and Marrow Transplant Research study. Biol Blood Marrow Transplant. 2015;21(8):1479-1487.
24. Yamamoto W, Ogusa E, Matsumoto K, Maruta A, Ishigatsubo Y, Kanamori H. Predictive value of risk assessment scores in patients with hematologic malignancies undergoing reduced-intensity conditioning allogeneic stem cell transplantation. Am J Hematol. 2014;89(9):E138-E141.
25. Versluis J, Labopin M, Niederwieser D, et al. Prediction of non-relapse mortality in recipients of reduced intensity conditioning allogeneic stem cell transplantation with AML in first complete remission. Leukemia. 2015;29(1):51-57.
26. Castagna L, Fürst S, Marchetti N, et al. Retrospective analysis of common scoring systems and outcome in patients older than 60 years treated with reduced-intensity conditioning regimen and alloSCT. Bone Marrow Transplant. 2011;46(7):1000-1005.
27. Xhaard A, Porcher R, Chien JW, et al. Impact of comorbidity indexes on non-relapse mortality. Leukemia. 2008;22(11):2062-2069.
28. Collins GS, Reitsma JB, Altman DG, Moons KG. Transparent Reporting of a multivariable prediction model for Individual Prognosis or Diagnosis (TRIPOD): the TRIPOD statement. Ann Intern Med. 2015;162(1):55-63.
29. Shouval R, de Jong CN, Fein J, et al. Baseline renal function and albumin are powerful predictors for allogeneic transplantation-related mortality. Biol Blood Marrow Transplant. 2018;24(8):1685-1691.
30. Hartwell MJ, Özbek U, Holler E, et al. An early-biomarker algorithm predicts lethal graft-versus-host disease and survival [published correction appears in JCI Insight. 2018;3(16)]. JCI Insight. 2017;2(3):e89798.
31. Lindsley RC, Saber W, Mar BG, et al. Prognostic mutations in myelodysplastic syndrome after stem-cell transplantation. N Engl J Med. 2017;376(6):536-547.
32. Grinfeld J, Nangalia J, Baxter EJ, et al. Classification and personalized prognosis in myeloproliferative neoplasms. N Engl J Med. 2018;379(15):1416-1430.
33. Gerstung M, Papaemmanuil E, Martincorena I, et al. Precision oncology for acute myeloid leukemia using a knowledge bank approach. Nat Genet. 2017;49(3):332-340.
34. Shono Y, van den Brink MRM. Gut microbiota injury in allogeneic haematopoietic stem cell transplantation. Nat Rev Cancer. 2018;18(5):283-295.
35. Törlén J, Remberger M, Le Blanc K, Ljungman P, Mattsson J. Impact of pretransplantation indices in hematopoietic stem cell transplantation: knowledge of center-specific outcome data is pivotal before making index-based decisions. Biol Blood Marrow Transplant. 2017;23(4):677-683.
36. Shouval R, Ruggeri A, Labopin M, et al. An integrative scoring system for survival prediction following umbilical cord blood transplantation in acute leukemia. Clin Cancer Res. 2017;23(21):6478-6486.
37. Shouval R, Bondi O, Mishan H, Shimoni A, Unger R, Nagler A. Application of machine learning algorithms for clinical predictive modeling: a data-mining approach in SCT. Bone Marrow Transplant. 2014;49(3):332-337.
38. Topol EJ. High-performance medicine: the convergence of human and artificial intelligence. Nat Med. 2019;25(1):44-56.
39. Fein JA, Shimoni A, Labopin M, et al. The impact of individual comorbidities on non-relapse mortality following allogeneic hematopoietic stem cell transplantation. Leukemia. 2018;32(8):1787-1794.

Author notes

*R.S., J.A.F., and A. Shouval contributed equally to this study.

Individual patient data will not be shared per institutional review board requirements.

Supplemental data