Background: Myelodysplastic syndromes (MDS) are diagnosed with a bone marrow examination (BME), an invasive procedure that patients (pts) would rather avoid. We previously developed a logistic regression (LoR) model (Oster et al, Leuk Lymph 2018) that diagnoses MDS by combining 6 variables (age, gender, Hb, WBC, PLT, MCV) in a formula (Figure 1A). We then refined the model using data from 178 MDS pts (47 from Tel Aviv, 131 from the EUMDS registry) and 178 controls (ASH 2017). Here we improve the model substantially using a much larger dataset (501 pts; 501 controls) and additional variables.

Methods: The EUMDS registry contains data on 2600 BME-proven MDS pts. A random sample of 501 MDS pts from the registry was combined with 501 controls in whom MDS had been ruled out by BME. Gradient-boosted models (GBM) were used to classify subjects as MDS or no MDS, using the variables age, gender, Hb, WBC, PLT, MCV, neutrophils, monocytes, glucose, and creatinine (Figure 1B). Models were evaluated by area under the ROC curve (AUC), sensitivity, and specificity, and model performance was validated with 100 repetitions of 5-fold cross-validation. Model stability was also assessed by refitting the models with different randomly chosen groups of 501 EUMDS pts as cases.
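The validation scheme described above (repeated, stratified 5-fold cross-validation scored by AUC) can be illustrated with a minimal sketch. The software used in the study is not stated, so everything here is an assumption: the function names are hypothetical, the data are synthetic, and `fit_centroid` is a deliberately simple stand-in for a gradient-boosted model so that the example stays self-contained.

```python
import random
from statistics import mean

def auc(scores, labels):
    """Rank-based AUC: chance that a random case outscores a random control (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def repeated_stratified_kfold(labels, n_splits=5, n_repeats=100, seed=0):
    """Yield (train_idx, test_idx); each repeat reshuffles and keeps the case/control ratio per fold."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    for _ in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        for idx in by_class.values():
            shuffled = idx[:]
            rng.shuffle(shuffled)
            for j, i in enumerate(shuffled):
                folds[j % n_splits].append(i)
        for k in range(n_splits):
            train = [i for m, f in enumerate(folds) if m != k for i in f]
            yield train, folds[k]

def cross_validated_auc(fit, X, y, **cv_kwargs):
    """Mean held-out AUC. `fit(X_train, y_train)` must return a function mapping one row to a score."""
    aucs = []
    for train, test in repeated_stratified_kfold(y, **cv_kwargs):
        score = fit([X[i] for i in train], [y[i] for i in train])
        aucs.append(auc([score(X[i]) for i in test], [y[i] for i in test]))
    return mean(aucs)

def fit_centroid(X_train, y_train):
    """Stand-in model (NOT a GBM): squared distance to control centroid minus distance to case centroid."""
    def centroid(label):
        rows = [x for x, y in zip(X_train, y_train) if y == label]
        return [mean(col) for col in zip(*rows)]
    c1, c0 = centroid(1), centroid(0)
    def score(x):
        d1 = sum((a - b) ** 2 for a, b in zip(x, c1))
        d0 = sum((a - b) ** 2 for a, b in zip(x, c0))
        return d0 - d1
    return score

# Synthetic two-feature data for illustration only (not study data).
rng = random.Random(1)
y = [1] * 100 + [0] * 100
X = [[rng.gauss(1.0 if label else 0.0, 1.0), rng.gauss(0.0, 1.0)] for label in y]

# 10 repeats keep the demo fast; the abstract reports 100 repetitions.
cv_auc = cross_validated_auc(fit_centroid, X, y, n_splits=5, n_repeats=10, seed=0)
print(f"mean cross-validated AUC: {cv_auc:.2f}")
```

Any model that exposes a fit-then-score interface could be passed in place of `fit_centroid`; the point of the sketch is that every AUC is computed only on subjects the model did not see during fitting.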

Results: The AUC was 0.97 (95% CI 0.96-0.98, Figure 2), compared with an AUC of 0.87 (0.84-0.91) achieved previously. Under cross-validation, the AUC was 0.89. Maximizing the sum of sensitivity and specificity yielded a single threshold "GBM score" that assigns a subject to "MDS" or "no MDS" status with a sensitivity of 88% and a specificity of 95%. Alternatively, we can set two GBM score thresholds, G1 and G2, where a GBM score > G2 provides 95% specificity and a score < G1 provides 95% sensitivity. A score between these two cutoffs is indeterminate. Only 24% of our MDS patients and 15% of our control patients fall into this indeterminate region, compared with about 50% in our earlier model. The most influential variables were MCV, creatinine, and neutrophils. Repeated random choice of cases from the EUMDS registry led to stable results.
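The two thresholding strategies described above can be sketched as follows. This is an illustration under stated assumptions, not the study's code: the function names are hypothetical, the scores are synthetic rather than real GBM scores, and the handling of scores that fall exactly on a cutoff is a choice made here for definiteness.

```python
import random

def sens_spec(scores, labels, threshold):
    """Sensitivity and specificity of the rule 'score > threshold means MDS'."""
    tp = sum(s > threshold and y == 1 for s, y in zip(scores, labels))
    tn = sum(s <= threshold and y == 0 for s, y in zip(scores, labels))
    return tp / labels.count(1), tn / labels.count(0)

def youden_threshold(scores, labels):
    """Single cutoff maximizing sensitivity + specificity."""
    return max(sorted(set(scores)), key=lambda t: sum(sens_spec(scores, labels, t)))

def two_thresholds(scores, labels, target=0.95):
    """G1: highest cutoff keeping sensitivity >= target.
    G2: lowest cutoff reaching specificity >= target."""
    # -inf guarantees a cutoff with sensitivity 1; the maximum score gives specificity 1.
    cuts = [float("-inf")] + sorted(set(scores))
    g1 = max(t for t in cuts if sens_spec(scores, labels, t)[0] >= target)
    g2 = min(t for t in cuts if sens_spec(scores, labels, t)[1] >= target)
    return g1, g2

def classify(score, g1, g2):
    """Three-zone rule; boundary handling (<= g1) is an assumption of this sketch."""
    if score > g2:
        return "MDS"
    if score <= g1:
        return "no MDS"
    return "indeterminate"

# Synthetic scores for illustration only (not GBM output from the study).
rng = random.Random(0)
labels = [1] * 200 + [0] * 200
scores = [rng.gauss(2.0 if y else 0.0, 1.0) for y in labels]

cut = youden_threshold(scores, labels)
g1, g2 = two_thresholds(scores, labels, target=0.95)
zones = [classify(s, g1, g2) for s in scores]
share = zones.count("indeterminate") / len(zones)
print(f"single cutoff: {cut:.2f}; G1: {g1:.2f}; G2: {g2:.2f}; indeterminate: {share:.0%}")
```

The single cutoff classifies every subject, whereas the two-cutoff rule leaves the overlap between G1 and G2 undecided in exchange for a guaranteed sensitivity below G1 and a guaranteed specificity above G2.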

Conclusions: Using easily accessible parameters, MDS can be diagnosed or excluded non-invasively with high accuracy in a substantial proportion of patients. While the logistic regression model (Figure 1A) can be applied with a relatively simple formula, the gradient-boosted model (Figure 1B) is more complex. The GBM combines the variables and the interactions among them, achieving an AUC that represents excellent predictive ability and a considerable improvement over the previous model. Adding peripheral blood cytogenetic/genetic information could further improve non-invasive MDS diagnosis and obviate the need for bone marrow examination in many patients. We continue to improve and validate the model. An online calculator/app for use in a clinical setting is being developed and will be presented.


Smith: Jazz Pharmaceuticals: Research Funding; Johnson & Johnson: Research Funding; Novartis: Research Funding; Gilead Sciences: Consultancy. Fenaux: Celgene: Honoraria, Research Funding; Otsuka: Honoraria, Research Funding; Jazz: Honoraria, Research Funding; Janssen: Honoraria, Research Funding. Stauder: Novartis: Honoraria, Membership on an entity's Board of Directors or advisory committees; Teva: Research Funding; Celgene: Honoraria, Membership on an entity's Board of Directors or advisory committees. Germing: Novartis: Honoraria, Research Funding; Celgene: Honoraria, Research Funding; Janssen: Honoraria.
