• We developed a machine-learning algorithm to guide differential diagnosis of BMF.

• Prediction of acquired vs inherited BMF relied on 25 variables recorded during a comprehensive physical and laboratory workup at the first evaluation.

Postponing treatment while awaiting genetic testing can significantly delay definitive therapy in patients with severe pancytopenia. Conversely, the misdiagnosis of inherited bone marrow failure (BMF) can expose patients to ineffectual and expensive therapies, toxic transplant conditioning regimens, and inappropriate use of an affected family member as a stem cell donor. To predict the likelihood of a patient having acquired or inherited BMF, we developed a 2-step data-driven machine-learning model using 25 clinical and laboratory variables typically recorded at the initial clinical encounter. For model development, patients were labeled as having acquired or inherited BMF according to their genomic data. Data sets were clustered in an unbiased manner, and an ensemble model was trained with cases from the largest cluster of a training cohort (n = 359) and validated with an independent cohort (n = 127). Cluster A, the largest group, consisted mostly of immune or inherited aplastic anemia, whereas cluster B comprised underrepresented BMF phenotypes and was excluded from the next step of data modeling because of its small sample size. The cluster A–specific ensemble model predicted BMF etiology with 89% accuracy, correctly predicting inherited and likely immune BMF in 79% and 92% of cases, respectively. Our model represents a practical guide for BMF diagnosis and highlights the importance of clinical and laboratory variables in the initial evaluation, particularly telomere length. Our tool could potentially be used by general hematologists and health care providers not specialized in BMF, and in under-resourced centers, to prioritize patients for genetic testing or for expeditious treatment.
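
The 2-step workflow described above (cluster first, then train an ensemble only on the largest cluster) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the clustering or ensemble algorithms, so k-means and a majority vote of one-feature threshold rules are stand-ins chosen for brevity. The three numeric features, the toy patients, and all function names are hypothetical; the actual model used 25 clinical and laboratory variables.

```python
from collections import Counter
from statistics import mean

def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    dists = [sum((p - c) ** 2 for p, c in zip(point, cen)) for cen in centroids]
    return dists.index(min(dists))

def kmeans(points, centroids, n_iter=20):
    """Plain k-means from given starting centroids; returns the final centroids."""
    for _ in range(n_iter):
        groups = [[] for _ in centroids]
        for pt in points:
            groups[nearest(pt, centroids)].append(pt)
        centroids = [
            tuple(mean(col) for col in zip(*grp)) if grp else cen
            for grp, cen in zip(groups, centroids)
        ]
    return centroids

def fit_stump(rows, labels, feature):
    """One-feature threshold rule: keep the cut and direction with fewest training errors."""
    best = None
    for cut in sorted({r[feature] for r in rows}):
        for low, high in (("inherited", "acquired"), ("acquired", "inherited")):
            preds = [low if r[feature] <= cut else high for r in rows]
            errors = sum(p != y for p, y in zip(preds, labels))
            if best is None or errors < best[0]:
                best = (errors, cut, low, high)
    _, cut, low, high = best
    return lambda r: low if r[feature] <= cut else high

def fit_two_step(rows, labels, start_centroids):
    """Step 1: cluster all cases. Step 2: train a voting ensemble on the largest cluster only."""
    centroids = kmeans(rows, start_centroids)
    assignment = [nearest(r, centroids) for r in rows]
    largest = Counter(assignment).most_common(1)[0][0]
    sub_rows = [r for r, a in zip(rows, assignment) if a == largest]
    sub_labels = [y for y, a in zip(labels, assignment) if a == largest]
    stumps = [fit_stump(sub_rows, sub_labels, f) for f in range(len(rows[0]))]

    def predict(row):
        if nearest(row, centroids) != largest:
            return None  # case falls outside the modeled cluster: no prediction
        votes = Counter(stump(row) for stump in stumps)
        return votes.most_common(1)[0][0]

    return predict

# Toy, made-up data: three standardized features per patient (hypothetical).
rows = [
    (1.0, 1.2, 0.8), (0.8, 0.9, 1.1), (1.2, 1.0, 0.9), (0.9, 1.1, 1.0),
    (1.1, 0.8, 1.2),                                       # large cluster, acquired
    (-1.0, -0.9, -1.1), (-0.8, -1.2, -0.9), (-1.1, -1.0, -0.8),  # large cluster, inherited
    (9.0, 9.5, 10.0), (10.0, 9.0, 9.5), (9.5, 10.0, 9.0),  # small, distant cluster
]
labels = ["acquired"] * 5 + ["inherited"] * 3 + ["inherited", "acquired", "inherited"]

predict = fit_two_step(rows, labels, start_centroids=[(0.0, 0.0, 0.0), (10.0, 10.0, 10.0)])

print(predict((1.0, 1.0, 1.0)))     # acquired
print(predict((-1.0, -1.0, -1.0)))  # inherited
print(predict((9.0, 9.0, 9.0)))     # None (small cluster is not modeled)
```

Returning `None` for cases assigned to the smaller cluster mirrors the decision reported in the abstract to leave cluster B out of the second modeling step because of its small sample size.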
