A machine learning (ML) approach could improve the accuracy of a full blood count for predicting iron deficiency, according to findings presented at the European Hematology Association (EHA) 2024 Congress.
The first step in testing for iron deficiency is a full, or complete, blood count, which provides the red cell count, hemoglobin, and additional red cell indices such as mean corpuscular volume (MCV), mean corpuscular hemoglobin (MCH), and red cell distribution width (RDW). Further iron testing is then done if either the MCV or the MCH falls below a certain threshold. Full blood counts are measured on an automated hematology analyzer, which generates roughly 500 summary numbers, of which about 20 are picked out and reported to clinicians. The rest of the data set is “usually just deleted and discarded,” said Daniel Kreuter, a PhD student in applied mathematics at Cambridge University, who presented the findings.
“But it turns out that these high-dimensional data are exactly ideal for a pattern recognition algorithm to be used and make this test — that’s being done all the time anyway — more valuable to clinicians,” he said.
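As a rough sketch of the conventional trigger his group is trying to improve on, the snippet below flags a count for follow-up iron testing when either red cell index falls below a cutoff. The specific cutoffs are commonly used lower limits of normal, assumed here for illustration rather than taken from the presentation.

```python
# Illustrative only: the conventional trigger for follow-up iron testing,
# flagging a sample when MCV or MCH falls below a cutoff. The cutoffs are
# commonly used lower limits of normal, assumed for this example.

MCV_CUTOFF_FL = 80.0   # mean corpuscular volume, femtolitres (assumed)
MCH_CUTOFF_PG = 27.0   # mean corpuscular hemoglobin, picograms (assumed)

def needs_iron_testing(mcv_fl: float, mch_pg: float) -> bool:
    """Return True if either red cell index is below its cutoff."""
    return mcv_fl < MCV_CUTOFF_FL or mch_pg < MCH_CUTOFF_PG

print(needs_iron_testing(mcv_fl=78.5, mch_pg=28.0))  # True: low MCV
print(needs_iron_testing(mcv_fl=88.0, mch_pg=29.5))  # False: both within range
```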
INTERVAL, a randomized controlled trial of 45,000 blood donors designed to evaluate the effects of frequent blood donation on donors, provided data showing how poorly hemoglobin, MCV, and MCH predict iron deficiency. Even when these parameters were combined, researchers found, the sensitivity was less than 50%.
“Before we do further iron testing, we’re already missing over half of the iron-deficient cases,” Mr. Kreuter said.
To assess the ML model, researchers trained it on 80% of the INTERVAL data and tested it on the remaining 20%.
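The abstract does not detail the model itself, so the sketch below only illustrates that evaluation setup: synthetic features stand in for the raw analyzer parameters, and a generic gradient-boosted classifier stands in for the presenters' model; neither choice comes from the presentation.

```python
# Minimal sketch of an 80/20 train-test evaluation on full blood count data.
from sklearn.datasets import make_classification
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score

# Synthetic stand-in: ~500 analyzer parameters per donor, with a binary
# iron-deficiency label (e.g. ferritin below a cutoff).
X, y = make_classification(
    n_samples=10_000, n_features=500, n_informative=30,
    weights=[0.8], random_state=0,
)

# 80% of donors to train the model, 20% held out to test it.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, random_state=0, stratify=y
)

model = HistGradientBoostingClassifier().fit(X_train, y_train)

# Sensitivity (recall) on the held-out 20%: the share of truly
# iron-deficient donors the model flags.
print(f"Held-out sensitivity: {recall_score(y_test, model.predict(X_test)):.2f}")
```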
Using an iron deficiency definition of a ferritin level below 15 mcg/L, conventional measures identified iron deficiency 18.8% of the time. Using a definition of a fall in hemoglobin of 10 g/L over two years, conventional measures identified it 44.5% of the time.
The ML model identified iron deficiency about 75% of the time under both definitions, Mr. Kreuter said. Since then, his lab has seen rates in the high 70s, he said.
The ML model has an area under the curve (AUC) of 0.82 for identifying iron deficiency. When blood counts taken at two time points are used, the AUC jumps to 0.95, Mr. Kreuter said.
These longitudinal data are “quite common to have in donors as well as in patients,” he said.
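One simple way such longitudinal counts could feed a model, offered purely as an illustrative assumption rather than as the presenters' method, is to concatenate each donor's parameters from two visits into one feature vector and compare held-out AUCs:

```python
# Illustrative comparison of single-visit vs two-visit features, using
# synthetic data as a stand-in for analyzer parameters.
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n_donors, n_params = 5_000, 40

# Synthetic stand-ins: a binary iron-deficiency label and analyzer
# parameters from two donation visits that shift slightly with it.
y = rng.integers(0, 2, size=n_donors)
visit1 = rng.normal(size=(n_donors, n_params)) + 0.3 * y[:, None]
visit2 = rng.normal(size=(n_donors, n_params)) + 0.3 * y[:, None]

def heldout_auc(X: np.ndarray) -> float:
    """Train on 80% of donors and report ROC AUC on the other 20%."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.20, random_state=0, stratify=y
    )
    model = HistGradientBoostingClassifier().fit(X_tr, y_tr)
    return roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])

print("AUC, one visit: ", round(heldout_auc(visit1), 2))
print("AUC, two visits:", round(heldout_auc(np.hstack([visit1, visit2])), 2))
```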
He said the full blood count data “come out of the machine the same way” — they just aren’t discarded.
“We can save those and improve sensitivity immensely,” he said.
He said researchers are now validating the ML model in a second, ethnically diverse donor cohort and looking into improving sensitivity further by using neural networks instead of the ML approach presented in these findings.
In response to a question about how the approach would apply to an elderly population — rather than the otherwise healthy donors involved in this study — he said the approach would probably have to be tailored for that group.
“There might be other things you have to filter for,” he said. “There’s always, for every clinical project, different patient filtering.”
Any conflicts of interest declared by the authors can be found in the original abstract.
Reference
Kreuter D, Deltadahl S, Gilbey J, et al. Machine learning to transform iron deficiency screening: from rusty tools to cutting-edge solutions. Abstract S332. Presented at the European Hematology Association (EHA) 2024 Congress; June 14, 2024; Madrid, Spain.