Machine learning (ML) is a branch of computer science in which an algorithm generates predictive models by learning from training data without being explicitly programmed.2,3 Significant advances in ML algorithms have been achieved in recent years, especially in computer vision (the use of ML algorithms to detect objects and patterns in images).2,3 These advances have allowed Google Search to label complex images accurately, enabled smartphones to recognize faces, and paved the way for the development of self-driving cars. Computer vision algorithms have also been used in healthcare to label radiological and pathological images.4,5
Computer vision relies on deep neural networks, which are mathematical models loosely inspired by neurons in the human brain.2 These networks have 3 components: an input layer that receives the source data (text, structured data, or images); an output layer that provides the desired output; and multiple hidden layers that connect the input and output layers. The most common type of neural network used in computer vision is the convolutional neural network (CNN). CNNs can be used to classify images (identifying whether an image shows a dog or a cat), segment images (outlining the region of interest), or detect objects in an image by drawing a bounding box around each one. These methods can be used separately or combined in 1 system.
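As a rough illustration of what a single convolutional layer computes, the pure-Python sketch below slides one 3 × 3 filter over a toy 6 × 6 grayscale image, applies a ReLU activation, and downsamples the result with 2 × 2 max pooling. The image, the hand-set edge-detecting filter, and the function names are illustrative assumptions; in a real CNN the filter weights are learned from training data, and each layer applies many filters at once.

```python
def conv2d(image, kernel):
    """'Valid' 2D cross-correlation (no padding), the operation CNN layers
    call convolution: slide the kernel and sum elementwise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        out.append(row)
    return out


def relu(feature_map):
    """Replace negative responses with 0 (the ReLU activation)."""
    return [[max(0, v) for v in row] for row in feature_map]


def max_pool2x2(feature_map):
    """Non-overlapping 2x2 max pooling; a trailing odd row/column is dropped."""
    return [
        [max(feature_map[i][j], feature_map[i][j + 1],
             feature_map[i + 1][j], feature_map[i + 1][j + 1])
         for j in range(0, len(feature_map[0]) - 1, 2)]
        for i in range(0, len(feature_map) - 1, 2)
    ]


# Toy image: dark left half (0), bright right half (1).
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]
# Hand-set vertical-edge filter (a trained CNN would learn its weights).
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

feature_map = relu(conv2d(image, kernel))
print(feature_map)               # [[0, 3, 3, 0], [0, 3, 3, 0], [0, 3, 3, 0], [0, 3, 3, 0]]
print(max_pool2x2(feature_map))  # [[3, 3], [3, 3]]
```

The filter responds only where dark pixels meet bright ones. Stacking many such filter, activation, and pooling steps lets deeper layers respond to progressively more complex patterns, which is what allows a CNN to classify, segment, or detect objects.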
In this study, the authors trained a CNN model on 171 374 bone marrow cytology images from 961 patients diagnosed with a variety of hematological diseases and evaluated at the Munich Leukemia Laboratory between 2011 and 2013. To prepare the material for the ML model, the authors first digitized the slides (×40 oil immersion; original image dimensions, 2452 × 2056 pixels) for morphological cell analysis. Within the examined regions, diagnostically relevant cells were annotated by experienced hematopathologists into 21 morphological classes (the patients' diagnoses included myeloid and lymphoid malignancies and nonmalignant conditions), and the single-cell images were sized to 250 × 250 pixels. Reducing image size before training a CNN is common practice, because larger images are harder to train on and demand substantially more memory and computational power. The authors acknowledged that the classes were imbalanced, because some cell types are far less prevalent than others. To mitigate this, they oversampled the underrepresented classes using image augmentation (generating modified copies of existing images, a common technique for improving ML model performance). They then trained a CNN (ResNeXt-50) to recognize each class. The model performed accurately on classes that were well represented in the training set, whereas its performance was lower on rarer classes. This pattern is common when ML models are trained on imbalanced datasets: whereas humans can learn to identify objects from a small number of examples, ML models typically require large amounts of data to produce accurate results. The authors also acknowledged the difficulty of distinguishing individual morphologies, especially closely related stages of the leukocyte differentiation lineage. Finally, they validated the model on an independent dataset of 627 single-cell images from 30 slides of 10 patients.
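The oversampling step can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' pipeline: images are toy 2D lists, the class names are made up, and the augmentations (horizontal and vertical flips and 90° rotations) are assumptions, chosen because they change a cell's orientation without changing its morphology. Each class is padded with randomly augmented copies until it matches the size of the largest class.

```python
import random


def hflip(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def vflip(img):
    """Mirror the image top to bottom."""
    return img[::-1]


def rot90(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


# Assumed label-preserving transforms for single-cell images.
AUGMENTATIONS = [hflip, vflip, rot90]


def oversample_with_augmentation(dataset, seed=0):
    """dataset: dict mapping class label -> non-empty list of images.

    Returns a new dict in which every class has as many images as the
    largest class; the extra images are randomly augmented copies.
    Apply this to the TRAINING split only.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    target = max(len(imgs) for imgs in dataset.values())
    balanced = {}
    for label, imgs in dataset.items():
        out = list(imgs)  # keep all original images
        while len(out) < target:
            aug = rng.choice(imgs)
            # Chain 1 to 3 distinct random transforms so copies vary.
            k = rng.randint(1, len(AUGMENTATIONS))
            for transform in rng.sample(AUGMENTATIONS, k):
                aug = transform(aug)
            out.append(aug)
        balanced[label] = out
    return balanced


# Toy example with hypothetical class names and 2x2 "images".
toy = {
    "common_class": [[[i, 0], [0, 0]] for i in range(6)],  # 6 images
    "rare_class": [[[1, 2], [3, 4]]],                      # 1 image
}
balanced = oversample_with_augmentation(toy)
print({label: len(imgs) for label, imgs in balanced.items()})
# {'common_class': 6, 'rare_class': 6}
```

Oversampling should be applied only to the training split; augmented copies leaking into a validation or test set would make performance look better than it is. Balancing the counts also adds no genuinely new morphologies, which may help explain why rare classes can still underperform.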
Approximately 39% of the images in the validation cohort were assigned to the artifact or not-identifiable categories, meaning that the model could not attribute them to a specific cell class. This discrepancy between the model's performance on the original dataset and on the validation cohort could reflect differences in staining and annotation techniques, but it also raises an important point about evaluating ML models in healthcare. The reproducibility of a model across multiple clinical settings is essential to its successful implementation and adoption in the clinical workflow, and it should be demonstrated before these algorithms are deployed in any hospital or laboratory.
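To make such a cross-cohort comparison concrete, one could compute the same metrics on each cohort. The sketch below uses hypothetical label names and made-up predictions, not the study's data. It reports overall accuracy, per-class recall, macro-averaged recall (which weights rare classes equally with common ones), and the fraction of cells the model assigned to the artifact or not-identifiable classes.

```python
from collections import Counter

# Hypothetical label names for the two "unresolved" output classes.
UNRESOLVED = {"artifact", "not_identifiable"}


def cohort_report(y_true, y_pred):
    """Summarize one cohort's predictions against reference annotations."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    support = Counter(y_true)  # number of cells per true class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    recall = {c: hits[c] / n for c, n in support.items()}
    return {
        "accuracy": sum(hits.values()) / len(y_true),
        "recall": recall,
        # Unweighted mean: a rare class counts as much as a common one.
        "macro_recall": sum(recall.values()) / len(recall),
        "unresolved_fraction": sum(p in UNRESOLVED for p in y_pred) / len(y_pred),
    }


# Made-up internal test set: common class dominates.
internal = cohort_report(
    ["lymphocyte"] * 4 + ["blast"] * 2,
    ["lymphocyte"] * 4 + ["blast", "lymphocyte"],
)
# Made-up external cohort: many cells routed to unresolved classes.
external = cohort_report(
    ["lymphocyte", "lymphocyte", "blast", "blast"],
    ["lymphocyte", "not_identifiable", "artifact", "blast"],
)
print(internal["macro_recall"], internal["unresolved_fraction"])  # 0.75 0.0
print(external["macro_recall"], external["unresolved_fraction"])  # 0.5 0.5
```

In these toy numbers, internal accuracy (5 of 6 cells) exceeds macro recall (0.75) because the common class dominates, while the external cohort's unresolved fraction of 0.5 flags the kind of shift that external validation is meant to catch.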
The authors also completed another important step in building an ML model: an explainability analysis, which identifies the image structures the algorithm relies on when making a prediction. Although the model focused on the same structures a human observer would examine in images of common cell classes, it was less robust for rarer classes. Several studies have shown that CNNs sometimes rely on irrelevant structures in radiological or pathological images to predict an outcome.6,7 Explainability of an ML algorithm is therefore another important step toward the successful adoption of these models in clinical practice.
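The specific explainability method used in the study is not described here. As one hedged illustration, occlusion sensitivity is a simple, model-agnostic technique: mask one region of the image at a time and measure how much the model's score drops; gradient-based methods such as Grad-CAM are common alternatives. In the sketch below, `toy_score` is a hypothetical stand-in for a trained CNN's class probability that, like the shortcut-prone models cited above, looks only at one corner of the image.

```python
def occlusion_map(image, score_fn, patch=2, stride=2, fill=0.0):
    """Return a grid of score drops, one per occluded patch position.

    image    -- 2D list of pixel intensities
    score_fn -- callable mapping an image to the model's score for one class
    """
    base = score_fn(image)
    height, width = len(image), len(image[0])
    heat = []
    for i in range(0, height - patch + 1, stride):
        row = []
        for j in range(0, width - patch + 1, stride):
            occluded = [r[:] for r in image]  # copy so the input is untouched
            for di in range(patch):
                for dj in range(patch):
                    occluded[i + di][j + dj] = fill
            # A large drop means the model depends on this region.
            row.append(base - score_fn(occluded))
        heat.append(row)
    return heat


def toy_score(img):
    # Hypothetical "model": its score depends only on the top-left 2x2 corner.
    return sum(img[r][c] for r in range(2) for c in range(2))


image = [[1] * 4 for _ in range(4)]  # uniform 4x4 toy image
print(occlusion_map(image, toy_score))  # [[4.0, 0], [0, 0]]
```

Only masking the top-left patch changes the score, so the map exposes that this toy model ignores everything else in the image. Applied to a real classifier, a hot spot on background or a slide artifact rather than on the cell would be a warning sign of the kind reported in the cited studies.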
This study represents a significant step toward the integration of ML models into clinical practice. The authors assembled a very large, expertly annotated dataset and used a state-of-the-art computer vision algorithm to train and validate the model. The dataset can be used in future research to improve model performance and to integrate such models into the workflow of hematopathologists.
In summary, artificial intelligence and ML algorithms are changing our lives, and these technologies will have a significant impact on healthcare in the next decade. Although these algorithms may not replace physicians and researchers, they will definitely aid them in providing better care and research that improve patients' lives. As Oren Harari once said: “The electric light did not come from the continuous improvement of candles.” If we want to have a significant impact on healthcare in the future, we need to embrace these technologies and learn how to use and integrate them into our daily workflow.
Conflict-of-interest disclosure: Speaker bureaus for Incyte Corporation and Novartis; data monitoring committee for MEI Pharma; advisory board/consulting (pharmaceutical/biotechnology) for AbbVie; and stock in Amazon.