The role of histopathology in the diagnosis of essential thrombocythemia (ET) is controversial, and there has been little attempt to quantitate interobserver variability. Diagnostic bone marrow trephine biopsy specimens from 370 patients with ET by Polycythemia Vera Study Group (PVSG) criteria were assessed by 3 experienced hematopathologists for 16 different morphologic features and overall diagnosis according to the World Health Organization (WHO) classification. Our results show substantial interobserver variability, particularly for overall diagnosis and individual cellular characteristics such as megakaryocyte morphology. Reticulin grade was the dominant independent predictor of WHO diagnostic category for all 3 hematopathologists. Factor analysis identified 3 independent factors likely to reflect underlying biologic processes. One factor related to overall and lineage-specific cellularity and was significantly associated with JAK2 V617F status (P < .001), a second factor related to megakaryocyte clustering, and a third was associated with the fibrotic process. No differences could be discerned between patients labeled as having “prefibrotic myelofibrosis” or “true ET” in clinical and laboratory features at presentation, JAK2 status, survival, thrombosis, major hemorrhage, or myelofibrotic transformation. These results show that histologic criteria described in the WHO classification are difficult to apply reproducibly and question the validity of distinguishing true ET from prefibrotic myelofibrosis on the basis of subjective morphologic criteria. This study was registered at http://isrctn.org as #72251782 and at http://eudract.emea.europa.eu/ as #2004-000245-38.
Introduction
The myeloproliferative disorders (MPDs) are clonal hematologic malignancies comprising 3 main disorders: essential thrombocythemia (ET), polycythemia vera (PV), and primary myelofibrosis (MF).1-3 In 2005 the JAK2 V617F mutation4-7 was shown in 95% of patients with PV and in just more than half of those with ET and MF.4,8-10 Before this, the diagnosis of these disorders relied on a combination of clinical, laboratory, and histologic features using one of several different sets of diagnostic criteria. None of these was universally accepted, although the Polycythemia Vera Study Group (PVSG) criteria11,12 and modifications13 were adopted for major clinical trials such as ECLAP in PV14 and PT-1 in ET,15 both established in the late 1990s. In 2001, the World Health Organization (WHO) published its criteria for the diagnosis of MPDs.16 This classification scheme is pathology based and, compared with the PVSG criteria, introduced a heavy emphasis on bone marrow trephine morphology together with the concept of “prefibrotic myelofibrosis” and “true ET” as distinct disorders. In the past 2 years, testing for the JAK2 V617F mutation has been incorporated into new diagnostic criteria, including a revised version of the WHO criteria.1,17,18
A body of published literature, predominantly originating from the Cologne Group,19-24 underpins the histologic features described in the WHO classification of patients with thrombocythemia. In recent years, this group has produced a series of publications representing multiple retrospective analyses of an expanding and well-characterized archive of trephine biopsy specimens from patients with chronic myeloproliferative diseases. In particular, the investigators claim that approximately 40% to 50% of patients with ET in fact have prefibrotic myelofibrosis and that this entity needs to be distinguished from true ET.19-24 This claim is based on the following 3 main assertions: (1) The morphologic features of bone marrow trephines in patients with thrombocythemia can be reliably subdivided into 2 distinct patterns.20,21 (2) There is minimal development of marrow fibrosis over time in patients diagnosed with true ET,20-22,24 contrasting with development of at least mild and sometimes severe fibrosis on long-term follow-up in patients with prefibrotic myelofibrosis.22,24 (3) There is an apparent reduction in life expectancy in prefibrotic myelofibrosis compared with true ET.20,22
However, the interpretation of published data supporting these claims is complicated by the retrospective nature of the patient cohort analyses,19-24 the apparently overlapping nature of the cohorts studied in different papers that draw similar conclusions,20-22,24 a failure to correct for known prognostic factors in survival analyses,20,22,24 and lack of details of the causes of death or definition of myelofibrotic transformation.20,22,24 In addition, there has been no characterization of the interobserver reliability of morphologic features used to distinguish the proposed entities of prefibrotic myelofibrosis and true ET, and it has only recently become possible to correlate histologic findings with underlying molecular lesions. We have therefore addressed the role of bone marrow histology in a study of patients enrolled in 3 prospective studies of ET, including the PT-1 trial.15
Methods
Study population
Newly diagnosed and previously treated patients, aged 18 years or older, who were judged by local clinicians to meet the Polycythemia Vera Study Group (PVSG) criteria for essential thrombocythemia,12 were recruited into 1 of 3 multicenter studies: the Medical Research Council PT-1 trial,15 in which high-risk patients were randomly assigned to either hydroxyurea plus aspirin or to anagrelide plus aspirin; the National Cancer Research Institute study for intermediate-risk patients (no high-risk features and age 40-60), a randomization between aspirin alone or hydroxyurea plus aspirin; or the National Cancer Research Institute study for low-risk patients (no high-risk features and age < 40), a prospective observational study of patients receiving aspirin alone. Patients entered a higher risk study if they developed appropriate features. Follow-up procedures and definitions of end points have been detailed previously,15 but importantly all data were collected prospectively with more than 99% of patients having complete follow-up. All end-point events were validated prospectively by a central clinical committee without knowledge of treatment allocation, and all relevant histologic material from patients with myelofibrotic or leukemic transformation was reviewed by a histology committee. Events occurring before January 31, 2006, that were notified before June 30, 2006, were included in the analysis, meaning that the median follow-up for the cohort from trial entry was 68 months. The study protocol was approved by institutional ethics committees in all centers, and written informed consent was obtained from all patients in accordance with the Declaration of Helsinki.
Bone marrow trephine specimens
Bone marrow trephine biopsy specimens were requested from all patients enrolled in the 3 trials on patient registration. Although these were not a requirement of trial entry, 636 trephine specimens were received at St Thomas' Hospital from the 1022 patients enrolled before July 2005. For assessment, trephine biopsy sections were stained with hematoxylin and eosin (H&E) and Gordon and Sweets silver stain for reticulin. Staining was performed in a single laboratory for consistency.
Only bone marrow trephines taken at diagnosis were considered. Trephine biopsies that were embedded in resin without decalcification had morphology that could not be compared directly with most of the decalcified, wax-embedded specimens, and these were not included in the statistical analysis. Not all paraffin-embedded sections were of sufficient length or quality to enable all parameters to be assessed. The statistical analysis therefore included a core set of 370 trephine specimens more than 5 mm in length, for which all criteria could be assessed by all 3 hematopathologists.
Assessment of bone marrow trephine slides
Trephine sections were assessed by 3 hematopathologists, each with more than 10 years of consultant-level experience and a subspecialist interest in the myeloproliferative disorders. Consensus discussions were held to agree on the criteria for assessment (Table 1; Figure 1) and how to assess them. Each of the 3 hematopathologists assessed the sections independently and without knowledge of patient outcomes, with only the age and sex of the patient provided for each trephine specimen (to allow determination of cellularity). An overall diagnosis was made according to the WHO criteria and recorded on a 5-point scale: true ET (0), prefibrotic myelofibrosis (1), and manifest myelofibrosis of increasing severity (2-4). Overall cellularity and erythroid and granulocytic cellularity were scored as reduced (−1), normal (0), or increased (+1) relative to expectation for the patient's age. Megakaryocyte cellularity was scored as normal (0), mildly increased (+1), moderately increased (+2), or severely increased (+3). Individual features of megakaryocyte morphology were scored as absent (0), present (+1), or predominant (+2). These features were staghorn megakaryocytes, cloudlike megakaryocytes, dysplastic megakaryocytes, pyknotic megakaryocytes, and bare megakaryocytic nuclei. Megakaryocyte size was classed as predominantly small (0), mixed small and large (+1), and predominantly large (+2). Clustering of megakaryocytes was recorded as absent (0), loose (1), or tight (2), depending on assessment of the predominant pattern found. The size of clusters was recorded as no clusters (0), predominantly small clusters of fewer than 6 cells (1) or predominantly large clusters of at least 6 cells (2). In addition, the number of clusters was scored on a semiquantitative scale as absent (0), occasional (1), or predominant (2). New bone formation and presence of paratrabecular megakaryocytes were scored as absent (0) or present (+1). 
Finally, reticulin staining was scored using a scale from 0 to 4, whereby 0 was almost complete absence of fibers; 1 showed a few scattered fibers, predominantly around stromal vessels; 2 showed an incomplete meshwork of randomly orientated fibers with relatively few intersections; 3 showed a more dense and complete meshwork, still with randomly orientated fibers but with many intersections; and 4 showed denser meshwork still, with organization of fibers into parallel bands and areas within which organization of these parallel fibers into thicker bands was found.
Photographs of trephine specimens were taken on an Olympus BX51 microscope (Olympus, Watford, United Kingdom) equipped with 10× super widefield eyepieces and Olympus U-PlanApo 40×/0.85 NA and UPlanFL N 100×/1.30 NA objectives using a Pixera Pro150ES digital camera and Pixera Viewfinder image acquisition software v.3.0.1 (Egham, Surrey, United Kingdom).
Criterion | Strength of association* | 95% CI |
---|---|---|
Myelofibrosis criteria (5-point scale) | ||
WHO diagnosis | 2.1 | 1.8-2.4 |
Reticulin grade | 5.1 | 4.0-6.4 |
Cellularity criteria (3-point scale) | ||
Overall | 8.3 | 5.5-12.6 |
Erythroid | 3.9 | 2.8-5.3 |
Granulocytic | 5.0 | 3.4-7.2 |
Megakaryocytic† | 5.1 | 4.0-6.4 |
Megakaryocyte morphology (3-point scale) | ||
Staghorn megakaryocytes | 2.5 | 1.9-3.3 |
Cloud-like megakaryocytes | 2.2 | 1.7-2.8 |
Dysplastic megakaryocytes | 1.4 | 1.1-1.7 |
Pyknotic megakaryocytes | 3.2 | 2.4-4.2 |
Bare megakaryocyte nuclei | 7.7 | 4.3-13.6 |
Megakaryocyte size | 3.2 | 2.3-4.5 |
Megakaryocyte clustering (3-point scale) | ||
Number of clusters | 9.1 | 6.0-13.9 |
Type of clusters (none, loose, tight) | 2.7 | 2.2-3.3 |
Size of clusters (none, small, large) | 6.8 | 4.7-9.7 |
Miscellaneous (2-point scale) | ||
Paratrabecular megakaryocytes | 4.4 | 3.1-6.1 |
New bone formation | 10.1 | 4.8-21.8 |
*Higher scores for strength of association represent stronger interobserver reliability, with a score of 1 indicating no agreement beyond chance.
†The categories of normal (score = 0) and mildly increased (score = 1) were combined because of small numbers in the normal category and to allow comparison with the other criteria for cellularity (all on a 3-point scale).
Statistical analysis
Interobserver agreement for each of the individual morphologic criteria was assessed using log-linear modeling of the second-order marginal tables from the 3 pairwise comparisons among the 3 hematopathologists, as described.25 The model controlled for the marginal distributions of each of the pathologists and fitted a linear-by-linear association term to measure the strength of interobserver agreement. Because samples in different pairwise marginal tables are not truly independent, the jack-knife procedure was used to correct the point estimates and 95% confidence intervals of the parameters for this dependence in the data structure, as described.27 Under fairly general assumptions, the linear-by-linear association term is implied by a latent structure model, suggesting that estimates of the strength of association across criteria are comparable even if criteria are scored on different scales.
The independent predictors of WHO classification score for each of the 3 hematopathologists were analyzed by Bayesian proportional odds logistic regression. First, the hematopathologists provided subjective assessments of the importance of individual morphologic features by apportioning 100 points among the 16 criteria (Figure 3A). Then, to identify which combination of variables optimally predicted WHO classification score, the stochastic search variable selection method was used,26 with prior probabilities for each variable being included derived from the subjective assessment of weights (0.05 for variables given no weight by the pathologist, and 0.3 + 0.03 × (weight) for other variables, although the results were not sensitive to the priors used). The WHO classification score was assumed to follow a latent normally distributed variable, and data augmentation was used to estimate this underlying metric from the observed discrete response. A proportional odds probit model was estimated by constrained Gibbs sampling of the cut points, augmented data, and model parameters, as described.29 Finally, the model with the highest posterior probability was fitted to data scaled so that all predictors had the same range, to allow estimation of the relative contributions of each variable to the WHO score.
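The mapping from subjective weights to prior inclusion probabilities described above can be sketched in a few lines of Python. This is only an illustration of the stated rule (0.05 for zero weight, 0.3 + 0.03 × weight otherwise); the function name `inclusion_prior` is hypothetical, and the cap at 1.0 is an assumption added here so that the result is always a valid probability, since the text does not say how large weights were handled.

```python
def inclusion_prior(weight: float) -> float:
    """Prior probability that a criterion enters the model.

    `weight` is the number of points (out of 100) that a pathologist
    assigned to the criterion in the subjective assessment.
    """
    if weight < 0:
        raise ValueError("weight must be non-negative")
    if weight == 0:
        # Criteria given no weight still keep a small chance of inclusion.
        return 0.05
    # Rule quoted in the text; capping at 1.0 is an assumption made here.
    return min(1.0, 0.3 + 0.03 * weight)


# Hypothetical weights, for illustration only (not taken from Figure 3A).
example_weights = {"reticulin_grade": 20, "overall_cellularity": 10, "new_bone_formation": 0}
example_priors = {name: inclusion_prior(w) for name, w in example_weights.items()}
```

Under this rule a criterion with any nonzero weight starts above 0.3, so the subjective weights shift the search toward favored criteria without excluding the others.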
For clinical outcome, we used a composite clinical end point of time to first arterial or venous thrombosis; major hemorrhage; myelofibrotic, leukemic, or myelodysplastic transformation; or death. Comparison of end-point rates between prefibrotic myelofibrosis and true ET was performed using Kaplan-Meier analysis for univariate analysis. Cox proportional hazards modeling was used for multivariate analysis, with age, sex, treatment allocation (hydroxyurea, anagrelide, or intermediate/low-risk trial), prior cytoreductive therapy, and history of end-point events before trial entry added as covariates.
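For readers unfamiliar with the univariate method, the following is a minimal pure-Python sketch of the Kaplan-Meier product-limit estimator for a time-to-first-event end point. It is not the software used in the study; the function name `kaplan_meier` and the toy data in the usage line are hypothetical.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  : follow-up duration for each patient
    events : 1 if the composite end point occurred at that time, 0 if censored
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    events_at = Counter(t for t, e in zip(times, events) if e)
    leaving_at = Counter(times)  # events and censorings both leave the risk set
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving_at):
        d = events_at.get(t, 0)
        if d:
            # Product-limit step: multiply by the fraction surviving this time.
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving_at[t]
    return curve


# Toy data: four patients, the second censored at month 2.
toy_curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

Curves estimated in this way for each diagnostic label would then be compared between groups, with the Cox model handling the covariate adjustment.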
To apply exploratory factor analysis, we took the consensus score from the 3 hematopathologists for each of the 16 criteria on all 370 diagnostic bone marrow trephines. When there was disagreement, the median of the scores from the 3 hematopathologists was taken. Kaiser criterion (number of eigenvalues > 1) was used to determine the number of factors to fit, and the model was estimated using the varimax rotation.
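The consensus step described above (take the median of the 3 scores for each specimen) can be illustrated with a short Python sketch. The function name `consensus_scores` and the example data are hypothetical; the eigenvalue screening and varimax rotation are not reproduced here.

```python
from statistics import median


def consensus_scores(scores_by_observer):
    """Median score per specimen across observers.

    scores_by_observer : one list of scores per observer, all covering the
    same specimens in the same order, for a single morphologic criterion.
    """
    if len({len(scores) for scores in scores_by_observer}) != 1:
        raise ValueError("every observer must score the same specimens")
    # With 3 observers the median is the middle score; when 2 or 3
    # observers agree, it equals the agreed score.
    return [median(triple) for triple in zip(*scores_by_observer)]


# Hypothetical reticulin grades for 3 specimens from 3 observers.
toy_consensus = consensus_scores([[0, 1, 2], [0, 2, 4], [1, 1, 3]])
```

Using the median means that a single discrepant observer never determines the consensus value on their own.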
Results
To establish the role of bone marrow histopathology in the diagnostic evaluation of a patient seeking treatment for thrombocytosis, bone marrow biopsy specimens were obtained from patients enrolled in 3 prospective trials of ET. Diagnostic trephine specimens (n = 370) were studied independently by 3 experienced hematopathologists for 16 morphologic criteria (Table 1; Figure 1). Each hematopathologist made an overall diagnosis from the trephine histology according to the WHO criteria.
Interobserver agreement when assessing morphologic features
Many measures of interobserver agreement, such as Cohen κ score, fail to correct for differences in the pattern of scores for each observer. We therefore used log-linear modeling of pairwise interobserver agreements,25 a well-established method that explicitly models the strength of association among the observers after correcting for the distribution of scores in the cohort studied. The method estimates the odds that if 2 observers score 2 trephine specimens in adjacent categories, then they agree on which biopsy is in which category. An estimate of 1 implies no agreement beyond chance, and the greater the estimate is above 1, the more agreement there is among the observers.
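The odds interpretation above follows from the standard form of a linear-by-linear association model for a two-way table of ratings from observers A and B. The notation below is an assumption introduced for illustration (the paper does not print the formula): $\mu_{ij}$ is the expected count of specimens scored $i$ by A and $j$ by B, and $u_i$, $v_j$ are the category scores.

```latex
\begin{align}
\log \mu_{ij} &= \lambda + \lambda^{A}_{i} + \lambda^{B}_{j} + \beta\, u_i v_j, \\
\theta_{ij} &= \frac{\mu_{ij}\,\mu_{i+1,j+1}}{\mu_{i,j+1}\,\mu_{i+1,j}}
  = \exp\bigl[\beta\,(u_{i+1}-u_i)(v_{j+1}-v_j)\bigr]
  = e^{\beta}
  \quad \text{when } u_i = i,\; v_j = j .
\end{align}
```

With equally spaced scores, the local odds ratio for any pair of adjacent categories is therefore the constant $e^{\beta}$, and $\beta = 0$ gives an odds ratio of 1, that is, no agreement beyond chance. The strength-of-association values reported in Table 1 are presumably on this odds scale, although the exact parameterization used by the authors is not stated here.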
A number of interesting patterns emerged from these interobserver comparisons (Table 1). First, agreement on a single marker of marrow fibrosis, the reticulin grade (strength of association, 5.1; 95% CI, 4.0-6.4), was much greater than agreement on the WHO diagnosis (strength of association, 2.1; 95% CI, 1.8-2.4). The 3 hematopathologists agreed to within one grade of one another in 69% of cases when scoring reticulin, compared with 53% of cases when assigning WHO diagnosis (P < .001). Across the 3 hematopathologists, the frequency of patients with true ET ranged from 10% to 48%, prefibrotic myelofibrosis from 9% to 28%, and higher levels of fibrosis from 37% to 76%. Second, the strength of association for cellularity criteria was generally greater than that for megakaryocyte morphologic criteria, with the exception of bare megakaryocyte nuclei. In particular, agreement on the frequency of dysplastic megakaryocytes was poor, barely above chance, with agreement on staghorn and cloudlike megakaryocytes little better. Third, agreement for both the number of megakaryocyte clusters and the size of clusters was greater than for whether clusters were tight or loose. Fourth, interobserver agreement on the presence or absence of new bone formation was excellent. We considered the possibility that poor interobserver agreement reflected one discrepant observer. However, pairwise comparisons were similar for all pairs (data not shown), suggesting that differences in agreement shown in Table 1 were not due to an “outlier” observer.
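The "within one grade" percentages quoted above can be computed with a simple tally, sketched below in Python. The interpretation is an assumption: a case counts as agreeing when the highest and lowest of the 3 scores differ by at most 1. The function name `fraction_within_one_grade` and the toy ratings are hypothetical.

```python
def fraction_within_one_grade(ratings):
    """Fraction of cases in which all observers lie within one grade.

    ratings : list of tuples, one tuple of observer scores per specimen
    """
    if not ratings:
        raise ValueError("ratings must not be empty")
    # Assumed definition: spread between highest and lowest score is at most 1.
    close = sum(1 for scores in ratings if max(scores) - min(scores) <= 1)
    return close / len(ratings)


# Toy reticulin grades for 4 specimens; the third has a spread of 2.
toy_fraction = fraction_within_one_grade([(0, 1, 1), (2, 2, 2), (0, 2, 1), (3, 4, 4)])
```

This raw proportion is easier to read than the model-based estimate but, unlike it, does not correct for how each observer distributes scores across categories.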
In summary, agreement was better for measures of general morphologic patterns such as cellularity, number of clusters, and reticulin grade, and weaker for measures of individual cellular features such as megakaryocyte morphology and whether clustering is tight or loose. In addition, the hematopathologists showed poor agreement in synthesizing the various parameters when assigning cases to individual diagnostic categories using WHO criteria.
Relative importance of different morphologic criteria: significance of reticulin grade
The WHO monograph lists several histologic features that are said to be characteristic of true ET, such as the presence of staghorn megakaryocytes, normal overall cellularity, and loose megakaryocyte clustering. In contrast, prefibrotic myelofibrosis is said to be characterized by the presence of tight megakaryocyte clusters, cloudlike, dysplastic or pyknotic megakaryocytes, and abnormal cellularity. The poor interobserver agreement for the WHO classification that we have found could result from 2 potential causes, which are not mutually exclusive. The hematopathologists may not agree on the interpretation of the individual criteria themselves, as shown in the previous section. In addition, they may put differing emphasis or weight on the relative importance of the various morphologic criteria.
The WHO monograph provides minimal guidance as to the relative importance of the various morphologic features as individual contributions to reaching a diagnosis. We found this to be problematic because we not infrequently found examples of bone marrow histology with some of the morphologic features said to reflect true ET and coexistent with changes thought to imply prefibrotic myelofibrosis or even overt myelofibrosis. For example, there were sections with pyknotic megakaryocytes, a feature of myelofibrosis, in a loose megakaryocyte cluster and a normocellular background, both features said to be suggestive of true ET (Figure 2A). Similarly, we found biopsies with large numbers of staghorn megakaryocytes (a feature of true ET) together with hypercellularity (Figure 2B), cloudlike megakaryocytes (Figure 2C), and tight megakaryocyte clusters (Figure 2D), all features more suggestive of prefibrotic myelofibrosis. Of course, these examples do not mean that the overall patterns and associations of morphologic features described in the WHO monograph are invalid. In fact, many of the features did show significant correlations with one another, as shown in Table 2. However, although many of the correlations among individual morphologic features are statistically significant, they are far from fully concordant, suggesting that bone marrow biopsy specimens with conflicting features, such as those in Figure 2, are reasonably common. This underscores the difficulty of combining multiple morphologic features into a single diagnosis without explicit guidance as to which factors are most significant or characteristic.
Criterion | WHO diagnosis | Reticulin grade | Overall cellularity | Erythroid cellularity | Granulocyte cellularity | Mega cellularity | Staghorn megas | Cloudlike megas | Dysplastic megas | Pyknotic megas | Bare mega nuclei | Mega size | Cluster numbers | Cluster type | Cluster size | Paratrabec megas |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
WHO diagnosis | 1.0 | — | — | — | — | — | — | — | — | — | — | — | — | — | — | — |
Reticulin grade | 0.7 | 1.0 | — | — | — | — | — | — | — | — | — | — | — | — | — | — |
Overall cellularity | 0.3 | 0.3 | 1.0 | — | — | — | — | — | — | — | — | — | — | — | — | — |
Erythroid cellularity | NS | NS | 0.3 | 1.0 | — | — | — | — | — | — | — | — | — | — | — | — |
Granulocyte cellularity | 0.2 | 0.3 | 0.6 | 0.2 | 1.0 | — | — | — | — | — | — | — | — | — | — | — |
Mega cellularity | 0.3 | 0.3 | 0.5 | 0.2 | 0.4 | 1.0 | — | — | — | — | — | — | — | — | — | — |
Staghorn megas | −0.3 | NS | NS | 0.1 | NS | 0.1 | 1.0 | — | — | — | — | — | — | — | — | — |
Cloudlike megas | NS | 0.1 | 0.2 | 0.2 | 0.2 | 0.3 | NS | 1.0 | — | — | — | — | — | — | — | — |
Dysplastic megas | 0.4 | 0.3 | 0.2 | 0.1 | 0.3 | 0.3 | −0.1 | 0.1 | 1.0 | — | — | — | — | — | — | — |
Pyknotic megas | 0.2 | 0.3 | 0.3 | 0.1 | 0.2 | 0.3 | NS | 0.2 | 0.3 | 1.0 | — | — | — | — | — | — |
Bare mega nuclei | 0.1 | 0.2 | 0.3 | NS | 0.1 | 0.3 | 0.1 | 0.2 | 0.3 | 0.6 | 1.0 | — | — | — | — | — |
Mega size | NS | 0.3 | 0.3 | NS | 0.2 | 0.4 | 0.3 | 0.2 | NS | 0.2 | 0.2 | 1.0 | — | — | — | — |
Cluster numbers | NS | 0.3 | 0.2 | NS | 0.2 | 0.4 | 0.2 | 0.2 | 0.1 | 0.3 | 0.2 | 0.4 | 1.0 | — | — | — |
Cluster type | 0.2 | 0.4 | 0.1 | NS | NS | 0.3 | NS | 0.1 | 0.2 | 0.2 | 0.2 | 0.4 | 0.7 | 1.0 | — | — |
Cluster size | 0.1 | 0.3 | 0.3 | NS | 0.2 | 0.6 | 0.2 | 0.2 | 0.2 | 0.3 | 0.2 | 0.5 | 0.7 | 0.6 | 1.0 | — |
Paratrabec megas | 0.2 | 0.4 | 0.2 | NS | 0.1 | 0.4 | NS | 0.1 | 0.2 | 0.2 | 0.1 | 0.3 | 0.3 | 0.4 | 0.4 | 1.0 |
New bone formation | 0.2 | 0.3 | NS | −0.1 | NS | 0.1 | NS | NS | NS | 0.2 | NS | 0.2 | 0.2 | 0.2 | 0.2 | 0.2 |
Mega indicates megakaryocyte; NS, not significant; and —, not applicable.
We therefore explored whether the 3 hematopathologists used different weighting schemes in using the 16 morphologic criteria to arrive at a final diagnosis according to the WHO monograph. Each hematopathologist independently generated subjective assessments of which morphologic features he or she found important in determining the overall WHO diagnosis by distributing an arbitrary 100 points among the 16 variables (Figure 3A). From this subjective assessment, 2 important points emerge. First, the 3 hematopathologists show different patterns of emphasis, with, for example, pathologists 1 and 3 putting more weight on cellularity criteria than pathologist 2, and pathologist 3 rating staghorn and dysplastic megakaryocytes as more important than do pathologists 1 and 2. Second, each of the 3 hematopathologists believed that reticulin grade was individually the most important criterion for determining the WHO classification score.
To provide a more objective assessment, a Bayesian proportional odds logistic regression was undertaken (Figure 3B). The reason for applying this method was to identify, for each hematopathologist, which factors were independently predictive of his or her WHO diagnosis. This showed that reticulin grade was the dominant independent predictor of WHO diagnosis for all 3 hematopathologists. Of the other criteria, there was little concordance as to which factors were independently informative. Many of the criteria that each pathologist identified as important for determining WHO diagnosis in the subjective weightings (Figure 3A) were not independently associated, after controlling for the correlation of WHO diagnosis with reticulin grade.
These results show that reticulin grade was the major factor determining WHO classification assignment by all 3 hematopathologists. Because interobserver agreement was quite high for reticulin grade, this suggests that the poor interobserver agreement for WHO diagnosis was largely driven either by differences in the interpretation of the other morphologic criteria, by the relative importance applied to them, or by both.
No difference in presenting blood counts or clinical outcome between true ET and prefibrotic myelofibrosis
One of the central claims of the WHO classification is that prefibrotic myelofibrosis and true ET are biologically distinct disorders with different prognoses. Because of the poor interobserver reliability with which these putative entities could be identified, the only means to assess the reproducibility of these claims in our cohort was to compare presenting blood counts and clinical outcomes between prefibrotic myelofibrosis and true ET for each hematopathologist separately.
There were no differences in hemoglobin level, platelet count, or white cell count at diagnosis between cases labeled as true ET and prefibrotic myelofibrosis for any of the hematopathologists (P > .1 for all 3 hematopathologists and each blood count variable). Similarly, there were no differences between true ET and prefibrotic myelofibrosis in age, sex, or rates of splenomegaly, leukoerythroblastic blood film, and cytogenetic abnormalities for any of the hematopathologists (P > .1 for all variables). There was a weak association between JAK2 positivity and prefibrotic myelofibrosis for one of the pathologists (true ET 38% V617F-positive; prefibrotic myelofibrosis 61% V617F-positive; P = .04), but this was not found for the other 2 pathologists (P = .9 and P = .1). Given the large number of hypothesis tests performed in this section, this single significant test is likely to be due to the play of chance. Finally, we were unable to identify any distinguishing diagnostic clinical or laboratory features even when subsets of patients identified as prefibrotic myelofibrosis by any 2 or all 3 hematopathologists were considered (data not shown).
In total, 143 of the 370 patients were identified by at least one of the hematopathologists as having prefibrotic myelofibrosis (32 for hematopathologist 1, 101 for hematopathologist 2, and 39 for hematopathologist 3). Of these, with a median follow-up of 68 months from trial entry, not a single patient underwent myelofibrotic transformation. Moreover, only 1 of the 194 patients labeled as true ET by any of the hematopathologists (173 for hematopathologist 1, 36 for hematopathologist 2, and 40 for hematopathologist 3) transformed to myelofibrosis.
On univariate analysis, there was no difference for any of the 3 hematopathologists between prefibrotic myelofibrosis and true ET in the rate of the composite end point of time to first arterial or venous thrombosis, major hemorrhage, disease transformation, or death (hazard ratio [HR] for hematopathologist 1, 1.16; 95% CI, 0.4-3.2; P = .7; HR for hematopathologist 2, 0.61; 95% CI, 0.2-1.8; P = .4; HR for hematopathologist 3, 1.24; 95% CI, 0.39-3.9; P = .7). After controlling for age, sex, treatment allocation, prior cytoreductive therapy, and a history of previous end-point events, multivariate survival analysis similarly showed no differences in this composite end point between true ET and prefibrotic myelofibrosis (HR for hematopathologist 1, 1.03; 95% CI, 0.4-2.8; P = .9; HR for hematopathologist 2, 0.70; 95% CI, 0.2-2.0; P = .5; HR for hematopathologist 3, 0.48; 95% CI, 0.1-1.8; P = .3).
We next compared prefibrotic myelofibrosis and true ET for each of the individual end-point categories that comprised the composite end point, noting that numbers of events were generally low in these individual categories. There were no differences in overall survival between patients labeled as true ET and prefibrotic myelofibrosis for any of the 3 hematopathologists; furthermore, there were no differences in rates of arterial thrombosis, rates of venous thrombosis, or rates of major hemorrhage between patients labeled as true ET and those labeled as prefibrotic myelofibrosis (P > .1 for all 3 hematopathologists on univariate and multivariate analysis on each individual end point).
Exploratory factor analysis identifies megakaryocyte clustering, degree of fibrosis, and cellularity as independent underlying processes in ET
The practice of histopathology is, to a great extent, concerned with the recognition and description of patterns of morphologic features. Individual morphologic features of the marrow are of little intrinsic value in isolation, but they represent manifestations of underlying pathophysiologic factors. In our analysis, we found many significant correlations among the individual morphologic features (Table 2), suggesting that many features coexist and that there may be coherent patterns of histologic abnormalities. Factor analysis is a multivariate statistical method that seeks to identify these unobserved pathophysiologic processes by picking out patterns of correlations among the morphologic features that suggest common underlying, independent processes (or factors).30
Initial screening analysis suggested that a model with 3 independent factors was the most appropriate (by Kaiser criterion), and this was therefore fitted. The 5 most important morphologic features contributing to each factor are presented in Figure 4, and they suggest that the factors have relatively straightforward biologic interpretations. We start with the third factor, because it has the most straightforward interpretation. It is particularly weighted toward cellularity criteria, with emphasis on overall cellularity as well as the degree of hypercellularity for each of the 3 lineages separately. An association between JAK2 status and marrow cellularity has been shown previously in patients with ET, with V617F-positive ET showing greater overall, erythroid, and granulocytic cellularity than does V617F-negative ET.10 Consistent with this observation, we found a significant association between the JAK2 V617F mutation and scores for this cellularity factor (P < .001). In contrast, there were no significant associations between JAK2 status and the other 2 factors, discussed in the next paragraph (P = .1 for the clustering factor and P = .4 for the fibrosis factor).
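The Kaiser criterion used for the screening step retains as many factors as there are eigenvalues of the feature correlation matrix greater than 1. The following is a minimal sketch of that rule only. It uses a toy 6-feature correlation matrix invented for illustration (3 pairs of features correlated at 0.6 within a pair and 0.1 across pairs), not the study data, and a plain Jacobi eigenvalue routine rather than whatever statistical software the analysis actually used. The function names are hypothetical.

```python
import math


def symmetric_eigenvalues(matrix, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        # Stop once the off-diagonal mass is negligible.
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                # Rotation angle that zeroes the (p, q) entry.
                theta = math.pi / 4 if diff == 0 else 0.5 * math.atan(2 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def kaiser_factor_count(corr):
    """Number of factors retained by the Kaiser criterion (eigenvalue > 1)."""
    return sum(1 for ev in symmetric_eigenvalues(corr) if ev > 1.0)


def toy_correlation_matrix():
    """Invented example: features 0-1, 2-3, and 4-5 form three correlated pairs."""
    n = 6
    return [
        [1.0 if i == j else (0.6 if i // 2 == j // 2 else 0.1) for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    corr = toy_correlation_matrix()
    print("eigenvalues:", [round(ev, 3) for ev in symmetric_eigenvalues(corr)])
    print("factors retained:", kaiser_factor_count(corr))
```

In this toy matrix the three correlated pairs yield three eigenvalues above 1, so the rule retains 3 factors; real morphologic data would be noisier, and the retained factors would still need loadings and rotation to be interpreted.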
Considering now the other 2 factors, the first is particularly weighted toward number, size, and type of megakaryocyte clusters together with megakaryocyte size and cellularity, and it captures an underlying process related to megakaryocyte clustering. The second independent factor appears to relate to the extent of fibrosis. Trephine specimens with extensive reticulin fibrosis, new bone formation, frequent pyknotic or dysplastic megakaryocytes, and bare megakaryocyte nuclei would score particularly highly on this factor. These are all features the WHO monograph identifies as suggestive of a diagnosis of overt myelofibrosis.
This factor analysis suggests that 3 underlying processes describe many of the morphologic patterns evident in the bone marrow trephine histology of patients whose disease is diagnosed as ET (by PVSG criteria), namely cellularity, megakaryocyte clustering, and extent of fibrosis. The cellularity factor shows significant correlation with JAK2 status, but the other 2 reflect unknown biologic mechanisms.
Discussion
ET has long been thought to represent a heterogeneous disorder, likely to contain pathogenetically distinct subgroups united by the lack of positive diagnostic markers. In keeping with this concept, histologic features of bone marrow in ET, as defined by PVSG criteria, show substantial variability. This has led to attempts to subclassify the disorder on the basis of trephine morphology,16,29,30 most recently in the WHO classification. Many claims have been made about the clinical use of such classifications, but generally there has been little detailed assessment of interobserver reliability or intercorrelations among variables, and the clinical validation has tended to be retrospective and uncorrected for other risk factors. In this study, we have evaluated the reproducibility with which individual histologic criteria can be assessed and contribute to the definition of subtypes of ET in a large, prospective, multicenter cohort of patients.
Our results show that several histologic features of the marrow can be ascertained with reasonable reproducibility, specifically those associated with marrow topography, cellularity, and degree of fibrosis (eg, reticulin grade, megakaryocyte clustering, new bone formation). However, the assessment of other cytologic features was much less reliable (particularly megakaryocyte morphology), as was classification according to WHO disease category, including the distinction between prefibrotic myelofibrosis and true ET. These data are consistent with at least 2 interpretations. It is possible that even experienced hematopathologists need special training to distinguish subtypes of ET. Our study was designed to assess the use of the WHO criteria in a “real world” pathology setting. Biopsies were assessed by hematopathologists who were experienced but not directly involved in the development of the WHO criteria; they worked without a training set of slides and evaluated trephines from all patients in all ET risk categories. It remains possible that the pathologists involved in the WHO classification would achieve better reproducibility or that a training set of slides would have enhanced interobserver agreement for our hematopathologists. However, neither is available in routine diagnostic practice, and it is hard to envisage general use of criteria that are so difficult to apply even for experienced hematopathologists. Moreover, our results were obtained as part of a focused assessment in a finite period during which a large number of MPD trephine biopsy specimens were reviewed, a situation that is likely to enhance intraobserver reproducibility. Application of the criteria is likely to be significantly more difficult for most histopathologists, who see such specimens relatively infrequently.
An alternative explanation for our results, and the one we favor, is that the current WHO histologic criteria are not sufficiently robust to define subtypes of ET. There was poor interobserver agreement on what is represented by the terms prefibrotic myelofibrosis and true ET, and there were striking differences in the emphasis each of the hematopathologists placed on different morphologic criteria when arriving at a diagnosis (Figure 3). These results show that the published histologic criteria for these proposed entities are difficult to apply in a reproducible manner. It has been suggested that patients labeled as having prefibrotic myelofibrosis have a worse outcome compared with those said to have true ET,19–24 a key argument supporting the existence of these putative entities. However, we have been unable to reproduce these findings. There were no differences in the rates of thrombosis, major hemorrhage, myelofibrotic transformation, or survival between prefibrotic myelofibrosis and true ET as labeled by any of the hematopathologists. We cannot exclude the possibility that prolonged follow-up or greater numbers of patients might reveal differences in outcome. However, if such large sample sizes are required to show statistical significance, such differences are unlikely to be clinically relevant.
Our favored interpretation is also consistent with recent molecular genetic insights. The subgroup of ET patients who carry the JAK2 V617F mutation is biologically distinct from those lacking the mutation, both in presenting features and in clinical outcome.10,31–33 The JAK2 V617F–negative subgroup is also heterogeneous. An activating mutation in MPL occurs in approximately 10% of this subgroup,34 but the molecular mechanisms responsible for the rest remain unclear. However, there is no evidence for any correlation between the molecular subtypes of ET and the proposed histologic subtypes, true ET and prefibrotic myelofibrosis. In patients with ET, the JAK2 V617F mutation was associated with increased overall cellularity, increased erythropoiesis, and increased granulopoiesis, but there was no association between JAK2 status and reticulin grade, megakaryocyte clustering, or the presence of staghorn, cloudlike, dysplastic, or pyknotic megakaryocytes.10
The results presented here suggest that current histologic criteria are not sufficient to permit routine separation of ET into biologically distinct subsets. However, exploratory multivariate analysis did identify at least 3 independent processes underpinning the extensive variability of bone marrow histology in patients with thrombocythemia (Figure 4). One of these processes, the cellularity factor, correlates with whether the patient has the JAK2 V617F mutation, but the pathophysiology underlying the other 2 processes is less clear. The second factor, scoring highly on reticulin, new bone formation, and pyknotic, dysplastic, or bare megakaryocyte nuclei, is similar to the descriptions of overt myelofibrosis in the WHO monograph. The molecular mechanisms underlying the development of fibrosis and the other process identified by our factor analysis, megakaryocyte clustering, are unclear. TGF-β, NF-κB, low levels of GATA-1, and excessive MPL signaling have all been implicated in the development of fibrosis,35–38 and genetic or environmental factors influencing these pathways may influence the degree of fibrosis and associated morphologic features. Little is known about the molecular regulation of megakaryocyte location and clustering, but it may be relevant that megakaryocyte clusters are observed in mice treated with SDF-1, the ligand for the CXCR4 receptor.39
It is generally accepted that there is histologic heterogeneity within the group of patients labeled as ET with the use of the PVSG criteria. The molecular basis for this heterogeneity is unclear, and our data cast doubt on the concept of using current histologic criteria to divide ET into true ET and prefibrotic myelofibrosis. However, our results are consistent with a recent molecular classification of the MPDs1 in which it is suggested that reticulin accumulates to a variable extent in patients with ET (Figure 5). In this model, patients with both ET and PV gradually accumulate reticulin fibrosis as an inherent part of their disease, with the degree of fibrosis reflecting an interplay between the duration of the disease and physiologic or genetic modifiers. This concept is supported by histologic studies in patients with these disorders,40,41 and the frequent development of post-polycythemic myelofibrosis in mouse models.42,43 The development of fibrosis is likely to be influenced by inherited genetic modifiers (as evidenced by differences in the rates of myelofibrosis among different strains of mice expressing JAK2 V617F43), environmental factors, and acquired genetic44 or epigenetic45 changes. In a proportion of patients, the accumulation of these genetic or epigenetic changes results in an acceleration of their disease, which may present as myelofibrotic transformation. Several observations are consistent with the model shown in Figure 5. V617F-positive patients with PV and ET share several features,10,31,32 and they represent a phenotypic continuum, with homozygosity for the V617F mutation strongly favoring a polycythemic phenotype.46 Patients labeled as primary myelofibrosis are clinically indistinguishable from those with myelofibrotic transformation of a preceding MPD.47 The model also predicts that the genetic lesions responsible for V617F-negative primary myelofibrosis will be found in V617F-negative ET, as has been found for MPL W515 mutations.34,48
An Inside Blood analysis of this article appears at the front of this issue.
The publication costs of this article were defrayed in part by page charge payment. Therefore, and solely to indicate this fact, this article is hereby marked “advertisement” in accordance with 18 USC section 1734.
Acknowledgments
We thank Jon van der Walt for his assistance with curation of the biopsy samples and organizing the cutting and staining of specimens, and Pat Collins for her assistance with managing the trephine slides.
This work was supported by grants from the United Kingdom Medical Research Council, the Leukemia Research Fund, and the Kay Kendall Leukemia Fund.
Authorship
Contribution: B.S.W., W.N.E., and D.B. contributed equally to the design of the study, the interpretation and scoring of the bone marrow trephine biopsies, and the subsequent analysis. G.B. and C.L.E. collected clinical outcome data and samples from patients in the 3 trials, under the oversight of K.W., C.N.H., and A.R.G. B.P. contributed to collection of patient samples. P.J.C. performed statistical analyses. A.R.G. and P.J.C. equally coordinated and directed the research. All authors have had the opportunity to contribute to the drafting of the paper.
Conflict-of-interest disclosure: The authors declare no competing financial interests.
B.S.W., W.N.E., and D.B. are equal first authors. A.R.G. and P.J.C. are equal last authors.
Correspondence: Anthony R. Green, Department of Haematology, Cambridge Institute for Medical Research, University of Cambridge, Hills Rd, Cambridge CB2 2XY, United Kingdom; e-mail: [email protected]; or Bridget S. Wilkins, Northern Institute for Cancer Research, Paul O'Gorman Bldg, Framlington Place, Newcastle University, Newcastle-upon-Tyne NE2 4HH, United Kingdom; e-mail: [email protected].