To the Editor:
The International Bone Marrow Transplant Registry (IBMTR) has recently adopted a new severity index for grading acute graft-versus-host disease (GVHD) after allogeneic marrow transplantation. This index is based on the results of an analysis in which distinct patterns in the peak severity of GVHD involvement in the skin, liver, and gut were correlated with the risks of transplant-related mortality (TRM) and treatment failure (defined as relapse or death) among 2,129 adults who received an unmodified marrow graft from an HLA-identical sibling with the use of methotrexate and cyclosporine for GVHD prophylaxis.1 In this analysis, acute GVHD was entered into Cox proportional hazards regression models as a time-dependent variable so that all patients were categorized as not having GVHD until they developed the disease. As a further refinement, the analysis was stratified across two age categories and three pretransplant risk categories. Results showed that patients with Glucksberg grade I GVHD did not have a significantly increased risk of TRM compared to those with grade 0 GVHD (relative risk = 1.12), while the relative risks for patients with Glucksberg grades II and III-IV GVHD were 2.19 and 5.33, respectively. Patients with IBMTR Index of A did not have an increased risk of TRM (relative risk = 0.84), while the relative risks for patients with IBMTR Indices of B, C, and D were 1.9, 4.34, and 11.9, respectively. The pooled relative risk for patients with IBMTR Indices of C or D was not reported.
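The time-dependent treatment of acute GVHD described above can be illustrated with a minimal sketch of the usual counting-process data layout, in which each patient's follow-up is split at the day of GVHD onset. The function name, the tuple layout, and the example patients are hypothetical and are not taken from the IBMTR analysis; the sketch shows only how a patient contributes "no GVHD" time until the disease develops.

```python
from typing import List, Optional, Tuple

# One row per interval: (start_day, stop_day, has_gvhd, event_at_stop).
# This layout is an assumption for illustration, not the IBMTR data format.
Interval = Tuple[float, float, int, int]


def split_follow_up(follow_up: float, died: bool,
                    gvhd_onset: Optional[float]) -> List[Interval]:
    """Split one patient's follow-up at the day of GVHD onset.

    Before onset the patient is counted as not having GVHD; after onset the
    same patient is counted as having GVHD. The outcome event, if any, is
    attached only to the final interval.
    """
    # No GVHD during follow-up: a single interval with the covariate at 0.
    if gvhd_onset is None or gvhd_onset >= follow_up:
        return [(0.0, follow_up, 0, int(died))]
    # Onset at day 0 or earlier: the covariate is 1 for all of follow-up.
    if gvhd_onset <= 0:
        return [(0.0, follow_up, 1, int(died))]
    return [
        (0.0, gvhd_onset, 0, 0),                # at risk without GVHD
        (gvhd_onset, follow_up, 1, int(died)),  # at risk with GVHD
    ]


# Hypothetical patients: GVHD on day 30 and death on day 100; no GVHD and
# alive at day 200.
rows_a = split_follow_up(100.0, True, 30.0)
rows_b = split_follow_up(200.0, False, None)
```

Rows in this form can be passed to any Cox regression routine that accepts start and stop times, so that a death before GVHD onset is attributed to the "no GVHD" state rather than to the grade eventually assigned.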
The IBMTR Index was designed to avoid the need for subjective assessment of performance status, which has been included as an element in the Glucksberg scale. In practice, performance status is used in the Glucksberg grading system only to distinguish between grades III and IV GVHD. Use of the term “extreme” to describe the reduction in performance status associated with grade IV GVHD has been interpreted as a fatal outcome related to GVHD. The IBMTR Index was also designed to provide greater homogeneity in the risks of TRM and treatment failure among patients with GVHD of any given degree of severity. However, from data in Tables 3 and 7 of the report by Rowlings et al,1 it can be calculated that the average standard error of the parameter estimates for GVHD grade associated with the Cox model for TRM was 0.124 when the Glucksberg system was used and 0.142 when the IBMTR Index was used. By this criterion, there is no evidence that scaling the severity of GVHD according to the IBMTR Index improves the homogeneity in risk of TRM within grades.
We have compared Glucksberg GVHD grades and the corresponding IBMTR Indices for 838 adults and children who received an allogeneic marrow transplant from a related or unrelated donor for treatment of CML in chronic phase or acute leukemia in remission (Table 1). Most patients received methotrexate and cyclosporine for GVHD prophylaxis. Transplants were performed between 1990 and 1997, and all GVHD grading was done by a single individual according to published criteria.2 Patients with advanced malignancy were excluded so that correlations between GVHD severity and TRM would be more apparent.
Table 1. Glucksberg grade (rows) by IBMTR Index (columns: 0, A, B, C, D, Total). [Table body not available.]
As shown in Table 1, the IBMTR Index tends to assign a higher overall grade for GVHD severity than the Glucksberg categorization. Patients with Glucksberg grade I GVHD were categorized as having an IBMTR Index of A or B, depending on whether the maximum extent of rash involved less than 25% or 25% to 50% of body surface area. Patients with Glucksberg grade II GVHD were categorized as having an IBMTR Index of B or C, depending on whether the maximum extent of rash involved 25% to 50% or greater than 50% of body surface area. Patients with Glucksberg grade III GVHD were categorized as having an IBMTR Index of C or D, depending on the absence or presence of stage 4 involvement in at least one organ. Among 245 patients with Glucksberg grade III GVHD, 25 were categorized as having an IBMTR Index of B because the maximum extent of rash did not exceed 50% of body surface area. These are the only cases in which the grade assigned by the IBMTR Index was lower than the Glucksberg grade.
Correlations between the cumulative incidence of TRM and GVHD severity as assessed by the Glucksberg grade or by the IBMTR Index were similar (Fig 1). Patients with grade 0 GVHD had a higher incidence of TRM than those with Glucksberg grades of I or II or with IBMTR Indices of A or B because many “grade 0” patients did not survive long enough to develop acute GVHD. The cumulative incidence of TRM was similar for patients with Glucksberg grades of I or II or with IBMTR Indices of A or B, but analysis of the IBMTR Index A group is limited by small numbers. The cumulative incidence estimates of TRM for patients with Glucksberg grades of III and IV were higher than those for patients with IBMTR Indices of C and D, respectively, as would be expected from the results in Table 1. The difference in TRM between grades I-II GVHD and grade III GVHD assigned by the Glucksberg categorization was greater than the corresponding difference between the IBMTR Index A-B and C groups. These results do not show any advantage for the IBMTR Index compared with Glucksberg grading in the patient population we analyzed.
Our discussion of the IBMTR Index should not be interpreted as an uncritical defense of the Glucksberg grading criteria. We acknowledge that this historically accepted approach has many well-known limitations. For example, we have found considerable disagreement between reviewers in categorizing the peak severity of GVHD involvement in the skin, liver, and gut.3 As a result, the reproducibility of retrospective grading by different reviewers was poor. In an attempt to overcome these difficulties, we have recently proposed an alternative approach for GVHD grading to be considered by the marrow transplant community.3 Instead of emphasizing the peak severity of abnormalities in the skin, liver, and gut, this alternative approach summarizes the overall clinical course as reflected by the progression of GVHD and the response to treatment. Results of testing suggested that this alternative approach might yield better reproducibility than the original Glucksberg system. We look forward to discussion of further improvements in the future.
We appreciate the opportunity to address the issues raised by our comparison of the prognostic utility of the Glucksberg grading system of acute graft-versus-host disease (GVHD) with the International Bone Marrow Transplant Registry (IBMTR) index.1-1 Because the report in question was not published in Blood, for the benefit of readers we should point out that the purpose of the IBMTR analysis was to determine if there was heterogeneity of outcome within each Glucksberg grade and, if so, to define more homogeneous grades of GVHD so that outcome for an individual patient could be estimated with greater accuracy. We sought, additionally, to use the objective Glucksberg staging criteria for skin, liver, and gut involvement without relying on the subjective assessment of performance score. Therefore, the IBMTR Index is a refinement rather than an abandonment of the Glucksberg system.
Although we welcome the review of our analysis by Martin et al, we perceive some methodological flaws both in their interpretation of our analysis and in the analysis of their own data. As they correctly note, the average standard errors for the IBMTR index model are larger than those for the Glucksberg grade model. This fact does not imply that the Glucksberg model fits the data better, but rather is an artifact of the IBMTR model having more parameters (4) than the model based on the Glucksberg grading (3 parameters). It is well known that the standard errors of parameter estimates do not depend on the strength of the relationship between the independent and dependent variables. The magnitude of the standard errors is affected by the total sample size and the number of parameters in the model. Interestingly, Martin and colleagues do not provide any standard errors for their own data.
The correct measure of how well different models fit is some type of information criterion. For regular regression, the standard measure is the square of the correlation, R². For the Cox model, the Akaike information criterion (AIC)1-2,1-3 can be used to compare information in non-nested models. This criterion is defined as AIC = −2 ln[L] + 2p, where p is the number of parameters in the model and L is the value of the maximized partial likelihood for the fitted model. When comparing two models, the model with the smaller AIC provides a better, more parsimonious fit to the data. For the model in Table 3 (Glucksberg grade) the AIC is 5924.781, while for the model in Table 7 (IBMTR index) the AIC is 5813.258. This strongly suggests that the model in Table 7 provides a superior fit to the data.
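The comparison above can be written out as a short sketch. The function below simply restates the definition AIC = −2 ln[L] + 2p; the two AIC values are those quoted in the text, and the variable names are ours. The maximized partial log-likelihoods themselves are not reported here, so the sketch compares the published AIC values directly rather than recomputing them.

```python
def aic(log_partial_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: -2 ln(L) + 2p.

    log_partial_likelihood is ln(L), the maximized partial log-likelihood of
    the fitted Cox model; n_params is p, the number of model parameters.
    """
    return -2.0 * log_partial_likelihood + 2 * n_params


# AIC values as quoted in the text for the two non-nested models.
reported_aic = {
    "Glucksberg grade (Table 3)": 5924.781,
    "IBMTR index (Table 7)": 5813.258,
}

# The model with the smaller AIC is preferred.
preferred = min(reported_aic, key=reported_aic.get)

# Difference in AIC between the two models (Glucksberg minus IBMTR).
delta_aic = (reported_aic["Glucksberg grade (Table 3)"]
             - reported_aic["IBMTR index (Table 7)"])

print(f"Preferred model: {preferred}; AIC difference = {delta_aic:.3f}")
```

Because the penalty term 2p is already included in each AIC, the difference of about 111.5 favors the IBMTR index model even after allowing for its additional parameter.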
Additionally, comparing prognostic ability solely on the basis of Kaplan-Meier curves is misleading. The computation of a Kaplan-Meier curve, as depicted in the figures, requires that the GVHD grade of every patient be known at the time of transplantation or that a left-truncated Kaplan-Meier1-3 estimator be used. This is clearly not the case for the patients in Martin et al’s cohort, especially for grade IV patients, who must die before being assigned to this group. With violations of this assumption, the basic statistical theory1-4,1-5 used to derive the curves breaks down. In our analysis, we assessed the prognostic ability of grades of GVHD using a series of time-dependent covariates and presented survival curves as summary statistics to illustrate our findings. Outcome (ie, death) was not used to assign grade.
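For readers unfamiliar with the left-truncated estimator mentioned above, a minimal sketch follows. It assumes that each patient enters the risk set for a given grade only at the time that grade is assigned, and it uses the standard product-limit form with delayed entry. The function name, the input layout, and the example values are hypothetical and do not come from either cohort.

```python
from typing import List, Tuple

# (entry_day, exit_day, event): entry_day is when the patient comes under
# observation in the group (for example, the day the grade is assigned);
# event is 1 for death and 0 for censoring. Assumes entry_day < exit_day.
Subject = Tuple[float, float, int]


def left_truncated_km(subjects: List[Subject]) -> List[Tuple[float, float]]:
    """Product-limit survival estimate allowing delayed entry.

    A subject is counted as at risk at time t only if entry < t <= exit, so
    patients contribute nothing to the risk set before their entry time.
    Returns (event_time, estimated_survival) pairs.
    """
    for entry, exit_, _ in subjects:
        if entry >= exit_:
            raise ValueError("each subject needs entry_day < exit_day")

    event_times = sorted({exit_ for _, exit_, event in subjects if event})
    survival = 1.0
    curve = []
    for t in event_times:
        at_risk = sum(1 for entry, exit_, _ in subjects if entry < t <= exit_)
        deaths = sum(1 for _, exit_, event in subjects if event and exit_ == t)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical example: two patients observed from day 0, two entering later.
example = [(0.0, 5.0, 1), (0.0, 8.0, 0), (3.0, 10.0, 1), (6.0, 12.0, 1)]
curve = left_truncated_km(example)
```

An ordinary Kaplan-Meier curve would instead treat all four patients as at risk from day 0, which is the assumption that fails when grade is assigned after transplantation.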
Martin et al comment that early deaths before the occurrence of acute GVHD may be affecting interpretation of their data. We agree. We purposely included only patients surviving ≥21 days with engraftment in our study to aid in interpreting the effect of acute GVHD on outcomes.
Heterogeneity within Glucksberg grade is clearly seen in Table 1 of the letter from Martin et al. Glucksberg grade II patients can be in either IBMTR Index B or C, groups with distinctly different outcomes. As was shown in our study, this is largely due to the adverse prognosis of developing stage 3 skin GVHD, important information not considered in the Glucksberg grade.
Ongoing research in this area requires a grading system based on objective data that can be used prospectively to accurately assess risk. Useful prognostic systems predict outcome rather than describing it afterward, thereby allowing high-risk patients to be identified and targeted for intervention. Such a system would benefit both the design and conduct of GVHD treatment trials and immediate patient care. Whether the IBMTR Index fulfills these requirements awaits the results of prospective validation studies, such as the one currently being undertaken by the French Society of Bone Marrow Transplantation in collaboration with the IBMTR Statistical Center.