Key Points

  • In patients with DLBCL receiving R-CHOP, the NCCN-IPI discriminates groups with different survival better than the IPI or R-IPI.

  • Molecular characteristics of tumor and microenvironment may help to identify patients with OS clearly <50%.

Abstract

Great heterogeneity in survival exists for patients newly diagnosed with diffuse large B-cell lymphoma (DLBCL). Three scoring systems incorporating simple clinical parameters (age, lactate dehydrogenase, number/sites of involvement, stage, performance status) are widely used: the International Prognostic Index (IPI), revised IPI (R-IPI), and National Comprehensive Cancer Network IPI (NCCN-IPI). We evaluated 2124 DLBCL patients treated from 1998 to 2009 with frontline rituximab plus cyclophosphamide, doxorubicin, vincristine, and prednisone (R-CHOP; or variant) across 7 multicenter randomized clinical trials to determine which scoring system best discriminates overall survival (OS). Median age was 63 years, and 56% of patients were male. Five-year OS estimates ranged from 54% to 88%, from 61% to 93%, and from 49% to 92% using the IPI, R-IPI, and NCCN-IPI, respectively. The NCCN-IPI had the greatest absolute difference in OS estimates between the highest- and lowest-risk groups and best discriminated OS (concordance index = 0.632 vs 0.626 [IPI] vs 0.590 [R-IPI]). For each given IPI risk category, NCCN-IPI risk categories were significantly associated with OS (P ≤ .01); the reverse was not true, and the IPI did not provide additional significant prognostic information within all NCCN-IPI risk categories. Collectively, the NCCN-IPI outperformed the IPI and R-IPI. Patients with low-risk NCCN-IPI had favorable survival outcomes with little room for further improvement. In the rituximab era, none of the clinical risk scores identified a patient subgroup with long-term survival clearly <50%. Integrating molecular features of the tumor and microenvironment into the NCCN-IPI or IPI might better characterize a high-risk group for which novel treatment approaches are most needed.

Introduction

Great heterogeneity in survival exists for patients newly diagnosed with diffuse large B-cell lymphoma (DLBCL), the most common type of B-cell lymphoma. Clinical scoring systems have been developed to better risk-stratify patients and assist in the selection of therapeutic strategies. The International Prognostic Index (IPI) scoring system was first published >25 years ago and remains widely used today.1  The IPI score assigns 1 point to each negative prognostic factor (age >60 years, serum lactate dehydrogenase [LDH] above the upper limit of normal [ULN], Ann Arbor stage III/IV disease, Eastern Cooperative Oncology Group [ECOG] performance status ≥2, and >1 site with extranodal involvement) and categorizes patients into 4 risk groups based on the total score: 0/1 = low risk, 2 = low-intermediate risk, 3 = high-intermediate risk, and 4/5 = high risk. The IPI was developed at a time when patients received chemotherapy-only regimens, primarily cyclophosphamide, doxorubicin, vincristine, and prednisone (CHOP), and 5-year overall survival (OS) estimates ranged from 26% to 73%, depending on the risk category.1 
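The IPI scoring rule described above can be sketched in a few lines of Python. This is only an illustration of the published rule as summarized in this paragraph; the `Patient` class and its field names are assumptions introduced for the example, not part of any trial database.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    """Baseline features used by the IPI (field names are illustrative)."""
    age: float             # years
    ldh_ratio: float       # serum LDH divided by the upper limit of normal
    stage: int             # Ann Arbor stage, 1-4
    ecog: int              # ECOG performance status, 0-4
    extranodal_sites: int  # number of extranodal sites involved


def ipi_score(p: Patient) -> int:
    """Assign 1 point to each adverse factor and return the total (0-5)."""
    return sum([
        p.age > 60,
        p.ldh_ratio > 1.0,
        p.stage >= 3,
        p.ecog >= 2,
        p.extranodal_sites > 1,
    ])


def ipi_risk_group(score: int) -> str:
    """Map the total score to the 4 IPI risk groups."""
    if not 0 <= score <= 5:
        raise ValueError("IPI score must be between 0 and 5")
    if score <= 1:
        return "low"
    if score == 2:
        return "low-intermediate"
    if score == 3:
        return "high-intermediate"
    return "high"
```

For instance, a 65-year-old patient with elevated LDH, stage III disease, ECOG performance status 1, and no extranodal involvement accumulates 3 points and falls into the high-intermediate group.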

The addition of rituximab to cyclophosphamide, doxorubicin, vincristine, and prednisone (R-CHOP) changed the clinical history of DLBCL, significantly extending survival and increasing the patient cure rate. The revised IPI (R-IPI) was developed specifically for newly diagnosed patients with DLBCL to better risk-stratify those who were treated with R-CHOP.2  The R-IPI used the same risk factors and scoring system as the IPI, but it redistributed the scores to form 3 risk groups: 0 = very good risk, 1/2 = good risk, and 3/4/5 = poor risk. The R-IPI was reported to be less complex than the IPI, and it performed better than the IPI in terms of distinguishing patients with more- or less-favorable long-term outcomes2; however, its superior performance has been disputed methodologically and clinically.3,4 
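Because the R-IPI reuses the IPI point total and only regroups it, the regrouping can be written as a single mapping. The sketch below assumes the 0-5 point total has already been computed; the function name is illustrative.

```python
def r_ipi_risk_group(ipi_points: int) -> str:
    """Regroup an IPI point total (0-5) into the 3 R-IPI risk groups."""
    if not 0 <= ipi_points <= 5:
        raise ValueError("IPI point total must be between 0 and 5")
    if ipi_points == 0:
        return "very good"
    if ipi_points <= 2:
        return "good"
    return "poor"
```

One consequence of this regrouping is that the IPI low-risk group (0 or 1 point) is split between the very good and good R-IPI groups, whereas the high-intermediate and high IPI groups are merged into a single poor-risk group.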

To improve upon the IPI and the R-IPI, especially aiming to identify a subgroup of patients with 5-year OS <50%, the National Comprehensive Cancer Network (NCCN) database was used to develop an enhanced NCCN-IPI score for patients with newly diagnosed DLBCL treated with R-CHOP.5  Again, the clinical risk factors used for the NCCN-IPI were largely the same as for the IPI/R-IPI, but the NCCN-IPI reevaluated the simple classifications of each risk factor. Ultimately, age was categorized into 4 groups (>75 years = 3 points, >60 years = 2 points, >40 years = 1 point, and ≤40 years = 0 points), and LDH was classified into 3 groups (>3 times the ULN = 2 points, >1 time the ULN = 1 point, and ≤1 time the ULN = 0 points). Ann Arbor stage III/IV disease and ECOG performance status ≥2 still contributed 1 point each. Instead of using the number of extranodal sites, the presence of extranodal disease in the bone marrow, central nervous system, liver/gastrointestinal tract, or lung contributed 1 point to the NCCN-IPI. The total score for the NCCN-IPI ranges from 0 to 8 points: 0/1 = low risk, 2/3 = low-intermediate risk, 4/5 = high-intermediate risk, and 6/7/8 = high risk. Developers of the NCCN-IPI showed that 5-year OS rates ranged from 33% to 96% across the risk groups and that the NCCN-IPI performed better than the IPI in a training dataset and in an external population-based validation dataset; a comparison with R-IPI was not done.5  The better performance of the NCCN-IPI over the IPI was further supported by 2 smaller studies: 1 retrospective cohort study including patients across 2 centers in Austria and 1 retrospective cohort study including a Chinese patient population.6,7 
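The NCCN-IPI point assignments listed above translate directly into code. The following is a minimal sketch of the rule as summarized in this paragraph; the function and parameter names are assumptions made for the example.

```python
def nccn_ipi_score(age: float, ldh_ratio: float, stage: int,
                   ecog: int, major_organ_involvement: bool) -> int:
    """Return the NCCN-IPI total (0-8) from baseline features.

    ldh_ratio is serum LDH divided by the upper limit of normal;
    major_organ_involvement flags extranodal disease in the bone marrow,
    central nervous system, liver/gastrointestinal tract, or lung.
    """
    if age > 75:
        age_points = 3
    elif age > 60:
        age_points = 2
    elif age > 40:
        age_points = 1
    else:
        age_points = 0

    if ldh_ratio > 3:
        ldh_points = 2
    elif ldh_ratio > 1:
        ldh_points = 1
    else:
        ldh_points = 0

    return (age_points + ldh_points + int(stage >= 3)
            + int(ecog >= 2) + int(major_organ_involvement))


def nccn_ipi_risk_group(score: int) -> str:
    """Map the NCCN-IPI total to its 4 risk groups."""
    if not 0 <= score <= 8:
        raise ValueError("NCCN-IPI score must be between 0 and 8")
    if score <= 1:
        return "low"
    if score <= 3:
        return "low-intermediate"
    if score <= 5:
        return "high-intermediate"
    return "high"
```

Note how the same 65-year-old patient with LDH 1.5 times the ULN and stage III disease who scores 3 on the IPI receives 2 + 1 + 1 = 4 points here, because age above 60 years is weighted more heavily.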

In 2020, we find ourselves with 3 commonly used clinical scoring systems for patients with newly diagnosed DLBCL: the IPI, the R-IPI, and the NCCN-IPI. Our objective was to compare the performance of all 3 clinical scoring systems, in terms of discriminating the most important end point (OS), using an independent large clinical database. The international Surrogate Endpoints for Aggressive Lymphoma (SEAL) database,8  which includes data from patients enrolled in multicenter randomized clinical trials, provided a unique resource via which this objective could be addressed. The performance of the 3 clinical scoring systems was also compared for important secondary end points, including progression-free survival (PFS) and PFS at 24 months (PFS24). Herein, we present results from 2124 patients diagnosed with DLBCL and treated with R-CHOP or variants as frontline therapy.

Methods

Patients and treatment regimens

Patient data from 7 of 14 multicenter randomized clinical trials (LNH03-2B,9  LNH03-6B,10  LNH985,11  MegaCHOEP,12  MinT,13  RICOVER-60,14  and UCL15), collected as part of the international SEAL database,8  were included in this analysis. Trials were excluded if they did not randomize patients to frontline therapy with an R-CHOP–based regimen (n = 3) or if data on major organ sites of involvement were not collected as part of the trial, precluding calculation of the NCCN-IPI (n = 4). Of the 3353 patients enrolled in the 7 trials meeting inclusion criteria, 560 patients (16.7%) did not have centrally confirmed DLBCL and were excluded. The IPI/R-IPI could not be calculated for 301 patients (10.8%), primarily as the result of missing data for LDH, whereas the NCCN-IPI could not be calculated for 368 patients (13.2%), primarily as the result of missing data for 1 site of major organ involvement. Therefore, the remaining 2124 patients had newly diagnosed DLBCL with data available to calculate all 3 clinical scores of interest (Figure 1). All patients received R-CHOP or a variant as induction therapy in 1 of the 7 trials from 1998 to 2009. Induction regimens included R-CHOP administered every 14 or 21 days over 6 to 8 cycles (LNH03-2B, n = 132; LNH03-6B, n = 441; LNH985, n = 98; RICOVER-60, n = 474; UCL, n = 397), 6 cycles of an R-CHOP–like therapy (MinT, n = 380), or 8 cycles of R-CHOP or variant with etoposide (MegaCHOEP, n = 202).
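The exclusion counts in this paragraph can be checked with simple arithmetic. The sketch below uses only the numbers reported above; it assumes the two groups with missing score data are counted without overlap and that their percentages use the centrally confirmed patients as the denominator, which is what the reported figures imply.

```python
enrolled = 3353          # patients in the 7 eligible trials
not_confirmed = 560      # no centrally confirmed DLBCL
missing_ipi = 301        # IPI/R-IPI not calculable
missing_nccn_ipi = 368   # NCCN-IPI not calculable

confirmed = enrolled - not_confirmed
analyzed = confirmed - missing_ipi - missing_nccn_ipi

# Percentages rounded to 1 decimal place, as reported in the text
pct_not_confirmed = round(100 * not_confirmed / enrolled, 1)
pct_missing_ipi = round(100 * missing_ipi / confirmed, 1)
pct_missing_nccn_ipi = round(100 * missing_nccn_ipi / confirmed, 1)

print(confirmed, analyzed)
print(pct_not_confirmed, pct_missing_ipi, pct_missing_nccn_ipi)
```

Under these assumptions the counts reconcile with the 2124 patients in the final analysis set.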

Figure 1.

Flow diagram.

Statistical analysis

OS, PFS, and PFS24 have been defined previously.8  Estimates of OS and PFS were obtained by the Kaplan-Meier method and compared between risk groups using the 2-sided log-rank test.16  Stratified Cox proportional hazard models, using trial and type of induction therapy as stratification factors, were fit to obtain adjusted hazard ratios with 95% confidence intervals (CIs).17  Stratified models for each of the risk scores were compared using Akaike’s information criterion (AIC) and the concordance index (C-index).18-21  The AIC provided a relative measure of model quality; smaller values correspond with a better fitting model. As a general guideline, differences in AIC <2 between models indicate no improvement in fit, differences >2 but <10 indicate increasing improvement in fit, and differences ≥10 indicate substantial improvement in the fit of the model.22  The relative likelihood for comparing 2 models is exp(−[AIC1 − AIC2]/2), where AIC1 is ≥AIC2. Based on this, if AIC1 − AIC2 is >10, the chance that model 1 is the better model is very small. The C-index provided a measure of predictive ability of the model, defined as the probability of concordance between predicted and observed survival. The C-index corresponds to the area under the receiver operating characteristics curve for censored data. C-index values of 0.5, 0.7, and 1.0 indicate that the model has completely random, acceptable, or perfect discrimination, respectively, between short and long survival times. All analyses were conducted using SAS 9.4 or R v3.4.2.
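To make the definition of the C-index concrete, the sketch below implements a pairwise concordance estimate for right-censored data in plain Python. It assumes Harrell's estimator, which may differ from the exact routine the authors ran in SAS or R, and it ignores pairs with tied survival times for simplicity; all names are illustrative.

```python
def harrell_c_index(times, events, risk_scores):
    """Estimate concordance between predicted risk and observed survival.

    times       -- follow-up time for each patient
    events      -- 1 if death was observed, 0 if the patient was censored
    risk_scores -- model output; a higher score predicts shorter survival
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier event in a pair
        for j in range(n):
            if times[i] < times[j]:
                # Patient i died before patient j's last follow-up,
                # so the ordering of their survival times is known.
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

With categorical risk groups as the only predictor, many pairs share the same predicted risk and contribute 0.5 each, which is one reason C-index values for grouped scores tend to sit closer to 0.5 than values for continuous predictors.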

Results

Patient characteristics

A total of 2124 patients diagnosed with centrally confirmed DLBCL, enrolled in 1 of 7 multicenter trials from 1998 to 2009 and treated frontline with rituximab-based induction therapy, was included in this study. Patient characteristics are provided in Table 1. More than half of the patients were male (56%). Median age was 63 years (range, 18-83), and 56% were older than 60 years. The majority of patients had an ECOG performance status ≤1 (84%) and Ann Arbor stage III/IV disease (62%). Extranodal involvement in >1 site was documented in 22% of patients. LDH was elevated in 59% of patients, with 10% of patients having LDH levels >3 times the ULN. As induction therapy, 90% of patients received R-CHOP or a variant of R-CHOP that did not include etoposide, whereas 10% received R-CHOP or a variant that did include etoposide.

Table 1.

Baseline patient characteristics (N = 2124)

Characteristic                               Data
Age, median (range), y                       63 (18-83)
Age, y
  ≤40                                        276 (13)
  41-60                                      650 (31)
  61-75                                      1026 (48)
  >75                                        172 (8)
Males                                        1198 (56)
Ann Arbor stage III/IV                       1321 (62)
ECOG performance status ≥2                   334 (16)
LDH ratio
  <1× ULN                                    877 (41)
  1-3× ULN                                   1031 (49)
  >3× ULN                                    216 (10)
Extranodal sites >1                          469 (22)
Extranodal involvement of major organ*       811 (38)
IPI risk group
  Low (0-1)                                  730 (34)
  Low-intermediate (2)                       480 (23)
  High-intermediate (3)                      486 (23)
  High (4-5)                                 428 (20)
R-IPI risk group
  Very good (0)                              183 (9)
  Good (1-2)                                 1027 (48)
  Poor (3-5)                                 914 (43)
NCCN-IPI risk group
  Low (0-1)                                  281 (13)
  Low-intermediate (2-3)                     864 (41)
  High-intermediate (4-5)                    764 (36)
  High (6-8)                                 215 (10)
Study
  LNH03-2B (ref 9)                           132 (6)
  LNH03-6B (ref 10)                          441 (21)
  LNH985 (ref 11)                            98 (5)
  MegaCHOEP (ref 12)                         202 (10)
  MinT (ref 13)                              380 (18)
  RICOVER-60 (ref 14)                        474 (22)
  UCL (ref 15)                               397 (19)
Induction regimen
  R-CHOP                                     1542 (73)
  CHOP-like + R                              380 (18)
  R-CHOEP                                    101 (5)
  R-MegaCHOEP                                101 (5)

Unless otherwise indicated, data are n (%).

R-CHOEP, rituximab plus cyclophosphamide, doxorubicin, vincristine, etoposide, and prednisone; R-MegaCHOEP, R-CHOEP with escalated doses of cyclophosphamide, etoposide, and doxorubicin.

* Involvement of the bone marrow, central nervous system, liver or gastrointestinal system, or lung.

R-CHOP was administered every 14 or 21 days over 6 to 8 cycles,9-11,14,15  CHOP-like + R was administered over 6 cycles,13  and R-CHOEP or R-MegaCHOEP was administered over 8 cycles.12 

Clinical scoring systems

All patients were categorized into groups based on all 3 clinical risk scoring systems. According to IPI, 34% were low risk, 23% were low-intermediate risk, 23% were high-intermediate risk, and 20% were high risk. In the updated R-IPI score proposed specifically for patients treated with rituximab, 9% were very good risk, 48% were good risk, and 43% were poor risk. In the most recent NCCN-IPI scoring system, 13% were low risk, 41% were low-intermediate risk, 36% were high-intermediate risk, and 10% were high risk. By design, patients in the high-intermediate or high IPI risk groups are in the poor R-IPI risk group, patients in the low-intermediate IPI risk group are in the good R-IPI risk group, and patients in the low IPI risk group are split between the good and very good R-IPI risk groups (75% and 25%, respectively). The agreement between the IPI and NCCN-IPI classifications was moderate (weighted κ = 0.61); 55% of patients were in the same risk category using IPI and NCCN-IPI, whereas 44% were in adjacent risk categories. Risk classifications were largely different in only 7 patients (<1%; 4 classified as low risk by IPI and high-intermediate risk by NCCN-IPI and 3 classified as high risk by IPI and low-intermediate risk by NCCN-IPI).
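The by-design relationship between the IPI and R-IPI groups can be checked against the counts in Table 1. The sketch below uses only those reported counts; the variable names are illustrative.

```python
# Group sizes reported in Table 1
ipi = {"low": 730, "low-intermediate": 480,
       "high-intermediate": 486, "high": 428}
r_ipi = {"very good": 183, "good": 1027, "poor": 914}

# IPI low risk covers 0 or 1 point; only the 0-point patients are
# R-IPI very good, so the rest of the IPI low group moves to R-IPI good.
low_moved_to_good = ipi["low"] - r_ipi["very good"]

good_expected = low_moved_to_good + ipi["low-intermediate"]
poor_expected = ipi["high-intermediate"] + ipi["high"]

pct_low_very_good = round(100 * r_ipi["very good"] / ipi["low"])
pct_low_good = round(100 * low_moved_to_good / ipi["low"])

print(good_expected, poor_expected, pct_low_very_good, pct_low_good)
```

The reconstructed group sizes match the R-IPI counts in Table 1, and the split of the IPI low-risk group matches the 75% and 25% quoted in the text.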

Overall survival

With a median follow-up of 4.9 years and a maximum follow-up of 10.7 years, there have been 559 deaths, and the median OS has not been reached; the 5-year OS estimate for the population was 73% (95% CI, 71-75). As shown in Figure 2, each of the 3 clinical risk scoring systems resulted in risk groups with significantly different OS (P < .0001 for each). Using IPI, R-IPI, and NCCN-IPI, 5-year OS estimates ranged from 54% to 88%, from 61% to 93%, and from 49% to 92%, respectively. Among the 3 clinical risk scoring systems, the NCCN-IPI had the greatest absolute difference in 5-year OS estimates between the highest- and lowest-risk groups.

Figure 2.

Overall survival for risk groups defined by 3 clinical scoring systems. (A) IPI, (B) R-IPI, and (C) NCCN-IPI.

Compared with the IPI, the R-IPI better identified a subgroup of patients with favorable long-term survival, but created a poor-risk category containing patients with more heterogeneous outcomes than with the IPI. The NCCN-IPI retained the ability of the R-IPI to distinguish a subgroup of patients with favorable long-term survival, while improving upon the IPI by identifying a less-heterogeneous high-risk group.

In models stratified by study and type of induction therapy received, the NCCN-IPI provided the best fit for the data, followed by the IPI and then the R-IPI (indicated by the lowest AIC value: 5428 vs 5455 vs 5492, respectively; Table 2). The relationships between the individual risk factors making up each of the 3 risk scores and OS are provided in supplemental Table 1 (available on the Blood Web site). The NCCN-IPI also discriminated best between patients with poor and favorable OS (indicated by the highest C-index: 0.632, 0.626, and 0.590, respectively, for models with NCCN-IPI, IPI, and R-IPI risk categories; Table 2). According to the NCCN-IPI, the risk of death was estimated as 6.40 times higher for patients classified as high risk (95% CI, 3.45-11.89) compared with those with a classification of low risk, patients classified as high-intermediate risk had an estimated risk of death that was 3.95 times higher (95% CI, 2.17-7.19) than those with low risk, and patients classified as low-intermediate risk had a risk of death that was 1.49 times higher (95% CI, 0.83-2.67) than those with low risk (Table 2). Similar results were observed in the subgroup of patients with confirmed DLBCL who received R-CHOP therapy (supplemental Table 2).
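Applying the relative likelihood formula from the Methods, exp(−[AIC1 − AIC2]/2), to the AIC values reported in this paragraph gives a sense of how strongly the data favor the NCCN-IPI model. This is a worked calculation on the published AIC values only; the function name is illustrative.

```python
import math


def relative_likelihood(aic_worse: float, aic_better: float) -> float:
    """Relative likelihood of the higher-AIC model versus the lower-AIC model."""
    if aic_worse < aic_better:
        raise ValueError("aic_worse must be >= aic_better")
    return math.exp(-(aic_worse - aic_better) / 2)


# AIC values for the stratified OS models, as reported in the text
aic = {"NCCN-IPI": 5428, "IPI": 5455, "R-IPI": 5492}
best = min(aic, key=aic.get)

for name, value in aic.items():
    if name != best:
        rl = relative_likelihood(value, aic[best])
        print(f"{name} vs {best}: AIC difference {value - aic[best]}, "
              f"relative likelihood {rl:.2e}")
```

The AIC differences of 27 (IPI) and 64 (R-IPI) both exceed the threshold of 10 described in the Methods, and the corresponding relative likelihoods are on the order of 10^-6 and 10^-14, respectively.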

Table 2.

Stratified models for OS

Model                        Hazard ratio (95% CI)    AIC*    C-index (95% CI)
IPI risk group                                        5455    0.626 (0.557-0.694)
  Low (0-1)                  Reference
  Low-intermediate (2)       1.99 (1.46-2.72)
  High-intermediate (3)      2.73 (1.99-3.74)
  High (4-5)                 4.51 (3.29-6.16)
R-IPI risk group                                      5492    0.590 (0.528-0.652)
  Very good (0)              Reference
  Good (1-2)                 1.68 (0.81-3.48)
  Poor (3-5)                 3.67 (1.75-7.67)
NCCN-IPI risk group                                   5428    0.632 (0.565-0.700)
  Low (0-1)                  Reference
  Low-intermediate (2-3)     1.49 (0.83-2.67)
  High-intermediate (4-5)    3.95 (2.17-7.19)
  High (6-8)                 6.40 (3.45-11.89)

Models were stratified by study and type of induction therapy.

* The AIC provided a relative measure of model quality; smaller values correspond with a better fitting model. Differences in AIC <2 between models indicate no improvement in fit, differences >2 but <10 indicate increasing improvement in fit, and differences ≥10 indicate substantial improvement in the fit of the model.

The C-index provided a measure of predictive ability of the model, defined as the probability of concordance between predicted and observed survival. The C-index corresponds to the area under the receiver operating characteristics curve for censored data. C-index values of 0.5, 0.7, and 1.0 indicate that the model has completely random, acceptable, or perfect discrimination, respectively, between short and long survival times.

Although the AIC measures provided very strong evidence that the model with NCCN-IPI had the best fit to the observed data, the differences in the C-index were small. For this reason, we also examined the impact of NCCN-IPI score when controlling for each level of the IPI. At each level of the IPI, the NCCN-IPI provided significant prognostic information for OS (P < .0001, P = .006, P = .0003, and P = .01 for low risk IPI, low-intermediate risk IPI, high-intermediate risk IPI, and high risk IPI, respectively; Figure 3). Among patients classified as IPI low risk, those with NCCN-IPI low-intermediate risk had shorter OS than those classified as low risk by both scoring systems. Of 4 patients with IPI low risk but NCCN-IPI high-intermediate risk, 3 patients died prior to year 3, and 1 patient was still alive at 3.1 years. Among patients classified as IPI low-intermediate risk, those with NCCN-IPI high-intermediate risk had shorter OS than did those with low-intermediate risk by both scoring systems. Of the 3 patients with IPI low-intermediate risk but NCCN-IPI low risk, all were alive at last follow-up. Among patients classified as IPI high-intermediate risk, those with NCCN-IPI low-intermediate risk had longer OS and those with NCCN-IPI high risk had shorter OS compared with patients with high-intermediate risk by both scoring systems. Finally, among patients classified as IPI high risk, those with NCCN-IPI high-intermediate risk had longer OS than those classified as high risk by both scoring systems. The 3 patients with IPI high risk but NCCN-IPI low-intermediate risk were alive at last follow-up. In contrast, when comparing OS across IPI risk groups at each level of the NCCN-IPI, the differences among IPI risk groups were not as distinct (P = .69, P = .09, P = .04, and P = .67 for low risk NCCN-IPI, low-intermediate risk NCCN-IPI, high-intermediate risk NCCN-IPI, and high risk NCCN-IPI, respectively; supplemental Figure 1A-D). Collectively, these data supported NCCN-IPI as the best performing risk score to use among the 3 clinical scores evaluated.

Figure 3.

OS for NCCN-IPI risk groups at each level of an IPI risk category. (A) Patients classified with low risk IPI. (B) Patients classified with low-intermediate risk IPI. (C) Patients classified with high-intermediate risk IPI. (D) Patients classified with high risk IPI.

PFS

A total of 708 PFS events was observed among 2105 patients with PFS recorded, and the 5-year PFS estimate was 65% (95% CI, 63-67). PFS estimates by IPI, R-IPI, and NCCN-IPI risk categories are shown in Figure 4. Results for PFS were similar to those for OS, in that the NCCN-IPI risk category best fit the observed data and best discriminated outcome compared with IPI or R-IPI. In PFS models stratified by study and type of induction therapy, the one using the NCCN-IPI had the lowest AIC score and the highest C-index (supplemental Table 3). The same conclusion was reached when using PFS24 as an end point (supplemental Table 4).

Figure 4.

PFS for risk groups defined by 3 clinical scoring systems. (A) IPI, (B) R-IPI, and (C) NCCN-IPI.

Discussion

We have confirmed that the original IPI developed in 1993 still identifies 4 clear prognostic groups of patients with DLBCL in the era of rituximab therapy and performs better than the R-IPI.3  We have further shown that the NCCN-IPI exhibits the best prognostic qualities; it identifies a subgroup of patients with excellent long-term OS, similar to the R-IPI, but it also distinctly identifies low-intermediate, high-intermediate, and poor-risk groups, similar to the IPI. The NCCN-IPI also best discriminated clinical outcome when using PFS as the end point, a surrogate end point for OS, as well as when using PFS24, a good proximal indicator for the poor-risk population.8  However, in the rituximab era, all 3 scoring systems fail to identify a very poor–risk group with long-term OS clearly <50%. This latter finding is in contrast with the original publication of the NCCN-IPI, in which patients in the high-risk category were reported to have a 5-year OS of 33%.5 

All 3 risk scores are calculated using easily obtained clinical features that are part of standard diagnostic procedures. Even so, 4 trials could not be included in this study because documentation of extranodal involvement of major organ sites was lacking, possibly because the trials were conducted prior to the development of the NCCN-IPI, which highlighted the potential importance of this information. Patients were excluded because of the inability to calculate 1 of the 3 risk scores, primarily as a result of missing values on the ULN for LDH, which are used in all 3 scoring systems, and missing data on 1 of 5 major organ sites, which are used to determine the presence of major organ involvement in the NCCN-IPI. However, there is no real difference in the difficulty involved in acquiring the information needed to calculate the scores, because all 3 scores essentially require measurement of the same features (ie, age, LDH, data on extranodal involvement, Ann Arbor stage, and ECOG performance status). The only difference in complexity is attributable to the actual calculation of the scores. Although the IPI/R-IPI weights negative risk factors equally by assigning 1 point to each, the NCCN-IPI assigns more points (1, 2, or 3) to more negative and influential risk factors; all take a simple sum across the scores for a total risk score. In an age of electronic records, which can allow for derived calculations based on the input of simple parameters, this additional complexity is moot. A more relevant disadvantage of the NCCN-IPI is that it has not been in use as long as the IPI, and it may not be as accessible for analyses that require pooling and comparison of data across studies. Furthermore, if future trials only incorporate the NCCN-IPI, comparisons with older trials that select or stratify patients using the IPI may not be possible. Therefore, we would recommend collecting information in research trials such that the NCCN-IPI and IPI scores can be calculated. Outside of a clinical trial, differences between the NCCN-IPI and the IPI may not be sufficient to result in better treatment decisions. Therefore, the continued use of the well-established IPI seems acceptable until better scoring systems become available.

A shared, but clinically important, weakness of the widely used clinical scoring systems, including the NCCN-IPI, is the inability to identify a subgroup of patients with very poor survival. A recently presented risk model combined clinical features of the IPI with genomic features (KMT2D, PIM1, and MEF2B).23  This model identified patients with a worse prognosis than the highest-risk IPI category, but it has not been externally validated in a large group of patients. Another recent risk model incorporated clinical features and a measure for red cell distribution width, obtained from a complete blood count and associated with aging and active inflammatory processes, to identify a subgroup of patients with 5-year OS estimates ≤20%.24  This risk score is more feasible than others developed, because it also uses features that can be easily obtained at diagnosis across academic and community centers; however, it has yet to be externally validated. Another point to consider is that all 3 clinical scoring systems were developed prior to the reclassification of patients with high-risk DLBCL and MYC and BCL2 and/or BCL6 rearrangements as having high-grade B-cell lymphoma. It is unclear how the ability to identify poorer performing patients might be impacted if these patients were not included as DLBCL. The current study could not properly address this question, because information regarding which of these patients carries a poor prognosis (ie, those with double-hit disease in which MYC is translocated to an immunoglobulin partner) has been specified only recently.25 

Continued efforts are needed to develop prognostic scoring systems to better risk-stratify patients and select those most in need of novel therapies. Integrating molecular and other features of the tumor and its microenvironment into existing clinical scoring systems is 1 approach. Ideally, a molecular IPI should take into account prognostic characteristics of the patient and the tumor, as well as possess features allowing for modification, depending on which cellular pathways are being addressed by the respective investigational drug(s). Until such a tool has been developed, validated, and accepted for common use, we recommend using the NCCN-IPI to best distinguish patients with DLBCL who have very favorable long-term OS with an R-CHOP–based therapy from those who have less favorable long-term OS. For patients treated in clinical trials, the simultaneous calculation of NCCN-IPI and IPI may be optimal to allow comparison of outcomes with data acquired from previous trials in identical risk groups. For all other patients, continued use of the IPI seems acceptable, because differences between the NCCN-IPI and the IPI will not result in differing treatment decisions for the vast majority of patients.

The online version of this article contains a data supplement.

The publication costs of this article were defrayed in part by page charge payment. Therefore, and solely to indicate this fact, this article is hereby marked “advertisement” in accordance with 18 USC section 1734.

Acknowledgments

The authors thank the patients, families, and caregivers who participated in each of these studies. They also acknowledge the SEAL group coinvestigators, all other investigators, and the study groups that contributed data for this analysis.

This work was supported by grants from Celgene.

Authorship

Contribution: A.S.R. analyzed and interpreted the data and wrote the manuscript; J.G.D. collected and analyzed data, reviewed drafts of the manuscript, and approved the final version; G.S. conceived and designed the study, reviewed the manuscript, and approved the final version; A.W. analyzed data, reviewed drafts of the manuscript, and approved the final version; D.C., V.P., C.H., H.T., and H.G. collected data, reviewed drafts of the manuscript, and approved the final version; J.F. reviewed drafts of the manuscript and approved the final version; M.Z. collected data, interpreted results, reviewed drafts of the manuscript, and approved the final version; C.F. and Q.S. conceived and designed the study, interpreted data, reviewed the manuscript, and gave final approval of the study; and N.S. conceived and designed the study, collected data and interpreted results, edited and reviewed the manuscript, and provided final approval of the study.

Conflict-of-interest disclosure: G.S. has provided consulting services to Roche/Genentech, Gilead Sciences, Janssen, Celgene, Novartis, Merck, Pfizer, Acerta Pharma, Kite Pharma, Servier, Morphosis, and Epizyme and has received honoraria from Roche/Genentech, Amgen, Janssen, Celgene, Servier, Gilead Sciences, Novartis, AbbVie, Merck, Takeda, and Morphosis. D.C. has received research funding (institutional) from AstraZeneca, Amgen, Sanofi, Merrimack, Celgene, MedImmune, Bayer, 4SC, Clovis Oncology, Lilly, Janssen, and Merck. V.P. has served on the speaker’s bureau for Hexalin Pharmaceuticals and has had travel, accommodations, or other expenses paid by Roche, AbbVie, and Amgen. C.H. has provided consulting services to Roche, Celgene, Janssen-Cilag, Gilead Sciences, and Takeda; has received honoraria from Novartis, Amgen, Servier/Pfizer, and Gilead Sciences; and has had travel, accommodations, or other expenses paid by Roche, Celgene, and Amgen. H.T. has provided consulting services to Karyopharm Therapeutics, Roche, and Janssen; has received honoraria from Bristol-Myers Squibb and Servier; and has had travel, accommodations, or other expenses paid by Roche. H.G. has provided consulting services to Celgene and Gilead Sciences and has had travel, accommodations, or other expenses paid by Roche, Amgen, and Gilead Sciences. J.F. is employed by Celgene. C.F. has provided consulting services to OptumRX, Seattle Genetics, Bayer, Gilead Sciences, Spectrum Pharmaceuticals, AbbVie, Celgene, Denovo Biopharma, BeiGene, AstraZeneca, Karyopharm Therapeutics, and Pharmacyclics/Janssen; has received research funding (institutional) from Acerta Pharma, Infinity Pharmaceuticals, Onyx, Janssen Oncology, Gilead Sciences, Celgene, TG Therapeutics, Genentech/Roche, Pharmacyclics, AbbVie, Immune Design, and BeiGene; and has had travel, accommodations, or other expenses paid by Celgene, Genentech/Roche, and Gilead Sciences. N.S. owns stock and other ownership interests in Celgene; has provided consulting services to Riemser; has received honoraria from Takeda, Gilead Sciences, Riemser, and Janssen China R&D; and has had travel, accommodations, or other expenses paid by Takeda, Gilead Sciences, Riemser, and Janssen China R&D. The remaining authors declare no competing financial interests.

Correspondence: Amy S. Ruppert, M200 Starling Loving, 410 West 10th Ave, The Ohio State University, Columbus, OH 43210, e-mail: amy.stark@osumc.edu.

REFERENCES

1. International Non-Hodgkin’s Lymphoma Prognostic Factors Project. A predictive model for aggressive non-Hodgkin’s lymphoma. N Engl J Med. 1993;329(14):987-994.
2. Sehn LH, Berry B, Chhanabhai M, et al. The revised International Prognostic Index (R-IPI) is a better predictor of outcome than the standard IPI for patients with diffuse large B-cell lymphoma treated with R-CHOP. Blood. 2007;109(5):1857-1861.
3. Ziepert M, Hasenclever D, Kuhnt E, et al. Standard International prognostic index remains a valid predictor of outcome for patients with aggressive CD20+ B-cell lymphoma in the rituximab era. J Clin Oncol. 2010;28(14):2373-2380.
4. Tay K, Tai D, Tao M, Quek R, Ha TC, Lim ST. Relevance of the International Prognostic Index in the rituximab era. J Clin Oncol. 2011;29(1):e14; author reply e15.
5. Zhou Z, Sehn LH, Rademaker AW, et al. An enhanced International Prognostic Index (NCCN-IPI) for patients with diffuse large B-cell lymphoma treated in the rituximab era. Blood. 2014;123(6):837-842.
6. Melchardt T, Troppan K, Weiss L, et al. A modified scoring of the NCCN-IPI is more accurate in the elderly and is improved by albumin and B2-microglobulin. Br J Haematol. 2015;168(2):239-245.
7. Wei Y, Zhang Y, Hao X, et al. NCCN-IPI is superior to aaIPI and IPI in predicting survival of DLBCL in the rituximab era. Blood. 2016;128(22):1872.
8. Shi Q, Schmitz N, Ou FS, et al. Progression-free survival as a surrogate end point for overall survival in first-line diffuse large B-cell lymphoma: an individual patient-level analysis of multiple randomized trials (SEAL). J Clin Oncol. 2018;36(25):2593-2602.
9. Récher C, Coiffier B, Haioun C, et al; Groupe d’Etude des Lymphomes de l’Adulte. Intensified chemotherapy with ACVBP plus rituximab versus standard CHOP plus rituximab for the treatment of diffuse large B-cell lymphoma (LNH03-2B): an open-label randomised phase 3 trial. Lancet. 2011;378(9806):1858-1867.
10. Delarue R, Tilly H, Mounier N, et al. Dose-dense rituximab-CHOP compared with standard rituximab-CHOP in elderly patients with diffuse large B-cell lymphoma (the LNH03-6B study): a randomised phase 3 trial. Lancet Oncol. 2013;14(6):525-533.
11. Coiffier B, Lepage E, Briere J, et al. CHOP chemotherapy plus rituximab compared with CHOP alone in elderly patients with diffuse large-B-cell lymphoma. N Engl J Med. 2002;346(4):235-242.
12. Schmitz N, Nickelsen M, Ziepert M, et al; German High-Grade Lymphoma Study Group (DSHNHL). Conventional chemotherapy (CHOEP-14) with rituximab or high-dose chemotherapy (MegaCHOEP) with rituximab for young, high-risk patients with aggressive B-cell lymphoma: an open-label, randomised, phase 3 trial (DSHNHL 2002-1). Lancet Oncol. 2012;13(12):1250-1259.
13. Pfreundschuh M, Trümper L, Osterborg A, et al; MabThera International Trial Group. CHOP-like chemotherapy plus rituximab versus CHOP-like chemotherapy alone in young patients with good-prognosis diffuse large-B-cell lymphoma: a randomised controlled trial by the MabThera International Trial (MInT) Group. Lancet Oncol. 2006;7(5):379-391.
14. Pfreundschuh M, Schubert J, Ziepert M, et al; German High-Grade Non-Hodgkin Lymphoma Study Group (DSHNHL). Six versus eight cycles of bi-weekly CHOP-14 with or without rituximab in elderly patients with aggressive CD20+ B-cell lymphomas: a randomised controlled trial (RICOVER-60). Lancet Oncol. 2008;9(2):105-116.
15. Cunningham D, Hawkes EA, Jack A, et al. Rituximab plus cyclophosphamide, doxorubicin, vincristine, and prednisolone in patients with newly diagnosed diffuse large B-cell non-Hodgkin lymphoma: a phase 3 comparison of dose intensification with 14-day versus 21-day cycles. Lancet. 2013;381(9880):1817-1826.
16. Kaplan EL, Meier P. Nonparametric estimation from incomplete observations. J Am Stat Assoc. 1958;53(282):457-481.
17. Cox DR. Regression models and life-tables. J Royal Stat Soc. Series B (Methodological). 1972;34(2):187-220.
18. Akaike H. A new look at the statistical model identification. IEEE Trans Automat Contr. 1974;19(6):716-723.
19. Burnham KP, Anderson DR. Multimodel inference: understanding AIC and BIC in model selection. Sociol Methods Res. 2004;33(2):261-304.
20. Harrell FJ. Regression Modeling Strategies: With Applications to Linear Models, Logistic Regression, and Survival Analysis. New York, NY: Springer Verlag New York; 2001.
21. Pencina MJ, D’Agostino RB Sr., D’Agostino RB Jr., Vasan RS. Evaluating the added predictive ability of a new marker: from area under the ROC curve to reclassification and beyond. Stat Med. 2008;27(2):157-172; discussion 207-212.
22. Burnham KP, Anderson DR. Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach. New York, NY: Springer Verlag New York; 2002.
23. Song JY, Perry AM, Herrera AF, et al. New genomic model integrating clinical factors and gene mutations to predict overall survival in patients with diffuse large B-cell lymphoma treated with R-CHOP [abstract]. Blood. 2018;132(suppl 1). Abstract 346.
24. Bento L, Diaz-Lopez A, Barranco G, et al. New prognosis score including absolute lymphocytes/monocytes ratio and beta2microglobulin in patients with diffuse large B cell lymphoma (DLBCL) treated with R-CHOP: Spanish Lymphoma Group Experience (GELTAMO) [abstract]. Blood. 2018;132(suppl 1). Abstract 347.
25. Rosenwald A, Bens S, Advani R, et al. Prognostic significance of MYC rearrangement and translocation partner in diffuse large B-cell lymphoma: a study by the Lunenburg Lymphoma Biomarker Consortium. J Clin Oncol. 2019;37(35):3359-3368.

Supplemental data