Abstract
The role of thalidomide for previously untreated elderly patients with multiple myeloma remains unclear. Six randomized controlled trials, launched in or after 2000, compared melphalan and prednisone alone (MP) and with thalidomide (MPT). The effect on overall survival (OS) varied across trials. We carried out a meta-analysis of the 1685 individual patients in these trials. The primary endpoint was OS, and progression-free survival (PFS) and 1-year response rates were secondary endpoints. There was a highly significant benefit to OS from adding thalidomide to MP (hazard ratio = 0.83; 95% confidence interval 0.73-0.94, P = .004), representing increased median OS time of 6.6 months, from 32.7 months (MP) to 39.3 months (MPT). The thalidomide regimen was also associated with superior PFS (hazard ratio = 0.68, 95% confidence interval 0.61-0.76, P < .0001) and better 1-year response rates (partial response or better was 59% on MPT and 37% on MP). Although the trials differed in terms of patient baseline characteristics and thalidomide regimens, there was no evidence that treatment affected OS differently according to levels of the prognostic factors. We conclude that thalidomide added to MP improves OS and PFS in previously untreated elderly patients with multiple myeloma, extending the median survival time by on average 20%.
Introduction
In recent years, there has been considerable progress in the treatment of multiple myeloma. Thalidomide was demonstrated to be effective in patients with relapsing multiple myeloma in 1999.1 This discovery stimulated a number of studies, which confirmed the initial results that approximately 30% of patients with relapsing multiple myeloma respond to thalidomide.2 A further step was to test the effect of thalidomide in newly diagnosed multiple myeloma; and from the year 2000 onwards, 5 groups in Europe launched 6 randomized studies with similar design, aimed at patients who had not received previous treatment and were not eligible for high-dose therapy. An Italian randomized study with melphalan, prednisone, and thalidomide (MPT) versus melphalan and prednisone (MP) demonstrated a significant difference in progression-free survival (PFS),3 but this did not translate into an overall survival (OS) benefit.4 One French study demonstrated a significant difference in both PFS and OS in patients 65 to 75 years of age,5 and the same result was found in a second, parallel study of patients older than 75 years.6 The Dutch/Belgian study has demonstrated differences in event-free survival, OS, and PFS.7 A Nordic trial found small and nonsignificant differences in OS and PFS, although there was a significant difference in response during the first year.8 Finally, a Turkish trial has been completed, demonstrating a significant response and an early but nonsignificant survival advantage after 6 months of MPT treatment.9 Thus, studies have shown a favorable effect on response rates, but it is less clear whether this results in improved OS, as only 3 trials demonstrated a significant OS advantage of MPT over MP. Similarly, a meta-analysis using only published summary data suggested a “trend toward improvement in OS.”10
To provide a definitive estimate of the survival benefits of MPT over MP, if any, we performed an individual patient meta-analysis based on data from all 6 completed trials. This analysis, based on individual data from 1685 patients, also allows us to explore possible reasons for the heterogeneity in the reported results of the separate studies, including the impact of interactions of patient characteristics with treatment.
Methods
Objectives
The primary aim of this meta-analysis is to examine the effect of thalidomide on patients with newly diagnosed multiple myeloma, using the individual patient data from the 6 recently completed trials that compared MPT versus MP. OS, defined as the time from randomization until death from any cause, was the primary endpoint, with the hypothesis that thalidomide improves OS. PFS and best response at 12 months were defined as secondary outcomes.
Secondary aims were prespecified: to explore, using interaction tests, the hypotheses that thalidomide benefits only patients with good performance status and that thalidomide has the same effect irrespective of renal failure, defined by elevated serum creatinine levels.
Data extraction
Individual patient data were obtained for the 6 trials of the 5 collaborating groups. Best response was defined as being the best recorded level of response within the first 12 months, and patients were classified into: dead within 12 months or, for those surviving one year or longer, very good partial response or better (VGPR), partial response (PR), and no response reported. The definition of complete response varied, and data were not available for all studies, so complete response was lumped with VGPR. All trials received institutional review board approval from their respective institutions.
Based on the literature,11,12 the following factors, which were available across studies, were also collected for each patient to investigate study heterogeneity and for consideration as important risk factors that might account for study-to-study variation in outcomes: International Staging System (ISS),11 Durie-Salmon Staging (DS stage),12 Eastern Cooperative Oncology Group/World Health Organization performance status (WHO-PS),13 serum β2-microglobulin, serum creatinine, serum calcium, serum lactate dehydrogenase (LDH), and hemoglobin. Age and sex were also recorded.
Statistical analysis
All randomly assigned patients were included on an intention-to-treat basis. Living patients were censored at the date they were last confirmed to be alive. All analyses were stratified by study. “Forest plots” of estimated hazard ratio (HR) and 95% confidence intervals (CIs) were used for the basic analysis. Treatment effect was estimated from survival curves14 with HR estimated from Cox proportional hazards models.15 Survival curves are simple (nonstratified) Kaplan-Meier curves.14
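As an illustration of the Kaplan-Meier (product-limit) estimator used for the survival curves, the following Python sketch computes a curve from a handful of made-up follow-up times. The function name `kaplan_meier` and the toy data are assumptions introduced for illustration only; they are not trial data, and this is not the STATA code used for the analyses.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct death time.

    times  : follow-up time for each patient (e.g. months)
    events : 1 if death was observed at that time, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    surv = 1.0
    curve = []
    for t in sorted(deaths):
        # Patients still under observation just before time t
        at_risk = sum(1 for u in times if u >= t)
        surv *= 1.0 - deaths[t] / at_risk
        curve.append((t, surv))
    return curve

# Hypothetical toy data (not from any of the 6 trials)
toy_times = [6, 12, 12, 20, 24, 30, 36, 40]
toy_events = [1, 1, 0, 1, 0, 1, 0, 0]

curve = kaplan_meier(toy_times, toy_events)
for t, s in curve:
    print(f"t={t:>2} months  S(t)={s:.3f}")
```

Censored patients contribute to the number at risk until their censoring time but never trigger a drop in the curve, which mirrors the handling of living patients described above.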
It was anticipated that survival might be affected by baseline heterogeneity as the studies used differing patient eligibility and selection criteria. Heterogeneity was assessed and tested by I2,16,17 which measures inconsistency (percentage of total variation across studies resulting from heterogeneity) of effects. It was prespecified that, in the event of heterogeneity, a random effect model17,18 would be used, as described in the Cochrane Handbook for Systematic Reviews of Interventions,19 although this approach remains controversial.20 The risk factors were used to stratify HR forest plots of the MPT versus MP, with the continuous factors being categorized at usual thresholds (β2-microglobulin < 3.5 mg/L and > 5.5 mg/L, LDH < 300 U/L, creatinine < 176μM, albumin < 35 g/L, and hemoglobin < 8.5 g/dL or ≥ 10.5 g/dL).11,12 The importance of risk factors on survival was also explored using Cox models, and their impact on treatment effect was examined using interaction tests. For the Cox models, continuous factors were explored as continuous; for β2-microglobulin and creatinine, a logarithmic transformation reflecting a log-linear or percentage-change relationship was found to be appropriate. It was anticipated that the residual variation and study-to-study heterogeneity might be reduced by allowing for important prognostic factors.
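The heterogeneity statistic and random effects pooling described above can be sketched as follows. This is a minimal illustration using the DerSimonian-Laird moment estimator, which is one common choice and is assumed here rather than taken from the analysis plan; the per-trial hazard ratios and standard errors are hypothetical values, not the results of the 6 trials.

```python
import math

def pool_random_effects(log_hrs, ses):
    """DerSimonian-Laird pooling of per-trial log hazard ratios."""
    w = [1.0 / se ** 2 for se in ses]                 # inverse-variance weights
    fixed = sum(wi * y for wi, y in zip(w, log_hrs)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_hrs))  # Cochran's Q
    df = len(log_hrs) - 1
    # I2: percentage of total variation attributable to heterogeneity
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                     # between-trial variance
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]
    pooled = sum(wi * y for wi, y in zip(w_re, log_hrs)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    return {
        "hr": math.exp(pooled),
        "ci": (math.exp(pooled - 1.96 * se_pooled),
               math.exp(pooled + 1.96 * se_pooled)),
        "i2": i2,
        "tau2": tau2,
    }

# Hypothetical per-trial estimates (illustrative only)
toy_hrs = [0.60, 0.70, 0.85, 1.00, 1.10, 0.75]
toy_ses = [0.15, 0.18, 0.20, 0.16, 0.25, 0.30]

result = pool_random_effects([math.log(h) for h in toy_hrs], toy_ses)
print(result)
```

When the between-trial variance estimate is zero, the random effects weights reduce to the fixed effect weights, so the two approaches differ only when heterogeneity is detected.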
All statistical tests were 2-sided, and P values < .05 were considered to be statistically significant for the main analyses. When exploring covariates and interactions, a more conservative P < .01 was used as an informal protection against including an excess of false-positive covariates and treatment interactions on outcome through multiplicity of significance tests. Analyses were carried out in STATA, Version 11,21 with the stratified random effects models implemented using a shared frailty extension to the Cox model.22 The proportional hazards assumption was confirmed by visual inspection of survival plots and complementary log-log plots for individual trials. The individual patient data were analyzed both in a one-stage analysis (Cox regressions) and in 2-stage analyses, in which summary HRs were first estimated for each subgroup and then combined in a second stage using random effects analyses of forest plots with the STATA “metan” macro.23
Results
Description of trials
The 6 trials are shown in Table 1. A total of 1685 patients were recruited to the combined trials and randomized between MP and MPT. The trials vary with respect to treatment policy (dose and the number of cycles/duration of thalidomide, and similarly for MP), and study design (only Intergroupe Francophone du Myélome-II [IFM-II] and Nordic Myeloma Study Group [NMSG] were blinded, with placebo added to MP in the control arm). Trials also varied in the selection of patients (WHO-PS, stage of disease, and age range). The characteristics of the recruited patients (Table 2) reflect the impact of these selection criteria on stage and performance status: for example, 31% in IFM-1 were ISS stage I, but only 16% in NMSG; 52% in Italian Multiple Myeloma Network (GIMEMA) had WHO-PS of 0, compared with 7% in Turkish Myeloma Study Group (TMSG). These are well recognized prognostic factors for survival, and the impact of this baseline heterogeneity is explored in the survival analyses.
OS
A total of 920 deaths (55%) have been observed in the 1685 patients. Information about date of death or censoring was available for all patients, and the OS curves are shown in Figure 1. Median survival time on MP was 32.7 months (95% CI, 30.5-36.6 months), and on MPT 39.3 months (95% CI, 35.6-44.6 months). The absolute increase in 2-year survival was 5.1%, with 63.7% on MP alive at 2 years (95% CI, 60.3%-66.9%) and 68.8% on MPT (95% CI, 65.4%-71.9%); larger absolute differences were observed at 3 and 4 years. Using an unadjusted Cox model, stratified by study, the estimated HR is 0.83 (95% CI, 0.73-0.94, P = .004), representing a statistically significant benefit to MPT. However, Figure 1 also suggests that there may be early deaths in the MPT arm, with no OS benefit in the first year.
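The relationship between the reported HR, its CI, and the P value can be checked with a short calculation. The sketch below assumes a Wald-type interval that is symmetric on the log scale, which is an assumption made here for illustration; because the inputs are rounded to 2 decimal places, it can only approximately recover the reported P value. The function name `wald_from_ci` is hypothetical.

```python
import math

def wald_from_ci(hr, lower, upper, z_crit=1.959964):
    """Recover SE of log(HR), the z statistic, and the 2-sided P value
    from a hazard ratio and its 95% confidence interval."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # 2-sided normal tail probability
    return se, z, p

# Unadjusted, study-stratified OS estimate reported in the text
se, z, p = wald_from_ci(0.83, 0.73, 0.94)
print(f"SE(log HR)={se:.4f}  z={z:.2f}  P={p:.4f}")
```

The same back-calculation of a standard error from a published CI is what a 2-stage analysis relies on when only summary HRs are available for each trial or subgroup.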
Figure 2 summarizes the HRs by study and indicates significant trial-to-trial heterogeneity (I2 = 61.3%, P = .024). This heterogeneity may jeopardize the P value and CI of the estimated overall effect (0.82). When the prognostic factors were explored individually as potential covariates using Cox models, ISS, DS stage, WHO-PS, β2-microglobulin, hemoglobin, albumin, creatinine, LDH, and age were all important risk factors for survival; however, not all factors were independent, and a multivariate Cox model resulted in retaining ISS, DS stage I or II versus III, WHO-PS, age, creatinine, and albumin (all P < .007). These baseline covariates appeared to explain the heterogeneity that was observed in the unadjusted model. The covariate-adjusted model confirmed the estimated treatment effect (HR = 0.79; 95% CI, 0.68-0.92, P = .002). Additional Cox models were used to explore treatment-covariate interactions, but no clear evidence of interactions was found (all P > .036). Thus, for example, ISS stage is one of the most important prognostic factors. Figure 3 shows HRs for treatment effect in each study, divided by ISS; the values of HR are similar for each stage, and no interaction effect is apparent. Figure 4 summarizes the effects of each prognostic factor on OS.
PFS
Figures 5 and 6 show corresponding results for PFS. The estimated HR was 0.68 in favor of MPT, with 95% CI 0.61 to 0.76 (Figure 5; P < .0001). Median PFS time for MP was 14.9 months (95% CI, 14.0-16.6 months) and for MPT 20.3 months (95% CI, 18.8-21.5 months). Two-year PFS was 28.4% (MP) and 42.5% (MPT). As before, there was evidence of study-to-study heterogeneity (Figure 6), and this could be accounted for by a combination of heterogeneity in the baseline characteristics (especially ISS). After allowing for this heterogeneity, the overall HR was 0.68 with 95% CI 0.56 to 0.81 (P < .0001), confirming the Cox model. The interaction tests again showed no evidence that baseline characteristics might interact with treatment effects, apart from a significant treatment interaction (P = .006) according to creatinine level > or < 176μM.
Survival from date of reported disease progression was also analyzed; the HR between treatment arms was 1.02, suggesting no shortening of survival after relapse.
Subgroup analyses
Two prespecified subgroup analyses were explored. The first hypothesis was that MPT only benefits patients with good WHO-PS of 0 to 2. As already noted, the test for interaction between OS and WHO-PS was nonsignificant, and there was no evidence of variation of treatment effect across categories. However, the sample size for poor PS (3 or 4) was small, and only 2 trials exceeded 20 patients: NMSG, 107 patients; IFM-1, 23 patients. The HR for these patients was 1.01 (95% CI, 0.71-1.43), which, although not statistically significant, remains consistent with the prior hypothesis of no survival benefit from MPT if PS is poor.
The second hypothesis was that creatinine levels of 176μM or greater at baseline would not affect any MP/MPT difference. Again, there was no significant interaction detectable for OS, and the forest plot showed that, for creatinine < 176, the HR = 0.79 (95% CI, 0.68-0.91, P = .001); and for creatinine ≥ 176, HR = 0.96 (95% CI, 0.73-1.46, P = .8). However, for PFS, there was a significant interaction with creatinine as noted earlier: < 176, HR = 0.64 (95% CI, 0.55-0.74, P < .0001), but at ≥ 176, there was no evidence of PFS advantage with HR = 0.87 (95% CI, 0.54-1.39, P = .6). Thus, the benefit of MPT in terms of PFS could be questioned for patients with baseline creatinine above this threshold.
Response in first year
The best response reported in the first 12 months is shown as a percentage for VGPR and PR (Table 3). In all studies, there was a higher percentage of VGPR in the MPT group compared with MP (not shown), and overall VGPR was reported for 25% of MPT patients versus 9% MP; PR or better was 59% MPT and 37% MP (overall χ2 test, P < .0001).
Discussion
Following this meta-analysis, the effect of thalidomide is now well established based on 6 studies. After due allowance was made for baseline characteristics, there was clear evidence of a benefit to patients receiving thalidomide, both in terms of OS and PFS. These results were also supported by the response to therapy that was observed in the first 12 months. The estimated HR for treatment effect on OS was 0.83, and highly significant. This represents an extra 6.6 months in median survival time, corresponding to an approximate 20% increase, which is a substantial and clinically important increase in survival. The variation in basic characteristics of the studies, as discussed later in “Discussion,” may have contributed to the substantial variation in results. This reflects how thalidomide is used in real life and should be taken into consideration.
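The arithmetic behind the 6.6-month and 20% figures can be laid out explicitly. The sketch below uses the median OS times and the unadjusted HR of 0.83 reported in Results. The final step, which relates the HR to a ratio of medians, assumes exponentially distributed survival times; that assumption is introduced here only to show why the 2 figures agree and is not stated in the analysis.

```python
median_mp = 32.7    # months, median OS on MP (from Results)
median_mpt = 39.3   # months, median OS on MPT (from Results)
hr = 0.83           # unadjusted, study-stratified HR for OS (from Results)

gain_months = median_mpt - median_mp
relative_gain = gain_months / median_mp

# Under exponential survival, median = ln(2) / hazard, so the ratio of
# medians equals 1 / HR (illustrative assumption, not from the paper).
ratio_if_exponential = 1.0 / hr

print(f"Gain in median OS: {gain_months:.1f} months")
print(f"Relative gain:     {relative_gain:.1%}")
print(f"1/HR:              {ratio_if_exponential:.3f}")
```

Both routes give an increase of roughly one-fifth in median survival time.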
This meta-analysis highlights the value of having individual patient data, which enables a full exploration of the causes of heterogeneity. This in turn enables more precise estimation of HRs with narrower CIs for the estimates and more convincing P values. An analysis only using published summary results from the trials,10 or “aggregate data,” cannot explore in full the impact of adding thalidomide to MP. We observed significant heterogeneity among the trials, but this could be largely accounted for as a consequence of major differences in baseline characteristics. Levels of the prognostic factors differed substantially from trial to trial; and in particular, both stage of disease and performance status showed major variations. Thus, ISS sufficed to account for much of the study-to-study heterogeneity. However, improvement in survival (OS and PFS) for the MPT arm appeared to be unaffected by level of ISS. LDH, which was not retained as an important prognostic factor in the multivariate model, varied substantially; in part, this variation in LDH (and other parameters) might be the result of variation across the many laboratories in the 6 countries.
We failed to find indication that poor performance status (WHO-PS > 2) contraindicates the use of MPT. However, the observed HRs were consistent with there being no advantage to MPT in these patients, although the sample size was limited and one trial dominated this subgroup analysis. Thus, despite the large combined sample of 1685 patients, the interaction test was underpowered, and the results regarding frail patients remain inconclusive. Similarly, indications against using MPT in patients with high creatinine levels were weak for OS, although the HRs remained consistent with there being less or no advantage. For PFS, however, there was a significant treatment interaction indicating less benefit in patients thought to have potential renal failure. Because patients presenting with renal failure do notably worse with many treatments, we recommend caution in using MPT for these patients.
An optimal dose of thalidomide with MP is not established. In the studies that introduced thalidomide in relapsing myeloma, the intended dose was 800 mg, which later was shown to be too high.1 The 6 trials used a variety of dosage levels and cycles, both for MP and MPT. The French and Turkish trials used MP and thalidomide for 12 months, whereas the other trials continued thalidomide until relapse. However, none of the studies was designed to detect dose-effect relationships, and all used simple MP/MPT randomizations. In practice, clinicians would vary the actual dose according to the patient's status, evidence of response or relapse, and occurrence of side effects or toxicity. A substantial proportion of patients in all studies either stopped thalidomide prematurely or had dosage reduced. Thus, it would be difficult to analyze the individual patient data for dose received and infeasible to eliminate bias from the consequent analysis. Therefore, we decided to rely on exploration of the forest plots (such as Figure 2) to determine whether studies with a policy of high dose differed from those with low dose. No such difference is obvious. Similarly, inspection of the forest plots revealed no obvious association of effect with use of double-blind placebo.
Other novel agents, such as bortezomib and lenalidomide, have also been introduced in combination with MP in first-line treatment.24-26 As with thalidomide, the combination of MP plus bortezomib increases remission rates, PFS, and OS.24 MP plus lenalidomide increases remission rates and PFS, but an increase in OS has not yet been demonstrated.25 However, the observation period is still too short to evaluate this fully. The MP combinations with thalidomide, bortezomib, or lenalidomide are all effective treatment options but have not been compared with each other, and a strict priority cannot be given.
The main limitation of this meta-analysis, as with any overview, is that the patient population as well as dose and duration of treatment varied across studies, resulting in heterogeneity as discussed. Sample size remained too small for reliable subgroup analyses. This paper focuses on the impact of MPT for OS, PFS, and response to therapy. Increased OS or PFS has to be matched against toxicity and side effects, which are being analyzed by another team within our collaborative group and will be reported separately. It has been suggested that MPT patients may have double the rate of grade 3 or 4 adverse events,27 which may also explain the early deaths and lack of first-year OS benefit.
In conclusion, this meta-analysis of 6 trials demonstrated that thalidomide added to MP improves survival in previously untreated elderly patients with multiple myeloma, extending the median OS time by 20%. There was no evidence of benefit (nor of lack of benefit) to patients with poor PS. However, we advise caution in using thalidomide for patients with potential renal failure. More MP-MPT trials are unlikely to be conducted, and it is important to reach a definitive conclusion based on all available trial data. These analyses show that MP alone should no longer be the reference and that MP plus thalidomide is an effective first-line treatment of multiple myeloma.
Authorship
Contribution: P.M.F. was responsible for the data analysis and wrote the manuscript; and all authors were the principal investigators and major contributors or statisticians for respective trials, participated in the elaboration of the meta-analysis statistical plan, and made important contribution to the preparation and revision of the manuscript.
Conflict-of-interest disclosure: A.P. has honoraria and an advisory role for Celgene. S.B. has honoraria from Celgene. C.H., P. Moreau, and T.F. have honoraria or an advisory role for Celgene, Pharmion, and Janssen Cilag. A.W. and P.G. have an advisory role for Janssen Cilag, Celgene, and Mundipharma. M.B. has honoraria from and is a member of the Speakers Bureau for Celgene and Ortho Biotech. The remaining authors declare no competing financial interests.
Correspondence: Peter Fayers, Institute of Applied Health Sciences, University of Aberdeen, Aberdeen AB25 2ZD, United Kingdom; e-mail: [email protected].