Comment on Milligan et al, page 4614

In this issue of Blood, Milligan and colleagues report that survival was shorter (P = .05) in high-risk patients randomized to receive, generally as salvage therapy, the widely used combination of fludarabine and cytarabine (FLA) rather than standard cytosine arabinoside, daunorubicin, and etoposide (ADE). The relative effects of FLA and ADE were the same when used alone or when combined, in separate randomizations, with granulocyte colony-stimulating factor or all-trans retinoic acid.

Fludarabine and cytarabine (FLA) or FLA with granulocyte colony-stimulating factor (FLAG) may prove more effective in patients who have better prognoses. Nonetheless, a fundamental question is how enthusiasm for FLA/FLAG in high-risk patients has been sustained when the results of Milligan and colleagues suggest that the enthusiasm was unjustified. Put another way, why was the published literature on FLA, in retrospect, misleading? To understand how we got here from there, it may be useful to note that none of the studies of FLA or FLAG in acute myeloid leukemia (AML)/high-risk myelodysplastic syndrome (MDS) cited by Milligan et al made reference to a control group (ie, patients given other therapies). Although typical of published phase 2 studies and consistent with the accepted view that such studies are intended to determine efficacy, with efforts at comparison reserved for phase 3, the lack of a control group seems inconsistent with accepted scientific practice. Together with the natural, laudable desire of investigators to report “positive” results, the absence of controls and the seeming failure to consider the prior probability of positive results may lead to published results that are falsely positive.1 The lack of controls in phase 2 trials is also medically problematic. Specifically, patients are not as interested in knowing whether a given therapy is active as they are in knowing, as soon as possible, whether it is better than another, a question that implies a control group.

One approach to this issue is the randomized “selection” design, intended to select, for possible future investigation, the best among several therapies (which could include a standard) regardless of its degree of superiority.2 This goal requires fewer patients (eg, 20 per treatment arm) than standard designs, so more treatments can be studied.2 Using a selection design, the probability of correctly selecting a truly superior agent among 4 is typically 60%. Although selection designs seem underpowered compared with standard phase 3 designs, the 80% to 90% power associated with the latter is in fact nominal. This is because phase 3 designs ignore the often-informal process leading to selection of, typically, a single new therapy, which is then compared to a standard. History suggests that without data from a trial, it is often impossible to know which of several new therapies should be selected.2 Consequently, selection of 1 therapy and rejection of 3 others may itself be associated with a false negative rate as high as 75% even before the phase 3 trial begins. Simply put, the worst false negative may result from an arbitrary decision not to study a treatment at all, a possibility reduced by use of selection designs. Finally, in order to further speed evaluation of the relative efficacy of new agents, phase 2 to 3 designs have been proposed and shown to be more efficient than distinct phase 2 and phase 3 studies.3
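The comparison between a selection design and an arbitrary choice can be illustrated with a small simulation. The sketch below is not taken from the cited work: the response rates (20% for 3 arms, 40% for the truly superior arm), the rule of picking the arm with the most responders, and the function names are all assumptions chosen for illustration, and the estimated probability will change with them.

```python
import random


def simulate_selection(n_per_arm=20, p_standard=0.20, p_superior=0.40,
                       n_arms=4, n_trials=20000, seed=0):
    """Estimate how often a selection design picks the truly superior arm.

    Assumed setup (hypothetical, for illustration only): arm 0 has response
    rate p_superior, every other arm has p_standard, and the arm with the
    most observed responders is selected, with ties broken at random.
    """
    rng = random.Random(seed)
    rates = [p_superior] + [p_standard] * (n_arms - 1)
    correct = 0
    for _ in range(n_trials):
        # Count responders among n_per_arm patients in each arm.
        responders = [sum(rng.random() < p for _ in range(n_per_arm))
                      for p in rates]
        best = max(responders)
        tied = [i for i, r in enumerate(responders) if r == best]
        if rng.choice(tied) == 0:
            correct += 1
    return correct / n_trials


def arbitrary_choice_false_negative(n_arms=4):
    """Chance of missing the superior arm when 1 of n_arms is picked blindly."""
    return 1 - 1 / n_arms


if __name__ == "__main__":
    print("P(correct selection), selection design:", simulate_selection())
    print("False negative rate, arbitrary choice:",
          arbitrary_choice_false_negative())
```

With 4 candidate therapies, an uninformed choice of 1 misses the superior therapy with probability 1 - 1/4 = 75%, which is the figure quoted above; the simulated selection design can be compared against that baseline under whatever response rates one considers plausible.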

1. Ioannidis JPA. Why most published research findings are false. PLoS Med. 2005;2:e124 (reprinted in Chance. 2005;18:40-47).
2. Estey E, Thall PF. New designs for phase 2 clinical trials. Blood. 2003;102:442-448.
3. Inoue LY, Thall PF, Berry DA. Seamlessly expanding a randomized phase II trial to phase III. Biometrics. 2002;58:823-831.