Open access (CC BY-NC)
Research

Comparison of treatment effect sizes associated with surrogate and final patient relevant outcomes in randomised controlled trials: meta-epidemiological study

BMJ 2013; 346 doi: https://doi.org/10.1136/bmj.f457 (Published 29 January 2013) Cite this as: BMJ 2013;346:f457
  1. Oriana Ciani, PhD candidate (1)
  2. Marc Buyse, chairman (2)
  3. Ruth Garside, senior lecturer (1)
  4. Toby Pavey, research fellow (3)
  5. Ken Stein, professor (1)
  6. Jonathan A C Sterne, professor (4)
  7. Rod S Taylor, professor (1)

  1. PenTAG, Institute for Health Services Research, University of Exeter Medical School, University of Exeter, Exeter EX2 4SG, UK
  2. International Drug Development Institute, Louvain-la-Neuve, Belgium and Interuniversity Institute for Biostatistics and Statistical Bioinformatics, Hasselt University, Diepenbeek, Belgium
  3. School of Human Movement Studies, University of Queensland, Brisbane, QLD, Australia
  4. School of Social and Community Medicine, University of Bristol, Bristol, UK

  • Correspondence to: O Ciani oriana.ciani{at}pcmd.ac.uk
  • Accepted 29 October 2012

Abstract

Objective To quantify and compare the treatment effect and risk of bias of trials reporting biomarkers or intermediate outcomes (surrogate outcomes) versus trials using final patient relevant primary outcomes.

Design Meta-epidemiological study.

Data sources All randomised clinical trials published in 2005 and 2006 in six high impact medical journals: Annals of Internal Medicine, BMJ, Journal of the American Medical Association, Lancet, New England Journal of Medicine, and PLoS Medicine.

Study selection Two independent reviewers selected trials.

Data extraction Trial characteristics, risk of bias, and outcomes were recorded according to a predefined form. Two reviewers independently checked data extraction. The ratio of odds ratios was used to quantify the degree of difference in treatment effects between the trials using surrogate outcomes and those using patient relevant outcomes, also adjusted for trial characteristics. A ratio of odds ratios >1.0 implies that trials with surrogate outcomes report larger intervention effects than trials with patient relevant outcomes.

Results 84 trials using surrogate outcomes and 101 using patient relevant outcomes were considered for analyses. Study characteristics of trials using surrogate outcomes and those using patient relevant outcomes were well balanced, except for median sample size (371 v 741) and single centre status (23% v 9%). Their risk of bias did not differ. Primary analysis showed trials reporting surrogate endpoints to have larger treatment effects (odds ratio 0.51, 95% confidence interval 0.42 to 0.60) than trials reporting patient relevant outcomes (0.76, 0.70 to 0.82), with an unadjusted ratio of odds ratios of 1.47 (1.07 to 2.01) and adjusted ratio of odds ratios of 1.46 (1.05 to 2.04). This result was consistent across sensitivity and secondary analyses.

Conclusions Trials reporting surrogate primary outcomes are more likely to report larger treatment effects than trials reporting final patient relevant primary outcomes. This finding was not explained by differences in the risk of bias or characteristics of the two groups of trials.

Introduction

Evidence for the effectiveness of treatments should ideally come from randomised clinical trials or systematic reviews of trials that assess final endpoints relevant to patients, such as survival or health related quality of life.1 2 However, aspects of the design and conduct of randomised clinical trials have been shown to lead to overestimation of treatment effect size. These include inappropriate random sequence generation,3 inadequate allocation concealment,4 5 lack of blinding,6 single centre status,7 8 and the use of composite outcomes.9

Surrogate outcomes are often used in clinical trials as substitutes for final patient relevant outcomes. Advantages of surrogate outcomes over final outcomes are that they may occur faster or may be easier to assess, thereby shortening the duration, size, and cost of trials.10 11 A key rationale for the use of surrogate outcomes in trials is not only substitution12 but the prediction of treatment benefit in the absence of data on patient relevant outcomes.13 14 15 Several drugs have been licensed on this basis—for example, statins (based on low density lipoprotein levels), AIDS drugs (based on HIV RNA or CD4 count levels), and cancer drugs (based on time to progression or disease-free survival).16

Despite the potential appeal of using surrogate outcomes, the use of such trials in policy making remains controversial.11 17 18 Gefitinib, an orally administered epidermal growth factor receptor tyrosine kinase inhibitor, was approved by the Food and Drug Administration (FDA) in the United States for marketing in May 2003 for patients with non-small cell lung cancer based on the surrogate outcome of tumour response rate. The initial approved indication was for the treatment of patients who were refractory to established cancer treatments (both a platinum drug and docetaxel).19 In 2005, however, data from two clinical studies became available showing no significant survival benefit; the FDA released new labelling for gefitinib that prevented its use in new patients with non-small cell lung cancer, limiting its use to continuation in those patients who had already taken the medicine and whose doctor believed it was helping them.20 Although often cited by sceptics, this and other such potentially complete failures of surrogate outcomes21 22 23 remain relatively rare.

Given the growing pressure for faster access to innovative treatments for patients, reimbursement decisions for new treatments are now often made at or around the time of licensing, increasing pressure to rely on treatment effects from trials reporting surrogate primary outcomes.24 25 Such reimbursement decisions often depend on the use of economic modelling to extrapolate treatment effects based on surrogate outcomes into an estimate of cost effectiveness, such as the incremental cost per quality adjusted life year (QALY), the metric currently recommended by the National Institute for Health and Clinical Excellence.13 26

We quantified and compared the treatment effects in a sample of randomised clinical trials reporting either a surrogate or final patient relevant primary outcome. We also compared the risk of bias in these two groups of trials.

Methods

We searched Medline through PubMed for randomised clinical trials published in 2005 and 2006 in six high impact (impact factor >14 in 2011 according to ISI Web of Knowledge) medical journals—that is, Annals of Internal Medicine, BMJ, Journal of the American Medical Association, New England Journal of Medicine, Lancet, and PLoS Medicine (see supplementary table 1 for details of search strategy). We purposively chose general (rather than specialist) medical journals as we sought to compare surrogate and final patient relevant outcomes across a broad range of medical conditions. Otherwise our two year sampling frame was based on a recent study that examined the reporting of surrogate outcomes in trials.27 The study authors provided us with their listing of trials and we checked that this captured all randomised clinical trials from our database search.

Study selection

Two authors (OC, RST) independently undertook the inclusion and exclusion of trials (see box).

Inclusion criteria

  • Randomised clinical trial

  • Publication years 2005-06

  • Journals (Annals of Internal Medicine, BMJ, Journal of the American Medical Association, New England Journal of Medicine, Lancet, PLoS Medicine)

  • Interventional studies

Exclusion criteria

  • Non-interventional studies (e.g. evaluations of screening or diagnostic tests)

  • Economic evaluations

  • Mixed primary outcomes (i.e. a primary outcome that comprised both a surrogate and final patient relevant outcome*)

  • Multi-arm trials

  • Secondary analyses

  • Early terminated studies

  • Equivalence or non-inferiority design

  • No analysable data

  • *Two examples of mixed primary outcomes seen in this study were a composite endpoint of death or vein graft restenosis28 and a composite of serum creatinine level, end stage renal disease, or death29

We classified studies into two groups according to whether the primary outcome was a surrogate one or a final patient relevant one. A final patient relevant outcome was defined as any outcome that captures “how a patient feels, functions or survives.”30 An outcome was consequently classified as a surrogate if it was a biomarker12 (for example, low density lipoprotein cholesterol level) or an intermediate outcome (for example, progression-free survival)13 31 judged to be a substitute for a final outcome. Where a trial did not state outcome primacy, we chose the one used for sample size calculation or the first outcome reported in the results section to be the primary outcome. As some subjectivity is involved in classification of outcomes as surrogate or final, two reviewers (OC, RST) resolved borderline cases by review of the full paper and discussion. To obtain comparable groups of trials using surrogate and final patient relevant outcomes, we used a hierarchical matching process to match each included surrogate outcome trial with a corresponding final patient relevant outcome trial, based on four criteria: the intervention clinical area, clinical population, journal, and publication year (see supplementary file for details of matched studies). After further detailed review of full papers, additional exclusions were necessary to omit trials with mixed primary outcome assessment and trials that were terminated early. As we sought to examine differences in treatment effects between trials using surrogate outcomes and those using final patient relevant outcomes, we excluded equivalence and non-inferiority designs.

Data extraction and risk of bias assessment

Using a predefined data extraction form we extracted data from the included trials on journal, sample size, patient population, type of intervention (drug, medical device, surgical procedure, health promotion activity, other therapeutic intervention32), duration of follow-up, centre status (single or multicentre), and sponsor (for profit, not for profit, or mixed33). For trials using surrogate outcomes we sought additional information on type of surrogate (imaging, histochemical/biochemical, instrumental, other), whether the authors explicitly reported that they had used a surrogate outcome (for example, the outcome was labelled as a “surrogate outcome,” “intermediate outcome”, or “non-clinical outcome”, or it was clearly understood in the context of the article that the outcome was a surrogate), and what authors reported in the publication on validity of the surrogate outcome.13 34 We assessed risk of bias in terms of the adequacy of random sequence generation and concealment, statement of double blind placebo controlled trial, and use of intention to treat analyses. One reviewer (OC) initially undertook data extraction and risk of bias assessment, and this was then checked by a second reviewer (TP or RST).

Data analyses

We compared the treatment effects between the two trial types using several analytical approaches. In accord with previous studies, for our primary analysis we sought binary outcomes in each trial recorded as the number of patients and events in each arm.3 4 5 6 7 35 Outcome events were recoded where necessary so that an odds ratio below 1.0 indicated beneficial effect of the intervention. Metaregression is a technique used to explore the relation between study characteristics (for example, sample size and journal of publication) and effect size.36 We used random effects logistic metaregression models35 to estimate ratios of odds ratios and 95% confidence intervals comparing treatment effects in trials using surrogate outcomes and final patient relevant outcomes. Ratio of odds ratios greater than 1.0 implied greater (more beneficial) treatment effects in the trials using surrogate outcomes than in the trials using final patient relevant outcomes. To take account of potential confounding, in our primary analysis we also included an adjusted analysis that incorporated predefined trial level covariates in the metaregression model—that is, clinical area of the treatment and patient population, intervention type, sponsor, journal, sample size, and mean follow-up time.
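The per-trial quantity that feeds such a metaregression can be sketched as follows. This is a minimal illustration, not the authors' Stata code: the function names and the trial counts are hypothetical, and the standard error uses the usual logit (Woolf) formula with a 0.5 continuity correction only when a cell is zero.

```python
import math

def log_odds_ratio(events_trt, n_trt, events_ctl, n_ctl):
    """Return (log odds ratio, standard error) for one two-arm trial."""
    a, b = events_trt, n_trt - events_trt
    c, d = events_ctl, n_ctl - events_ctl
    if 0 in (a, b, c, d):
        # Continuity correction so the logarithm and variance stay defined.
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, se

def recode_events(events, n):
    """Swap a favourable event (e.g. response) for its complement,
    so that an odds ratio below 1.0 still means benefit."""
    return n - events

# Hypothetical trial: 30/100 adverse events on treatment v 45/100 on control.
log_or, se = log_odds_ratio(30, 100, 45, 100)
odds_ratio = math.exp(log_or)  # below 1.0, so the intervention looks beneficial

# The same trial reported as "event free" counts must be recoded first.
flipped, _ = log_odds_ratio(70, 100, 55, 100)
recoded, _ = log_odds_ratio(recode_events(70, 100), 100,
                            recode_events(55, 100), 100)
```

Recoding matters because an unrecoded favourable outcome yields the reciprocal odds ratio, which would reverse the apparent direction of effect when trials are pooled.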

To assess the robustness of the primary analysis, we undertook several sensitivity analyses. Firstly, to maximise the number of studies in our analysis we included trials that failed to report the number of patients and events in each arm but reported their results as a risk ratio (that is, relative risk, odds ratio, or hazard ratio). A pooled relative risk ratio comparing trials using surrogate outcomes with those using final patient relevant outcomes was estimated with inclusion of these trials. Secondly, for studies reporting continuous outcomes we first calculated a Cohen’s standardised mean difference and associated standard errors. We then transformed the standardised mean differences to log odds ratios and combined them with studies reporting binary outcomes or risk ratios using random effects meta-analysis.36 37 Thirdly, we estimated log odds ratios and associated standard errors for each matched pair of trials, using the method of Bucher et al,38 and combined across trials using random effects meta-analysis. Sensitivity analyses are reported as unadjusted ratio of odds ratios (or relative risk ratios) and adjusted for trial covariates.
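Two of these steps reduce to short formulas, sketched here under stated assumptions. The conversion is assumed to be the standard logistic one (log odds ratio = standardised mean difference × π/√3); the text does not spell out which formula was used. The paired comparison follows the general form of an adjusted indirect comparison: a difference of log odds ratios whose variances add. The function names, the example numbers, and the final minus surrogate ordering of the difference are illustrative assumptions; that ordering is chosen to agree with the stated reading that a ratio above 1.0 means larger effects in surrogate trials.

```python
import math

SMD_TO_LOG_OR = math.pi / math.sqrt(3)  # about 1.81

def smd_to_log_or(smd, se_smd):
    """Convert a standardised mean difference and its SE to the log odds ratio scale.
    The SMD must already be signed so that negative values mean benefit."""
    return smd * SMD_TO_LOG_OR, se_smd * SMD_TO_LOG_OR

def log_ratio_of_odds_ratios(log_or_final, se_final, log_or_surrogate, se_surrogate):
    """Compare two independent log odds ratios (final minus surrogate).
    exp() of the result is above 1.0 when the surrogate trial shows the larger benefit."""
    diff = log_or_final - log_or_surrogate
    se = math.sqrt(se_final ** 2 + se_surrogate ** 2)
    return diff, se

# Hypothetical matched pair: surrogate trial OR 0.50, final outcome trial OR 0.80.
diff, se = log_ratio_of_odds_ratios(math.log(0.80), 0.15, math.log(0.50), 0.20)
ror = math.exp(diff)
ci = (math.exp(diff - 1.96 * se), math.exp(diff + 1.96 * se))
```

The pairwise estimates and standard errors produced this way are what a random effects meta-analysis would then pool.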

In a secondary analysis, we classified the trials using surrogate outcomes and final patient relevant outcomes according to whether the reported result for the primary outcome was positive (the treatment group was superior to control, P≤0.05), negative (the control group was superior to treatment, P≤0.05), or neutral (no significant difference between groups, P>0.05). We then compared the outcomes using an unadjusted logistic regression model and a model adjusted for study level covariates.
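The three-way classification above amounts to a simple rule. The sketch below is illustrative only; the function name and arguments are hypothetical, and the direction of effect is assumed to be known from the trial report.

```python
def classify_result(p_value, favours_treatment):
    """Label a primary outcome result as positive, negative, or neutral.

    p_value: reported P value for the primary outcome comparison.
    favours_treatment: True if the point estimate favours the treatment group.
    """
    if p_value > 0.05:
        return "neutral"
    return "positive" if favours_treatment else "negative"

labels = [
    classify_result(0.01, True),    # significant, treatment superior
    classify_result(0.03, False),   # significant, control superior
    classify_result(0.40, True),    # no significant difference
]
```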

For both primary and secondary analyses, when not explicitly stated, we considered the latest available follow-up. Trials with multi-arms were included if it was possible and clinically meaningful to pool the number of events across arms towards a unique comparator—for example, different dosages of the same drug versus placebo arm. When treatment effect estimates and their variability were only shown graphically we used the open source software WinDig version 2.5 to extract numbers from the graphically presented information.

A regression test for funnel plot asymmetry was performed to assess small study effects and potential publication bias.39 All analyses were run in Stata SE version 12.
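As an illustration of how a regression test for funnel plot asymmetry works, the sketch below implements the classical Egger form: the standardised effect (estimate divided by its standard error) is regressed on precision (one over the standard error), and an intercept far from zero suggests small study effects. This is a simpler stand-in, not the Harbord modified test reported in the results, and the data are hypothetical.

```python
import math

def egger_test(effects, std_errors):
    """Return (intercept, t statistic) from Egger's regression.

    effects: log odds ratios; std_errors: their standard errors.
    The t statistic has len(effects) - 2 degrees of freedom.
    """
    y = [e / s for e, s in zip(effects, std_errors)]   # standardised effects
    x = [1 / s for s in std_errors]                    # precisions
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s2 = sum(r * r for r in residuals) / (n - 2)
    se_intercept = math.sqrt(s2 * (1 / n + mean_x ** 2 / sxx))
    t = intercept / se_intercept if se_intercept > 0 else float("nan")
    return intercept, t

# Hypothetical log odds ratios: the less precise trials show larger benefits.
effects = [-0.1, -0.2, -0.4, -0.7, -1.0]
std_errors = [0.05, 0.1, 0.2, 0.3, 0.4]
intercept, t = egger_test(effects, std_errors)
```

With odds ratios coded so that values below 1.0 indicate benefit, a negative intercept points to smaller trials reporting more favourable effects.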

Results

A total of 639 titles and abstracts were identified, 511 of which were identified as eligible for the study. Of these, 27% (n=137) were judged to use a surrogate primary outcome. After matching and exclusions, 185 trials contributed to the quantitative analyses (fig 1). See the supplementary file for a list of included trials. The fidelity of matching of trials using surrogate outcomes and those using final patient relevant outcomes seemed to be retained in these 185 trials (table 1) and in the subgroup of trials reporting binary primary outcomes (data not shown). In both groups, drug interventions (surrogate: 58%; final: 61%, P=0.33) and not for profit sponsorship (surrogate: 58%; final: 56%, P=0.86) were most common. Trials using final patient relevant outcomes had a larger median sample size than trials using surrogate outcomes (P<0.001) and were also more likely to be multicentre (P=0.01). Follow-up between the two groups of trials did not differ significantly (P=0.73). Although the duration of the trials was similar overall, when chronic conditions, such as cardiovascular disease, cancer, and endocrine disorders, were considered, the trials using surrogate outcomes had a shorter median follow-up (255 v 730 days, P=0.03).

Fig 1 Flow of studies through inclusion process

Table 1 Characteristics of trials using surrogate primary outcomes and final patient relevant primary outcomes. Values are numbers (percentages) unless stated otherwise

Trials with surrogate outcomes used laboratory biomarkers (52/137, 38%) such as prostate specific antigen; imaging (35/137, 25.5%) such as left ventricular ejection fraction; or instrumental endpoints (35/137, 25.5%) such as body weight. Fifteen trials were judged to use an intermediate outcome to substitute for a final outcome (15/137, 11%), such as disease-free survival or rate of ovulation. In fewer than half of the cases (36/84, 43%) authors explicitly stated they used a surrogate outcome and provided criteria or references for its validation in 28 out of 84 trials (35%).

Comparison of treatment effects

Primary analysis

Overall, 134 trials (51 surrogate outcome trials and 83 final outcome trials) reporting binary outcomes in the primary analysis were included. The pooled odds ratio for the primary outcome in the surrogate trials was 0.51 (95% confidence interval 0.42 to 0.60; I²=91.2%, P<0.001) compared with 0.76 (0.70 to 0.82; I²=89.8%, P<0.001) in trials using final patient relevant outcomes. On average the treatment effect estimate was 47% higher in the trials using surrogate outcomes than in the trials using final patient relevant outcomes (ratio of odds ratios 1.47, 95% confidence interval 1.07 to 2.01, P=0.02). This difference remained after adjustment for characteristics of the trials (1.46, 1.05 to 2.04, P=0.03, table 2).
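The reported figures can be cross-checked with simple arithmetic. The sketch below assumes the ratio of odds ratios is formed as the final outcome odds ratio divided by the surrogate outcome odds ratio (consistent with the stated interpretation, though the metaregression estimate need not equal this crude quotient), and that the confidence interval is symmetric on the log scale with a normal approximation. The function name is hypothetical.

```python
import math

def p_from_ci(estimate, lower, upper, z_crit=1.959964):
    """Back-calculate the standard error, z statistic, and two-sided P value
    from a ratio estimate and its 95% confidence interval."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(estimate) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return se, z, p

# Crude quotient of the two pooled odds ratios reported above.
crude_ror = 0.76 / 0.51

# Unadjusted ratio of odds ratios with its reported interval.
se, z, p = p_from_ci(1.47, 1.07, 2.01)
```

The crude quotient is about 1.49, close to the modelled 1.47, and the back-calculated P value rounds to the reported 0.02.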

Table 2 Comparison of treatment effects of trials using surrogate outcomes with trials using final patient relevant outcomes: primary and sensitivity analyses

Sensitivity analyses

After incorporating trials with risk ratios as reported by the authors, 143 trials (57 surrogate outcome trials and 86 final outcome trials) were included. The treatment estimates in the trials using surrogate outcomes remained higher than in the ones using final patient relevant outcomes (unadjusted relative risk ratios 1.38, 95% confidence interval 1.12 to 1.71, P<0.01; adjusted relative risk ratios 1.36, 1.08 to 1.70, P<0.01). After combining continuous and binary outcomes in the overall sample, the estimated ratio of odds ratios showed a similar direction of effect (unadjusted 1.44, 95% confidence interval 0.83 to 2.49, P=0.20; adjusted 1.48, 0.83 to 2.62, P=0.18). A total of 43 pairs of matched trials were available for a paired analysis, with a pooled ratio of odds ratios of 1.38 (1.01 to 1.88, P=0.04).

Secondary analysis

Trials using surrogate outcomes were more likely to obtain a positive result (as stated by the authors) in favour of the treatment (52/84, 62%) than trials using final patient relevant outcomes (37/101, 37%). The odds ratio of reporting a positive result in the trials using surrogate outcomes compared with trials using final patient relevant outcomes was 2.17 (95% confidence interval 1.20 to 3.92, P=0.01) and 2.43 (1.29 to 4.57, P<0.01) after adjustment for the characteristics of the trials.

Influence of trial characteristics

Apart from type of primary endpoint, the only trial characteristics that showed a statistically significant association with treatment effect estimates in bivariate metaregression models were sample size and single centre status. Harbord’s modified test showed similar levels of small study bias in both the trials using surrogate outcomes (t=3.63; P=0.001) and the trials using final patient relevant outcomes (t=2.79; P=0.007; see the funnel plot in the supplementary file). We undertook a retrospective analysis including centre status as a trial level covariate in the primary analysis adjusted model, resulting in a ratio of odds ratios of 1.28 (95% confidence interval 0.96 to 1.72, P=0.09). There was no evidence of an interaction between the primary analysis ratio of odds ratios of trials using surrogate outcomes compared with trials using final patient relevant outcomes and the characteristics of the trials (fig 2).

Fig 2 Ratio of odds ratios comparing treatment effect estimates in trials using surrogate outcomes versus trials using final primary end points stratified by key trial characteristics. *P values from tests of interaction between type of primary outcome and trial characteristics

Risk of bias

The risk of bias between trials using surrogate outcomes and those using final patient relevant outcomes did not differ significantly (table 3). Further adjustment of the metaregression estimates for these four factors did not change the inference of the primary analysis comparing trials using surrogate and final patient relevant outcomes—that is, risk of bias adjusted ratio of odds ratios 1.45 (95% confidence interval 1.06 to 1.99, P=0.02). No risk of bias characteristic was found to be significantly associated with treatment effect and there was no interaction with the primary analysis ratio of odds ratios of trials using surrogate outcomes compared with those using final patient relevant outcomes (fig 2).

Table 3 Summary of risk of bias assessment for trials reporting biomarkers or intermediate outcomes (surrogate outcomes) versus final patient relevant primary outcomes

Discussion

We provide empirical evidence that trials using surrogate primary outcomes report larger treatment effects than a matched sample of trials using final patient relevant primary outcomes. We analysed a cohort of randomised clinical trials categorised according to whether their primary outcome was a surrogate or a final patient relevant endpoint and matched on the basis of key characteristics of the trials. On average, trials using surrogate outcomes reported treatment effects that were 28% to 48% higher than those of trials using final patient relevant outcomes. Furthermore, we found that surrogate trials were twice as likely to report positive treatment effects as the final outcome trials. These findings were not explained by differences in risk of bias or other trial characteristics and are comparable with the level of exaggeration of treatment effect attributed to inadequate allocation concealment.5 Although, as anticipated, we found that trials with a patient relevant primary outcome were more likely to have larger sample sizes, the two groups of trials had similar average follow-up times. However, when limited to trials of chronic conditions, follow-up was longer for trials using final patient relevant outcomes. Given the range of interventions and outcomes included, substantial statistical heterogeneity was evident in treatment effects in both types of trials.

Comparison of our findings with previous studies

Few studies have empirically compared trials reporting surrogate and final primary outcomes. One study33 reviewed 324 consecutive cardiovascular trials published in major general medical journals between 2000 and 2005. In accord with the findings of the present study, trials reporting surrogate primary outcomes were more likely to report a positive treatment effect (77 out of 115 trials, 67%) than trials reporting final patient related primary outcomes (113 out of 209 trials, 54%, P=0.02). A recent systematic review of anti-tumour necrosis factor agents for rheumatoid arthritis40 compared the methodological quality of trials reporting surrogate primary outcomes with those with final primary outcomes. In contrast with the present study, a difference in study quality between the two groups of trials was seen; the mean percentage of items met in the consolidated standards of reporting trials (CONSORT) statement was lower for studies with surrogate outcomes than with final patient relevant outcomes (62.5 v 70.7, P=0.03). However, as this systematic review included both randomised and non-randomised trials, and fewer studies with surrogate outcomes were randomised (63% v 74%), this finding is likely to be confounded.

Several reasons may explain why trials assessing surrogate endpoints showed larger treatment effects than trials assessing final endpoints. The first relates to small study effects: the tendency for smaller studies to show a large treatment effect.41 As expected, we found a smaller sample size for trials using surrogate outcomes than for trials using final patient relevant outcomes. However, our results remained consistent after adjustment for the sample size of the trials. A second reason may relate to publication bias. Although we observed substantive small study bias and therefore potential publication bias, the extent of this bias seemed similar across the two sets of trials. Furthermore, it may be argued that published results based on surrogate outcomes are more likely to be positive as the requirements for publication of such results are generally more stringent, whereas results based on (definitive) final endpoints are more likely to be published regardless of the trial’s findings. Thirdly, trials using surrogate outcomes may be of lower methodological quality than trials using final patient relevant outcomes and therefore more prone to exaggeration of effect size.40 However, a comparison of risk of bias between the two groups of randomised clinical trials showed no differences in random sequence generation, allocation concealment, blinding, and intention to treat analysis. Furthermore, our results were consistent after adjustment for these risk of bias dimensions. Fourthly, two recent meta-epidemiological studies have shown that single centre trials are more likely than multicentre trials to lead to larger intervention effects.7 8 In our sample a higher proportion of the trials using surrogate outcomes were single centre trials compared with the trials using final patient relevant outcomes. However, an additional retrospective sensitivity analysis including adjustment for centre status showed consistent results, with use of surrogate outcomes still associated with larger treatment effects.

Finally, the treatment effect may be truly larger in trials with surrogate outcomes than with final patient relevant outcomes. In the continuum of health outcomes measures, biomarkers and intermediate outcomes can be identified as disease centered measures, reflecting the biology of the disease process and the underlying mechanism of disease.42 Assuming the surrogate outcomes lie in the causal pathway between the onset of the disease and the final patient relevant outcome, they are generally more proximal (closer) to the disease and therefore more sensitive to the effect of interventions with therapeutic purposes.

Limitations of the study

As our sample of randomised clinical trials was drawn from six high impact general medical journals over two specific consecutive calendar years, the findings may lack generalisability. We purposively chose general medical journals so as to compare surrogate and final patient related outcomes across a range of medical conditions. Although the choice of publication year would not be expected to influence treatment effects, trials published in high impact journals, although contributing a relatively small proportion of all published trials,43 are more likely to report newsworthy results.44 However, it is unclear how this would have influenced the generalisability of our findings. High impact journals might be expected to publish trials of lower risk of bias. We observed higher methodological quality of trials in our sample compared with a representative sample of trials indexed in PubMed,44 therefore it could be argued our findings are less likely to be susceptible to confounding by other aspects of trial methodological quality.

In addition, we compared the treatment effects of a matched sample of randomised clinical trials reporting surrogate primary outcomes and final primary outcomes. Alternatively, we could have compared the treatment effects between surrogate and final outcomes within the same trials or meta-analyses of homogeneous trials (a meta-epidemiological analysis).45 Although such a “within trial” comparison would minimise confounding by study population, intervention type, and risk of bias, this approach has problems. Firstly, trials are generally powered to detect statistically significant differences in their primary endpoint. Trials with surrogate primary outcomes may be underpowered for final patient relevant outcomes and thus lead to imprecision in the estimation of the comparative treatment effect of surrogate and final outcomes. Secondly, where within trial meta-analysis comparisons of surrogate and final outcomes have been performed, they have been limited to a single treatment (or treatment class) in one specific disease area.46 47 48 49 In this study we sought to address a different question—that is, in the absence of final patient relevant outcomes, what is the potential effect of relying on the treatment effects based on surrogate outcomes across a range of medical conditions and interventions and surrogate outcome types? To maximise their comparability we matched the cohorts of surrogate and final trials on the basis of key study characteristics, such as disease and intervention area.

Finally, the classification of primary outcomes as surrogate or final patient relevant involves an element of subjective judgment. For example, change in body mass index50 and carbon monoxide confirmed smoking abstinence rate51 were both classified as surrogate outcomes, having assumed the patient relevant outcomes in these cases to be long term decline in lung function and lung cancer, respectively. To minimise assessment bias, two reviewers independently applied a standard outcome definition of surrogate and final patient relevant outcomes across all trials, with discussion and consensus on any initial disagreements. To our knowledge this is the first empirical study designed to deal with this subject and therefore our results should be verified in another sample.

Implications of the study

The potential for surrogate outcomes to impact on healthcare policy making and the consequent diffusion of treatments into practice is shown by the fact that 27% of the randomised clinical trials identified during the two year study period reported surrogate primary outcomes. In health technology assessment reports, both in the United Kingdom13 and internationally,52 some 1 in 20 base their clinical and economic conclusions on evidence from surrogate outcomes alone. That trial based surrogate outcomes can lead to substantive overestimation of the treatment effects that would have been seen had evidence on patient relevant outcomes been available is a salutary message for policy makers when weighing up the evidence from use of surrogate outcomes in their licensing and coverage decisions. Several recent drug appraisals by the National Institute for Health and Clinical Excellence (NICE) have relied on evidence of clinical effect derived solely from surrogate outcomes.53 54 55 Our findings reinforce the importance of formally evaluating the acceptability of biomarkers and intermediate outcomes as valid surrogate outcomes and quantifying the association of treatment effect between the surrogate and patient relevant final outcomes and its uncertainty. The statistical validation of surrogates and the quantification of the relation between surrogate and final patient relevant outcomes are key problems tackled in NICE’s update of its methodological guidance for technology assessment.56 The updated version of methods guidance makes several requirements for health technology assessment producers when faced with clinical trials with evidence based on surrogate outcomes: a systematic review of the evidence to support the validity of the surrogate outcome, a clear statement of how the relation between surrogate and final outcome is modelled in determining cost effectiveness, and exploration of the additional uncertainty associated with this prediction on cost effectiveness estimates.26

Clinical trialists and systematic reviewers need to be clearer in their reporting as to whether outcomes are surrogate or final patient relevant, and appropriately frame any conclusions of superiority of interventions when based on surrogate outcomes alone. Others have recently suggested that guidance on surrogates should be incorporated into the CONSORT statement.27 Novel and adaptive approaches to trial design are needed that allow surrogate endpoints to continue to be used as primary outcomes, while also providing evidence on their validation against patient relevant outcomes.57

Conclusions

In the absence of data on final outcome, policy makers need to interpret intervention effects based on surrogate outcomes with caution. Although our results have highlighted the risks, they support the application of methods for the validation and quantification of the relations between surrogate and final patient relevant outcomes in licensing and reimbursement decisions on new and existing treatments.

What is already known on this topic

  • Surrogate outcomes are used in clinical trials to substitute for, and predict, a final patient relevant outcome

  • Failures of specific surrogate outcomes have been reported in the literature

  • Licensing and coverage decisions of health technologies often rely on evidence based on surrogate outcomes

What this study adds

  • Trials reporting surrogate primary outcomes are more likely to report larger treatment effects than trials reporting final patient relevant primary outcomes

  • In the absence of patient relevant outcomes, policy makers should rely on validated surrogate outcomes and take into account the potential uncertainty in their prediction of treatment benefit and harm

Notes

Footnotes

  • Contributors: OC and RST conceived and designed the study. OC screened the titles and abstracts, data extracted papers, ran the analyses, and drafted the manuscript. RST screened the titles and abstracts and checked data extraction and analyses. TP checked data extraction. MB and JAS advised on methods of data analysis. All authors commented on drafts of the manuscript. RST is guarantor.

  • Competing interests: All authors have completed the ICMJE uniform disclosure form at www.icmje.org/coi_disclosure.pdf (available on request from the corresponding author) and declare no financial relationships with any organisations that might have an interest in the submitted work in the previous three years; no other relationships or activities that could appear to have influenced the submitted work. OC is currently receiving a Peninsula College of Medicine and Dentistry Doctoral studentship.

  • Ethical approval: Not required.

  • Data sharing: The dataset is available from the corresponding author at oriana.ciani{at}pcmd.ac.uk.

This is an open-access article distributed under the terms of the Creative Commons Attribution Non-commercial License, which permits use, distribution, and reproduction in any medium, provided the original work is properly cited, the use is non commercial and is otherwise in compliance with the license. See: http://creativecommons.org/licenses/by-nc/2.0/ and http://creativecommons.org/licenses/by-nc/2.0/legalcode.
