Original Article
AHRQ Series Paper 5: Grading the strength of a body of evidence when comparing medical interventions—Agency for Healthcare Research and Quality and the Effective Health-Care Program

https://doi.org/10.1016/j.jclinepi.2009.03.009

Abstract

Objective

To establish guidance on grading strength of evidence for the Evidence-based Practice Center (EPC) program of the U.S. Agency for Healthcare Research and Quality.

Study Design and Setting

Authors reviewed authoritative systems for grading strength of evidence, identified domains and methods that should be considered when grading bodies of evidence in systematic reviews, considered public comments on an earlier draft, and discussed the approach with representatives of the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) working group.

Results

The EPC approach is conceptually similar to the GRADE system of evidence rating; it requires assessment of four domains: risk of bias, consistency, directness, and precision. Additional domains to be used when appropriate include dose–response association, presence of confounders that would diminish an observed effect, strength of association, and publication bias. Strength of evidence receives a single grade: high, moderate, low, or insufficient. We give definitions, examples, mechanisms for scoring domains, and an approach for assigning strength of evidence.

Conclusion

EPCs should grade strength of evidence separately for each major outcome and, for comparative effectiveness reviews, all major comparisons. We will collaborate with the GRADE group to address ongoing challenges in assessing the strength of evidence.
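As an illustration only, the bookkeeping implied by this guidance (one grade per comparison and major outcome, backed by assessments of the four required domains) could be recorded as in the minimal sketch below. The names `EvidenceGrade`, `Grade`, and `ungraded_pairs` are hypothetical and do not come from the EPC guidance. The sketch deliberately does not compute a grade from the domains, because the guidance describes the overall grade as a reviewer's judgment rather than a formula.

```python
from dataclasses import dataclass, field
from enum import Enum


class Grade(Enum):
    """The four strength-of-evidence grades named in the abstract."""
    HIGH = "high"
    MODERATE = "moderate"
    LOW = "low"
    INSUFFICIENT = "insufficient"


# Domain names taken from the abstract; the string spellings are an assumption.
REQUIRED_DOMAINS = ("risk of bias", "consistency", "directness", "precision")
ADDITIONAL_DOMAINS = (
    "dose-response association",
    "confounders that would diminish an observed effect",
    "strength of association",
    "publication bias",
)


@dataclass
class EvidenceGrade:
    """One reviewer-assigned grade for a single comparison and outcome."""
    comparison: str
    outcome: str
    grade: Grade                      # assigned by reviewers, not computed here
    required: dict                    # domain name -> reviewer's assessment text
    additional: dict = field(default_factory=dict)

    def __post_init__(self):
        # All four required domains must be assessed before a grade is recorded.
        missing = [d for d in REQUIRED_DOMAINS if d not in self.required]
        if missing:
            raise ValueError(f"missing required domains: {missing}")
        # Additional domains are optional but must be ones the guidance names.
        unknown = [d for d in self.additional if d not in ADDITIONAL_DOMAINS]
        if unknown:
            raise ValueError(f"unrecognized additional domains: {unknown}")


def ungraded_pairs(comparisons, outcomes, grades):
    """Return (comparison, outcome) pairs that have no recorded grade yet."""
    graded = {(g.comparison, g.outcome) for g in grades}
    return [(c, o) for c in comparisons for o in outcomes if (c, o) not in graded]
```

A review team could call `ungraded_pairs` with its lists of major comparisons and outcomes to check that every pair has received a grade before reporting.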

Introduction

Comparative effectiveness reviews (CERs), like systematic reviews in general, are essential tools for summarizing information to help make well-informed decisions about health care options [1]. CERs explicitly compare two or more screening or diagnostic strategies or therapeutic interventions. The Evidence-based Practice Center (EPC) program, supported by the U.S. Agency for Healthcare Research and Quality (AHRQ), produces substantial numbers of evidence reports and CERs. These reports are designed to accurately and transparently summarize a body of literature with the goal of helping clinicians, policymakers, and patients make well-informed decisions about health care. Reviews should provide clear judgments about the strength of the evidence that underlies conclusions to enable decision makers to use them effectively [2].

In 2007, AHRQ supported a cross-EPC set of work groups to develop guidance on major elements of designing, conducting, and reporting CERs [3]. This paper reports the outcomes of the EPC work group on grading strength of evidence. We briefly explore the rationale for grading strength of evidence, define the domains of concern for evidence strength, and describe our recommended grading system for such reviews. Our main objective was to give guidance to EPCs for grading strength of evidence in CERs, but this guidance may also apply to other systematic reviews.

The EPCs prepare reports that are used by a variety of decision makers, but the EPCs do not themselves develop recommendations. Therefore, the goal of our evidence rating system was to facilitate use of the reports by decision makers who may have differing perspectives. This separation of the raters of the strength of evidence from the decision makers led to some differences in the system we propose relative to other rating systems that are designed to be used directly by decision makers.

The EPC approach is based in large measure on the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) working group approach [4], [5], [6]. We briefly discuss the differences in emphasis between the two systems, and note that EPC and GRADE experts will explore ways to harmonize the two methods and to offer reviewers and decision makers a coordinated model for grading strength of evidence. This paper presents the approach that EPCs are expected to implement for CERs in the meantime.

Section snippets

Strength of evidence: rationale

Among organizations that make practice guidelines or coverage decisions and among experts who develop systematic reviews, assessment of the strength of a body of evidence is widely accepted. A growing number of organizations adopt systematic approaches to making these judgments. A wide variety of grading systems is available for this purpose [7], and different organizations may weigh features, or domains, of a body of

Strength of evidence: domains

The EPC approach to grading evidence begins with assessments of a set of agreed-upon domains pertaining to entire bodies of evidence about major outcomes (benefits and harms) and comparisons—i.e., outcomes and comparisons that are most important to decision makers in clinical practice and health policy. A determination of which outcomes and comparisons the EPCs consider important enough to warrant formal grading of the strength of the evidence will depend on the key questions, the clinical or

Four strength-of-evidence levels

The overall grade for strength of evidence reflects a global assessment that takes the required domains directly into account and, as needed, incorporates judgments about the additional domains as well. For each comparison of interest, EPCs should rate strength of evidence for each major benefit (e.g., positive impact on health outcomes such as physical function or quality of life, or effects on laboratory measures or other surrogate variables) and for each major harm (ranging from rare,

Reporting strength of evidence

As noted above, CERs should present information about all comparisons of interest for the outcomes that are most important to patients and other decision makers. Thus, strength of evidence should relate to those important outcomes. Complete and perfect information is rarely available. For some treatments, data may be lacking about one or more of the outcomes. In other cases, the available evidence comes from studies that have important flaws, is imprecise, or is not applicable to some

Discussion

The EPC approach to rating the strength of evidence draws heavily on the international GRADE system; both conceptually and substantively, it is similar to GRADE. Our recommendations address specific circumstances of the EPC program, which differ from those of some groups that use GRADE. The EPC program produces systematic reviews, but it is not involved directly in development of recommendations or guidelines. Rather, EPC reports are used by a spectrum of government agencies, professional

Acknowledgments

This research was funded through contracts from the AHRQ to the following EPCs: ECRI Institute (290-02-0019); Johns Hopkins University (290-02-0018); Oregon Health & Science University (290-02-0009); RTI International (290-02-0016); and Stanford University (290-02-0017). The opinions expressed here are those of the authors and do not necessarily represent the views of the AHRQ, the Department of Health and Human Services, or the Department of Veterans Affairs. The authors thank Valerie King,
