Review Article
Development and use of reporting guidelines for assessing the quality of validation studies of health administrative data

https://doi.org/10.1016/j.jclinepi.2010.10.006

Abstract

Background and Objectives

Validation of health administrative data for identifying patients with different health states (diseases and conditions) is a research priority, but no guidelines exist for ensuring quality. We created reporting guidelines for studies validating administrative data identification algorithms and used them to assess the quality of reporting of validation studies in the literature.

Methods

Using the Standards for Reporting of Diagnostic Accuracy (STARD) criteria as a guide, we created a checklist of 40 items that should be reported in studies of identification accuracy. A systematic review identified studies that validated identification algorithms using administrative data, and we used the checklist to assess the quality of their reporting.

Results

In the 271 included articles, goals and data sources were well reported, but only 36.9% of studies reported four or more statistical estimates of accuracy. In 65.9% of studies reporting positive predictive value (PPV) or negative predictive value (NPV), the prevalence of disease in the validation cohort was higher than in the administrative data, potentially falsely elevating the predictive values. Accuracy in patient subgroups (53.1%) and 95% confidence intervals for accuracy measures (35.8%) were also underreported.

Conclusions

The quality of studies validating health states in administrative data varies, with significant deficits in the reporting of markers of diagnostic accuracy, including appropriate estimation of PPV and NPV. These omissions could lead to misclassification bias and incorrect estimation of incidence and health services utilization rates. Use of a reporting checklist, such as the one created for this study by modifying the STARD criteria, could improve the quality of reporting of validation studies, allowing accurate application of algorithms and sound interpretation of research using health administrative data.

Introduction

What is new?

  • Significant deficits exist in the validation and reporting of algorithms used to identify patients within health administrative data.

  • Misclassification error represents an important form of bias in research using health administrative databases.

  • The modified Standards for Reporting of Diagnostic Accuracy criteria reported here can be used to improve the quality of reporting of studies that validate health state (disease or condition) identification algorithms.

  • Future efforts should address criteria for conduct and reporting of research using health administrative data.

Health services and epidemiologic research are best conducted with population-level data. This helps ensure the appropriate estimation of incidence and prevalence rates, the minimization of referral bias, and the generalizability of study conclusions to the population of interest. Because prospective clinical registries and retrospective chart reviews comprising a representative sample or all residents of a jurisdiction are impractical, health administrative data are an alternative for population-based chronic disease surveillance, outcomes research, and health services research. Health administrative data are defined as information passively collected, often by governments and health care providers, for the purpose of managing the health care of patients [1], and are a subtype of automated health care data [2]. Examples include physician billing databases (such as those managed by governments in single-payer health systems or by health maintenance organizations [HMOs]) and hospital discharge record databases.

The accuracy of the diagnostic codes used to identify patients within these data depends on multiple factors, including database quality, the specific condition being identified, and the validity of the codes in the patient group. A large gradient in data quality exists, with some databases being of higher quality than others [3]. Isolated diagnostic codes associated with physician billing records have been shown to accurately identify patients with some chronic diseases [4], [5] but not others [6], [7], [8], [9]. Because chronic diseases usually require multiple contacts with the health system to diagnose, a single-visit diagnostic code is often insufficient to accurately identify patients with the disease. The validity of codes also depends on the patient group being studied. For instance, the accuracy of diagnostic codes or combinations of codes (algorithms) varies across age groups because of variable use of the health system [6], [7], [10].

As such, validation of algorithms used to identify patients with different health states (including acute conditions, chronic diseases, and other health outcomes) is essential to avoid misclassification bias [11], which may threaten the internal validity and interpretation of study conclusions. For example, assessment of health services utilization in a cohort of patients with a chronic disease that is contaminated by a large number of healthy residents falsely labeled as having the disease would underestimate the burden of the disease on the health system and distort assessments of the quality and performance of the health system. Similarly, assessment of the incidence of the disease in that cohort would overestimate the risk to the population.

Although validation of administrative data coding has been identified as a priority in health services research by an international consortium [3], complete and accurate reporting of algorithm validation research is equally important for appropriate application. The growing availability of administrative data for research, coupled with the expense, privacy concerns, and complex methodologies required to validate identification algorithms, has resulted in algorithms being applied to these databases by researchers not involved in their initial validation. As such, minimum quality criteria for the conduct and reporting of algorithm validation studies would benefit scientists using these algorithms and consumers of the research that relies on them.
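To make the misclassification argument concrete, the following minimal sketch (with purely hypothetical accuracy and prevalence figures, not drawn from this study) shows how an algorithm with imperfect sensitivity and specificity inflates an apparent prevalence estimate, and how the standard Rogan-Gladen correction recovers the true value when the accuracy parameters are known:

```python
# Minimal sketch: effect of algorithm misclassification on a prevalence
# estimate. All numbers are hypothetical illustrations.

def apparent_prevalence(true_prev: float, sens: float, spec: float) -> float:
    """Prevalence observed when cases are flagged by an imperfect algorithm."""
    return sens * true_prev + (1 - spec) * (1 - true_prev)

def rogan_gladen(observed: float, sens: float, spec: float) -> float:
    """Correct an observed prevalence for known sensitivity and specificity."""
    return (observed + spec - 1) / (sens + spec - 1)

true_prev = 0.02           # 2% of the population truly has the disease
sens, spec = 0.85, 0.97    # hypothetical algorithm accuracy

obs = apparent_prevalence(true_prev, sens, spec)
print(f"apparent prevalence: {obs:.4f}")   # 0.0464, more than double the truth
print(f"corrected estimate:  {rogan_gladen(obs, sens, spec):.4f}")  # 0.0200
```

Even with 97% specificity, false positives among the large healthy majority more than double the apparent prevalence in this example, which is precisely the overestimation of population risk described above.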

The purpose of this study was to appraise all studies that validated algorithms for identifying patients with different health states within administrative data, using newly developed consensus criteria for the reporting of studies that validate health administrative data algorithms, based on the Standards for Reporting of Diagnostic Accuracy (STARD) initiative [12]. In so doing, we aimed to identify strengths and weaknesses in the methods of such validation studies in order to improve the future reporting of research using health administrative data.

Section snippets

Development of validation study quality checklist

Algorithms to identify patients with different health states may be considered a type of diagnostic test applied to health administrative data, and markers of diagnostic accuracy are often reported in studies validating algorithms against reference standards. As such, we modified the criteria published by the STARD initiative for the accurate reporting of studies using diagnostic tests [12] to evaluate included studies. Four experts (E.I.B., D.M., T.T., and A.G.) in research using these data …
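Because an identification algorithm behaves like a diagnostic test, the core accuracy measures follow directly from the 2×2 validation table. The minimal sketch below (hypothetical counts, not taken from any included study) computes sensitivity, specificity, PPV, and NPV with Wilson 95% confidence intervals, then re-derives the predictive values via Bayes' theorem at the lower prevalence typical of a full administrative database, the adjustment that the Results identify as frequently omitted:

```python
import math

def wilson_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score 95% confidence interval for a binomial proportion."""
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half

# Hypothetical 2x2 validation table (algorithm result vs. reference standard)
# in a validation cohort of 1,000 patients with 20% disease prevalence.
tp, fp, fn, tn = 170, 30, 30, 770

sens, spec = tp / (tp + fn), tn / (tn + fp)
for name, k, n in [("sensitivity", tp, tp + fn), ("specificity", tn, tn + fp),
                   ("PPV", tp, tp + fp), ("NPV", tn, tn + fn)]:
    lo, hi = wilson_ci(k, n)
    print(f"{name}: {k / n:.3f} (95% CI {lo:.3f}-{hi:.3f})")

# Predictive values depend on prevalence, so re-derive them at the (lower)
# prevalence of the target administrative database via Bayes' theorem.
prev = 0.02  # hypothetical population prevalence vs. 0.20 in the cohort
ppv = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
npv = spec * (1 - prev) / (spec * (1 - prev) + (1 - sens) * prev)
print(f"PPV at 2% prevalence: {ppv:.3f}")   # ~0.32, versus 0.85 in the cohort
print(f"NPV at 2% prevalence: {npv:.3f}")   # ~0.997
```

With these counts, PPV falls from 0.85 in the enriched validation cohort to roughly 0.32 at 2% population prevalence, illustrating why predictive values estimated in high-prevalence cohorts cannot be applied directly to the full database.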

Results

A total of 6,423 references were reviewed, resulting in 271 included studies from 16 countries (Fig. 1). A list of included studies is provided in Supplemental Data 3. Of the 271 included studies, 160 were from the United States; 50 from Canada; 12 from Australia; 7 each from Denmark, Italy, and the United Kingdom; 6 from France; 5 each from Brazil, the Netherlands, and Sweden; 2 from Israel; and one each from Finland, Germany, Norway, Spain, and Switzerland. Approximately half of the studies …

Discussion

The translation of research from the literature to medical practice or health policy requires the research to be appropriately designed, reported, and interpreted. As such, consortia have created criteria for the reporting of clinical trials [18], observational studies [19], and studies of diagnostic accuracy [12]. These criteria are guidelines for researchers involved in study design and for consumers of the literature to assess the quality of the research. Unfortunately, no such criteria …

Acknowledgments

The authors wish to thank Ms Elizabeth Uleryk (Director, Hospital Library, The Hospital for Sick Children) for aiding in the search strategy used in this review, Ms Danielle Benchimol for data entry, and Drs Yaron Avitzur and Tanja Gonska for provision of translation services. Eric Benchimol is a Canadian Institutes of Health Research (CIHR) training fellow in the Canadian Child Health Clinician Scientist Program, in partnership with SickKids Foundation and the Child and Family Research …

References (33)

  • F. Ahmed et al. Preferred provider organization claims showed high predictive value but missed substantial proportion of adults with high-risk conditions. J Clin Epidemiol (2005)

  • K. Armstrong et al. Measuring adherence to mammography screening recommendations among low-income women. Prev Med (2004)

  • R.A. Spasoff. Epidemiologic methods for health policy (1999)

  • FDA's Sentinel Initiative (2010)

  • C. De Coster et al. Identifying priorities in methodological research using ICD-9-CM and ICD-10 administrative data: report from an international consortium. BMC Health Serv Res (2006)

  • L. Lix et al. Defining and validating chronic diseases: an administrative data approach (2006)

  • G. Chen et al. Measuring agreement of administrative data with chart data using prevalence unadjusted and adjusted kappa. BMC Med Res Methodol (2009)

  • E.I. Benchimol et al. Increasing incidence of paediatric inflammatory bowel disease in Ontario, Canada: evidence from health administrative data. Gut (2009)

  • A. Guttmann et al. Validation of a health administrative data algorithm for assessing the epidemiology of diabetes in Canadian children. Pediatr Diabetes (2010)

  • J.E. Hux et al. Diabetes in Ontario: determination of prevalence and incidence using a validated administrative data algorithm. Diabetes Care (2002)

  • T. To et al. Case verification of children with asthma in Ontario. Pediatr Allergy Immunol (2006)

  • D.G. Manuel et al. How many people have had a myocardial infarction? Prevalence estimated using historical hospital data. BMC Public Health (2007)

  • P.M. Bossuyt et al. Towards complete and accurate reporting of studies of diagnostic accuracy: the STARD initiative. BMJ (2003)

  • A.B. Nattinger et al. An algorithm for the use of Medicare claims data to identify women with incident breast cancer. Health Serv Res (2004)

  • D.C. Payne et al. Assessment of anthrax vaccination data in the Defense Medical Surveillance System, 1998-2004. Pharmacoepidemiol Drug Saf (2007)

  • C.L. Roberts et al. The accuracy of reporting of the hypertensive disorders of pregnancy in population health data. Hypertens Pregnancy (2008)

This research was conducted with the support of a Clinical Research Award from the American College of Gastroenterology.
