Statistical Methods in Cancer Epidemiological Studies

  • Protocol

Part of the book series: Methods in Molecular Biology (MIMB, volume 471)

Summary

In this chapter, we discuss statistical methods for study designs that are commonly used in epidemiological research, particularly in cancer epidemiological research. After a brief review of basic concepts in epidemiological studies, we discuss statistical methods for case-control studies and cohort studies. We also discuss statistical methods for nested case-control and case-cohort studies, which have been increasingly used in cancer epidemiology. This chapter is designed for cancer epidemiologists who understand basic statistical methods for commonly used epidemiological study designs and are able to initiate power and sample size calculations. It therefore emphasizes newly developed statistical methods for epidemiological studies as well as study planning.

Copyright information

© 2009 Springer Science+Business Media, LLC

About this protocol

Cite this protocol

Xue, X., Hoover, D.R. (2009). Statistical Methods in Cancer Epidemiological Studies. In: Verma, M. (eds) Cancer Epidemiology. Methods in Molecular Biology, vol 471. Humana Press. https://doi.org/10.1007/978-1-59745-416-2_13

  • DOI: https://doi.org/10.1007/978-1-59745-416-2_13

  • Publisher Name: Humana Press

  • Print ISBN: 978-1-58829-987-1

  • Online ISBN: 978-1-59745-416-2

  • eBook Packages: Springer Protocols
