Introduction

Since the demonstration of the efficacy of mammographic screening in randomised trials, many developed countries have initiated mammography screening programmes [1]. It has been observed, both in the trials and the screening programmes, that survival is much better in women with screen-detected cancers than in those with tumours detected symptomatically [2–4]. The survival of women with screen-detected breast cancers, however, is known to be inflated by both lead time and length bias [5]. A review [6] and several studies [7, 8] have sought, and discussed, methods that quantify the impact of mammography screening at a population level, but none have provided bias-free or bias-corrected estimates of survival differences between women with screen-detected and symptomatic breast cancer. Bias-corrected estimates of the impact of screen-detection would be of considerable relevance to clinical and public health practice, and to women, and would support sustaining investment in population breast screening programmes.

Lead time is the advance in the time of diagnosis as a result of screening: if screening detects a tumour 3 years, say, before it would have given rise to symptomatic detection, the survival time is increased by three years regardless of whether the date of death is changed as a result of earlier diagnosis. Length bias is the tendency of screening to detect a greater proportion of slower-growing, less aggressive cancers: if a tumour is developing slowly, it is likely to have a longer pre-symptomatic screen detectable period and is therefore more susceptible to detection by screening.

We report a cohort study of all breast cancers diagnosed within the West Midlands, UK, since the inception of the National Breast Screening Programme in 1988, and estimate survival in both screen-detected and symptomatic (including interval) cases. We aimed to evaluate the extent to which lead time and length bias account for the better observed survival in women with screen detected breast cancer, and also examined the effect of self-selection bias.

Methods

In collaboration with the National Health Service (NHS) Breast Screening Programme, the West Midlands Cancer Intelligence Unit (WMCIU) has collected pathological, screening history, and follow-up data on all breast cancers diagnosed in women aged 50–69 years in the West Midlands, UK, from 1988 to 2001 and in women aged 50–74 years from 2002 to 2004 (reflecting the increased upper age limit for screening in the UK since 2002). During this study period, 26,766 women with a first diagnosis of breast cancer were notified to the WMCIU. The 25,962 women whose screening history could be ascertained form our study cohort: 10,100 women with screen-detected breast cancer, 6,009 women with interval breast cancer, and 9,853 women with symptomatic tumours detected outside of the screening programme (that is to say, women diagnosed either before their first invitation to screening, after withdrawal from the screening programme or after failing to attend their most recent scheduled screen). For brevity, this last group will be referred to as ‘unexposed’.

Screening status was ascertained by reference to the NHS breast screening programme database; 804 (3%) women were excluded because their screening history could not be ascertained. All notified cancers were flagged with the Office for National Statistics (ONS), and deaths up to 31 December 2006 were notified to the WMCIU. The transfer of anonymised data from the WMCIU to the Cancer Research UK Centre of Epidemiology, Mathematics and Statistics is covered by the UK Association of Cancer Registries Guidelines on the Release of Identifiable and Non-identifiable Data.

In the UK, the National Breast Screening Programme actively invited women aged 50–64 to three-yearly mammographic screening, and provided screening on request to women aged 65 and over. From 2002 to 2004, the upper age range for invitation was gradually extended to 70 years, and women above this age remain eligible for screening on request. The earliest screening was by single-view mammography, but two-view mammography has been used for first screens since the mid-1990s and for all screens since 2003.

Statistical analyses

Our endpoint was survival to death from breast cancer. Survival was estimated by the Kaplan–Meier method [9]. We calculated both relative risks, which are the ratios of the cumulative probabilities of dying from the disease within a specified follow-up time [10], and Cox regression relative hazards, which are the ratios of the rates of fatality at individual points in time [9]. We adjusted for lead time and length bias using the methods of Duffy et al. [11]. Briefly, the additional follow-up due to lead time was estimated individually for each screen-detected case, based on the observed follow-up time since diagnosis, an average preclinical screen-detectable time of 4 years estimated from the Swedish Two-County Trial data [12], and whether the patient had died of breast cancer or not. The additional follow-up time was then subtracted from the observed survival time.
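The lead time adjustment described above can be illustrated with a minimal sketch. It assumes that the preclinical screen-detectable (sojourn) time is exponentially distributed with a mean of 4 years; the exponential form, the function names and the example follow-up times are assumptions made for illustration and are not taken from the text, which refers to [11] for the actual method. Under that assumption, the expected lead time for a woman who died of breast cancer at follow-up time t is the mean of the sojourn time conditional on it being shorter than t, and for a woman not known to have died of breast cancer at time t it is the mean of the sojourn time truncated at t.

```python
import math

# Average preclinical screen-detectable period quoted in the text (years).
MEAN_SOJOURN_YEARS = 4.0


def expected_lead_time(follow_up_years, died_of_breast_cancer,
                       mean_sojourn=MEAN_SOJOURN_YEARS):
    """Expected lead time for one screen-detected case.

    Assumes an exponential sojourn time S with the given mean
    (an assumption of this sketch, not a statement from the paper).
    """
    if follow_up_years <= 0:
        return 0.0
    lam = 1.0 / mean_sojourn
    decay = math.exp(-lam * follow_up_years)
    p_within = 1.0 - decay  # P(S < follow-up time)
    if died_of_breast_cancer:
        # E[S | S < t]: the tumour must have surfaced clinically before death.
        return 1.0 / lam - follow_up_years * decay / p_within
    # E[min(S, t)]: lead time cannot exceed the observed follow-up.
    return p_within / lam


def corrected_follow_up(follow_up_years, died_of_breast_cancer):
    """Observed follow-up minus the expected lead time."""
    return follow_up_years - expected_lead_time(follow_up_years,
                                                died_of_breast_cancer)


# Hypothetical cases: 10 years of follow-up, alive versus dead of breast cancer.
alive = corrected_follow_up(10.0, died_of_breast_cancer=False)
dead = corrected_follow_up(10.0, died_of_breast_cancer=True)
print(f"corrected follow-up, alive: {alive:.2f} y; died: {dead:.2f} y")
```

The corrected times would then replace the observed times for screen-detected cases before the Kaplan–Meier and Cox analyses are rerun; symptomatic cases are left unchanged.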

For length bias, we hypothesised two latent tumour populations, one of which (the length bias group) was more likely to be screen-detected and in the same proportion less likely a priori to cause death regardless of screening. We then estimated the reduction in risk of dying from breast cancer for screen-detected tumours within each of the two populations, i.e. not confounded by length bias. Since we have no way of knowing what proportion of the tumour population is in the length bias group, nor the relative risk of screen detection (the inverse of the screening-independent relative risk of breast cancer death) pertaining to this group, we performed a series of sensitivity analyses over a number of plausible values of these to give a range of possible bias-corrected relative risks. We took the median of these as representing the best bias-corrected estimate.

We first compared screen-detected cases with all symptomatic cases, including both unexposed cases and interval cancers, then with only the interval cancers. This is because the unexposed cases contain a high proportion of non-attenders and lapsed attenders who have been observed to have a higher death rate than subjects not invited to screening, due to self-selection bias, so that their inclusion in the symptomatic group might bias the results in favour of screening [13]. We performed the analyses both including and excluding the 2,102 cases of carcinoma in situ.

Results

Descriptive data for this cohort are summarised in Table 1 according to detection status. Median age was 58 years in screen-detected cancers and interval cases, and 61 years in unexposed cancers (range 50–74 years). Table 2 shows the numbers of cases and deaths by detection mode for all cancers including carcinoma in situ. Screen-detected cases showed higher 10-year survival than symptomatic cases (88% versus 65%). The relative risk of breast cancer death was 0.12/0.35 = 0.34 (95% CI 0.31–0.37). The Cox regression relative hazard was 0.27 (95% CI 0.25–0.30). When corrected for lead time bias, survival in symptomatic cases was unchanged, by definition, but the corrected 10-year survival for the screen-detected cases was 83%. This correction resulted in a relative risk of 0.49 (95% CI 0.45–0.53) and a relative hazard of 0.40 (95% CI 0.37–0.44). Figure 1 shows the survival of symptomatic and of screen-detected cases, with and without the correction for lead time.
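The relative risks reported above follow directly from the 10-year survival figures, as the short sketch below illustrates. The function name is an illustrative assumption, and because the inputs are the rounded survival percentages quoted in the text, agreement with the published values is to two decimal places only.

```python
def relative_risk(survival_screen_detected, survival_comparison):
    """Ratio of cumulative 10-year fatality (1 - survival) in the two groups."""
    return (1.0 - survival_screen_detected) / (1.0 - survival_comparison)


# Survival figures quoted in the text (screen-detected vs all symptomatic cases).
uncorrected = relative_risk(0.88, 0.65)          # before lead time correction
lead_time_corrected = relative_risk(0.83, 0.65)  # after lead time correction

for label, rr in [("uncorrected", uncorrected),
                  ("lead time corrected", lead_time_corrected)]:
    # The fatality reduction quoted in the Discussion is 1 - RR.
    print(f"{label}: RR = {rr:.2f}, fatality reduction = {1.0 - rr:.0%}")
```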

Table 1 Breast cancers (invasive and in situ) by detection mode and age-group
Table 2 Breast cancer cases and deaths by detection mode
Fig. 1 Ten-year survival of symptomatic and screen-detected cancers, both invasive and in situ, with the latter uncorrected and corrected for lead time

To determine the possible effects of length bias, a series of sensitivity analyses was performed, allowing the proportion of tumours in the length bias group to range from 10% to 50%, and the a priori relative risk of breast cancer death in the length bias group to range from 0.5 to 0.9 (Table 3). The bias-corrected relative risks of breast cancer death associated with screen-detection had a median of 0.51 with an absolute range of 0.49–0.59. The corrected relative hazards had a median of 0.45 and range of 0.43–0.51.

Table 3 Lead time and length-bias corrected estimates of the true relative risk and relative hazard for screen-detected vs symptomatic cases, for a range of possible length bias parameters

To exclude self-selection bias, the survival analysis was repeated with the interval cancers as the symptomatic cases, thus excluding the unexposed cases. Before correction for lead time, the relative risk for screen-detected cancers compared to interval cancers was 0.44 (95% CI 0.40–0.49), and the relative hazard 0.36 (95% CI 0.33–0.39). Figure 2 shows the survival of screen-detected and interval cancers with the former corrected for lead time. The corrected 10-year survival of screen-detected cases was again 83% and the 10-year survival of interval cancers was 73%. This gave a relative risk of 0.63 (95% CI 0.57–0.69). The corresponding relative hazard was 0.53 (95% CI 0.49–0.59). Table 4 shows the corresponding relative risk and relative hazards adjusted additionally for length bias. The lead time and length bias corrected relative risks had a median of 0.68 and a range of 0.63–0.84. The corrected relative hazards had a median of 0.64 and a range of 0.59–0.82.

Fig. 2 Survival of interval cancers and screen-detected cancers, invasive and in situ, with the latter corrected for lead time

Table 4 Lead time and length-bias corrected estimates of the true relative risk and relative hazard for screen-detected vs interval cancers, for a range of possible length bias parameters

There were 1,567 (15.5%) cases of carcinoma in situ among the screen-detected cancers, and 535 (3.4%) among the symptomatic. When data were analysed excluding the in situ carcinoma cases, the uncorrected 10-year survival in the screen-detected cases was 86% and that in the symptomatic cases was 64%, a relative risk of 0.39 (0.36–0.42). The Cox regression relative hazard was 0.31 (0.28–0.33). Correcting for lead time, the 10-year survival in the screen-detected cases was 81%, giving a relative risk of 0.53 (0.49–0.57). Figure 3 shows the corresponding survival curves. The relative hazard was 0.45 (0.42–0.49). Sensitivity analysis for length bias using the range of values for q and θ in Tables 3 and 4 gave a range of values for the relative risk from 0.53 to 0.63, with a median of 0.55. The relative hazard ranged from 0.47 to 0.56, with a median of 0.49.

Fig. 3 Ten-year survival of symptomatic and screen-detected cancers, invasive only, with the latter uncorrected and corrected for lead time

When only the invasive screen-detected and interval cancers were considered, the uncorrected 10-year survival rates were 86% and 72%, respectively, a relative risk of 0.50 (0.45–0.55). The relative hazard was 0.41 (0.37–0.45). After correction for lead time, the screen-detected cases had a 10-year survival rate of 81%, and the survival of the interval cancers was unchanged. The corrected relative risk was 0.68 (0.62–0.74) and the relative hazard 0.60 (0.55–0.65). Sensitivity analyses for length bias using the same values for q and θ as before gave a range of relative risks from 0.70 to 0.92, with a median of 0.75.

The survival results are summarised in Table 5, showing survival rates, relative risks and relative hazards, with and without bias corrections and separately for all tumours and for invasive tumours only.

Table 5 Summary of survival analysis results by tumour group (all or invasive only) and correction for bias

Discussion

Our study of 25,962 women with breast cancer showed a crude reduction of 66% (RR = 0.34) in the cumulative 10-year cause-specific fatality rate for screen-detected compared with symptomatic breast cancer. Correction for lead time bias, assuming an average preclinical screen-detectable period of four years, yielded a 51% reduction. Further correction for length bias yielded a range of estimates, with a median of a 49% reduction in 10-year fatality. Comparison of the screen-detected with interval cancers, i.e. using only screened subjects to avoid the self-selection bias whereby the non-attenders and lapsed attenders among the unexposed might artificially increase the fatality rate of the symptomatic cases, gave a range of estimates with a median 32% reduction in fatality. Even taking the most pessimistic value in the range of plausible corrected results, and assuming that 50% of the cancers are twice as likely to be screen-detected and half as likely to cause death regardless of screening, would yield an 11% reduction in 10-year fatality in the invasive tumours.

Since the proportion of progressive carcinoma in situ cases remains uncertain, we analysed the data including and excluding the in situ cases. The exclusion of the in situ cases leads to a smaller estimated survival advantage in favour of screening, as one would expect. This is the case for both the corrected and uncorrected estimates. Overall, the estimate corrected for all three biases observed in cancer screening was a 32% reduction in fatality for all cases, invasive and in situ, and a 26% reduction for invasive cases only.

In addition to reporting bias-corrected estimates of the effect of screen-detection, our work provides population data in a well-defined cohort of all breast cancer cases in the West Midlands region, with ascertained screening history and outcomes, and including data on interval cancers. Interval cancers arise in women who attend screening but whose cancers are not screen-detected and emerge clinically between scheduled screening episodes. Our data show that survival probabilities in this population of symptomatic breast cancers are intermediate between those of screen-detected and non-screened (unexposed) women.

The limitations of the method we have reported are the assumptions made in the length bias sensitivity analyses. The model is of only two discrete populations, whereas length bias may be a continuous phenomenon. The resulting corrections are plausible, however, and the method has the advantage of simplicity and therefore easy applicability by other researchers. Because the 95% confidence intervals after adjustment for lead time do not take into account the uncertainty in lead time estimates, these are likely to be anticonservative. However, with a large data set such as this, with only a minority of the survival times adjusted, potential underestimation of the confidence intervals is likely to be small.

The effect of length bias on estimates of screening benefit has been difficult to quantify in the past. Applying a sensitivity analysis for length bias [11], the lead time adjusted 51% reduction in fatality (based on all subjects) was slightly attenuated to an estimated 49%. Our results suggest that the effect of length bias in terms of artificially inflating the survival advantage of screen detection is likely to be smaller than that of other biases such as lead time and self-selection for screening.

The length bias method applied in this study can be adapted to model overdiagnosis, the most extreme form of length bias [11]. If we assume that 25% of screen-detected breast cancers are overdiagnosed, which is considerably higher than our formal estimates [14–16], but is consistent with 10% overdiagnosis in the cohort as a whole (since 25% of 39% is approximately 10%) as observed in the Malmö randomised trial [17], the fatality reduction corrected for lead time and overdiagnosis was 34% [11].
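The arithmetic behind the "25% of 39%" statement can be checked from the cohort counts given in the Methods. The sketch below is only a consistency check; the variable names are illustrative assumptions, and the 25% figure is the hypothetical overdiagnosis proportion assumed in the text, not an estimate.

```python
# Cohort counts from the Methods section.
screen_detected = 10_100
interval = 6_009
unexposed = 9_853
cohort = screen_detected + interval + unexposed

# Hypothetical proportion of screen-detected cancers assumed overdiagnosed.
assumed_overdiagnosis_in_screen_detected = 0.25

share_screen_detected = screen_detected / cohort
overdiagnosis_in_cohort = (assumed_overdiagnosis_in_screen_detected
                           * share_screen_detected)

print(f"screen-detected share of cohort: {share_screen_detected:.0%}")
print(f"implied overdiagnosis in whole cohort: {overdiagnosis_in_cohort:.0%}")
```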

Uncorrected comparisons of screen-detected with symptomatic cancers in other breast cancer populations yield similar results to our uncorrected analysis [2–4]. This suggests that our bias-corrected estimates are also generalisable. It would be of interest to see the results of such bias corrections in other studies.

We have reported the reduction in risk of cause-specific case fatality associated with screen-detection of breast cancers, correcting for lead time, length bias and self-selection bias. Women aged 50–74 years with screen-detected breast cancer had approximately half the 10-year case fatality of women with symptomatic breast cancer. Allowing additionally for self-selection bias, there was a 32% reduction in fatality associated with screen-detection. Irrespective of the assumptions made about potential bias, there was a substantial improvement in survival associated with screening. The detailed methods are published separately [11] and can be used to assess the impact of breast screening on survival in other populations.