Avoiding bias due to perfect prediction in multiple imputation of incomplete categorical variables

Comput Stat Data Anal. 2010 Oct 1;54(10):2267-2275. doi: 10.1016/j.csda.2010.04.005.

Abstract

Multiple imputation is a popular way to handle missing data, and automated procedures for it are widely available in standard software. However, such procedures may hide many assumptions and possible difficulties from the data analyst. Imputation procedures such as monotone imputation and imputation by chained equations often involve fitting a regression model for a categorical outcome. If perfect prediction occurs in such a model, automated procedures may give severely biased results. This problem affects some standard software, but it can be avoided by bootstrap methods, penalised regression methods, or a new augmentation procedure.
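Perfect prediction (also called separation) occurs when the covariates split the observed records so that an outcome category is predicted exactly. The maximum likelihood estimate of the logistic regression coefficient is then infinite, and a routine that stops after a fixed number of iterations simply returns a large, arbitrary value. The pure-Python sketch below illustrates this on a toy separated dataset and shows two of the remedies named above in simplified form. The first is an L2 (ridge) penalty on the slope, used here only as a simple stand-in for penalised regression generally. The second is a hypothetical `augment` helper that adds low-weight pseudo-records for each outcome category at the covariate mean plus and minus one standard deviation. The toy data, the penalty strength `ridge=1.0`, and the pseudo-record weight of 0.5 are assumptions chosen for illustration, not the exact settings of the paper; bootstrap methods are not shown.

```python
import math
import statistics


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, weights=None, ridge=0.0, iters=25):
    """Weighted logistic regression of binary y on (1, x) by Newton-Raphson.

    ridge > 0 subtracts 0.5 * ridge * slope**2 from the log-likelihood
    (the intercept is unpenalised). Runs a fixed number of iterations and
    returns (intercept, slope).
    """
    if weights is None:
        weights = [1.0] * len(xs)
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = 0.0          # score (gradient of the log-likelihood)
        h00 = h01 = h11 = 0.0  # observed information (negative Hessian)
        for x, y, w in zip(xs, ys, weights):
            z = b0 + b1 * x
            # y - p, computed stably for y in {0, 1}
            resid = sigmoid(-z) if y == 1 else -sigmoid(z)
            v = w * sigmoid(z) * sigmoid(-z)  # w * p * (1 - p)
            g0 += w * resid
            g1 += w * resid * x
            h00 += v
            h01 += v * x
            h11 += v * x * x
        # Ridge penalty on the slope only.
        g1 -= ridge * b1
        h11 += ridge
        # Newton step: solve the 2x2 system [[h00, h01], [h01, h11]] d = g.
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def augment(xs, ys, weight=0.5):
    """Hypothetical helper: append pseudo-records at mean(x) -/+ SD(x),
    one with y = 0 and one with y = 1 at each point, each with `weight`."""
    m, s = statistics.mean(xs), statistics.stdev(xs)
    ax, ay, aw = list(xs), list(ys), [1.0] * len(xs)
    for x in (m - s, m + s):
        for y in (0, 1):
            ax.append(x)
            ay.append(y)
            aw.append(weight)
    return ax, ay, aw


# Toy data with perfect prediction: y = 1 exactly when x > 0.
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]

for n in (10, 30):
    # The unpenalised slope keeps growing with more iterations: no finite MLE.
    print("unpenalised,", n, "iters:", round(fit_logistic(xs, ys, iters=n)[1], 2))

print("ridge slope:", round(fit_logistic(xs, ys, ridge=1.0, iters=50)[1], 3))

ax, ay, aw = augment(xs, ys)
print("augmented slope:", round(fit_logistic(ax, ay, weights=aw, iters=50)[1], 3))
```

The fixed iteration count is deliberate: it exposes the drift of the unpenalised slope, which a convergence test would either mask or report as a failure. Both the penalty and the pseudo-records make the objective bounded, so Newton-Raphson settles on a finite slope that an imputation step could then safely draw from.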

Keywords: Missing data; Multiple imputation; Perfect prediction; Separation.