Multivariate Imputation by Chained Equations (MICE) is commonly used to impute
missing values in analysis datasets using fully conditional specification. However,
it requires that the conditional imputation models are specified correctly, including
any interactions and nonlinearities. Random Forest is a regression and classification
method that can accommodate interactions and nonlinearities without requiring a
particular statistical model to be specified.
The mice package provides the mice.impute.rf function for imputation using Random Forest, as of version 2.20. The CALIBERrfimpute package provides different, independently developed imputation functions using Random Forest in MICE.
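As an illustration of how such a function might be called from mice, the following is a minimal sketch rather than a recommended analysis. It assumes that the mice and CALIBERrfimpute packages are installed; the simulated data frame and the variable names (x1, x2, y) are hypothetical.

```r
# Sketch only: simulated data and all variable names are hypothetical.
library(mice)             # MICE framework; dispatches to mice.impute.<method>
library(CALIBERrfimpute)  # provides mice.impute.rfcont

set.seed(1)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- dat$x1 + dat$x2 + dat$x1 * dat$x2 + rnorm(n)
dat$x2[sample(n, 40)] <- NA   # set 40 of the 200 x2 values to missing

# One method per column, in column order (x1, x2, y); "" means "do not impute".
imp <- mice(dat, method = c("", "rfcont", ""), m = 5, printFlag = FALSE)

# Fit the analysis model in each imputed dataset and pool with Rubin's rules.
fit <- with(imp, lm(y ~ x1 * x2))
summary(pool(fit))
```

Replacing "rfcont" with "rf" in the method vector would select the mice.impute.rf function from the mice package instead.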
This package contains reports of two simulation studies:
Simulation study: a comparison of Random Forest and parametric MICE in a linear regression example.
Vignette for survival analysis with interactions: a comparison of the Random Forest MICE algorithm for continuous variables (mice.impute.rfcont) with parametric MICE and the algorithms of Doove et al. in the mice package (mice.impute.cart and mice.impute.rf).
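The kind of comparison made in these studies can be sketched on a much smaller scale as follows. This is not the code used in the vignettes: it is a single simulated dataset with hypothetical variable names, it uses a linear model rather than a survival model, and it assumes a recent version of mice (3.0 or later, where summary(pool(...)) returns a data frame with term and estimate columns) together with the packages that the "cart" and "rf" methods depend on.

```r
# Sketch only: one simulated dataset, hypothetical variable names.
library(mice)
library(CALIBERrfimpute)

set.seed(2)
n <- 300
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- dat$x1 + dat$x2 + dat$x1 * dat$x2 + rnorm(n)  # true interaction = 1
dat$x2[sample(n, 90)] <- NA                            # 30% of x2 missing

# Imputation methods to compare for x2: parametric (norm), the Doove et al.
# methods in mice (cart, rf) and the CALIBERrfimpute method (rfcont).
methods <- c(parametric = "norm", cart = "cart",
             rf_mice = "rf", rf_caliber = "rfcont")

# For each method: impute, fit the analysis model, pool, and keep the
# pooled estimate of the x1:x2 interaction coefficient.
estimates <- sapply(methods, function(m) {
  imp <- mice(dat, method = c("", m, ""), m = 5, printFlag = FALSE, seed = 1)
  pooled <- summary(pool(with(imp, lm(y ~ x1 * x2))))
  pooled$estimate[pooled$term == "x1:x2"]
})
estimates
```

A single run like this says little about bias or coverage; the simulation studies repeat the procedure over many simulated datasets.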
Details
Package: CALIBERrfimpute
Type: Package
Version: 0.1-6
Date: 2014-04-28
License: GPL-3
Author(s)
Anoop Shah
Maintainer: anoop@doctors.org.uk
References
Shah AD, Bartlett JW, Carpenter J, Nicholas O, Hemingway H. Comparison of Random Forest and parametric imputation models for imputing missing data using MICE: a CALIBER study. American Journal of Epidemiology 2014. doi: 10.1093/aje/kwt312
Doove LL, van Buuren S, Dusseldorp E. Recursive partitioning for missing data imputation in the presence of interaction effects. Computational Statistics and Data Analysis 2014;72:92–104. doi: 10.1016/j.csda.2013.10.025