Last data update: 2014.03.03

R: Imputation by fast predictive mean matching

mice.impute.fastpmm

R Documentation

Imputation by fast predictive mean matching

Description

Imputes univariate missing data using fast predictive mean matching.

Usage

mice.impute.fastpmm(y, ry, x, donors = 5, type = 1, ridge = 1e-05,
  version = "", ...)

Arguments

`y`	Numeric vector with incomplete data
`ry`	Response pattern of `y` (`TRUE`=observed, `FALSE`=missing)
`x`	Design matrix with `length(y)` rows and `p` columns containing complete covariates.
`donors`	The size of the donor pool among which a draw is made. The default is `donors = 5`. Setting `donors = 1` always selects the closest match. Values between 3 and 10 provide the best results. Note: The default was changed from 3 to 5 in version 2.19, based on simulation work by Tim Morris.
`type`	Type of matching distance. The default choice `type = 1` calculates the distance between the predicted value of `yobs` and the drawn values of `ymis`. Other choices are `type = 0` (distance between predicted values) and `type = 2` (distance between drawn values). The current version supports only `type = 1`.
`ridge`	The ridge penalty applied in `.norm.draw()` to prevent problems with multicollinearity. The default is `ridge = 1e-05`, which means that 0.001 percent of the diagonal is added to the cross-product. Larger ridges may result in more biased estimates. For highly noisy data (e.g. many junk variables), set `ridge = 1e-06` or even lower to reduce bias. For highly collinear data, set `ridge = 1e-04` or higher.
`version`	A character variable indicating the version. Currently unused.
`...`	Other named arguments.
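
The effect of the `ridge` argument can be illustrated with a small base R sketch. It assumes that the penalty is a fraction `ridge` of each diagonal element added to the cross-product matrix, as the description above suggests; the actual code inside `.norm.draw()` may differ, and the simulated data and variable names are hypothetical.

```r
# Illustration only: a ridge penalty on a nearly singular cross-product.
set.seed(42)
x1 <- rnorm(100)
x <- cbind(1, x1, x1 + rnorm(100, sd = 1e-4))  # two nearly collinear columns

ridge <- 1e-05
xtx <- crossprod(x)                 # cross-product matrix X'X
pen <- ridge * diag(xtx)            # assumed penalty: fraction of the diagonal
xtx_ridge <- xtx + diag(pen)        # penalised cross-product

kappa(xtx)        # condition number estimate without the penalty
kappa(xtx_ridge)  # condition number estimate with the penalty
```

The penalised matrix is much better conditioned, which is why a larger ridge helps with collinear predictors, at the price of some shrinkage in the estimates.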

Details

Imputation of `y` by predictive mean matching, based on Rubin (1987, p. 168, formulas a and b). The procedure is as follows:

1. Estimate beta and sigma by linear regression.
2. Draw beta* and sigma* from the proper posterior.
3. Compute predicted values for `yobs` using beta and for `ymis` using beta*.
4. For each `ymis`, find the `donors` observations with the closest predicted values, randomly sample one of them, and take its observed value in `y` as the imputation.

Ties are broken by making a random draw among the tied candidates. Note: The matching is done on predicted `y`, NOT on observed `y`.
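
The procedure above can be sketched in base R as follows. This is a minimal illustration of `type = 1` matching under stated assumptions, not the code used by mice: `pmm_sketch()` is a hypothetical helper, an intercept column is added for convenience, and the posterior draw uses a standard normal-linear-model draw with an assumed form of the ridge penalty.

```r
# Hypothetical helper: predictive mean matching in base R (not part of mice).
pmm_sketch <- function(y, ry, x, donors = 5, ridge = 1e-05) {
  x <- cbind(1, as.matrix(x))               # add an intercept column
  xobs <- x[ry, , drop = FALSE]
  xmis <- x[!ry, , drop = FALSE]
  yobs <- y[ry]
  p <- ncol(xobs)

  # Step 1: estimate beta and sigma by (ridge-stabilised) least squares
  xtx <- crossprod(xobs)
  v <- solve(xtx + diag(ridge * diag(xtx), nrow = p))
  beta_hat <- v %*% crossprod(xobs, yobs)
  resid <- yobs - xobs %*% beta_hat
  df <- max(sum(ry) - p, 1)

  # Step 2: draw sigma* and beta* from their posterior
  sigma_star <- sqrt(sum(resid^2) / rchisq(1, df))
  beta_star <- beta_hat + t(chol((v + t(v)) / 2)) %*% rnorm(p) * sigma_star

  # Step 3: predictions use beta for observed cases, beta* for missing cases
  yhat_obs <- as.vector(xobs %*% beta_hat)
  yhat_mis <- as.vector(xmis %*% beta_star)

  # Step 4: for each missing case, sample one of the closest donors
  k <- min(donors, length(yobs))
  vapply(yhat_mis, function(target) {
    d <- abs(yhat_obs - target)
    pool <- order(d, runif(length(d)))[seq_len(k)]  # random tie-breaking
    yobs[pool[sample.int(k, 1)]]
  }, numeric(1))
}

# Simulated example data (hypothetical)
set.seed(1)
n <- 50
x <- matrix(rnorm(n), ncol = 1)
y <- 2 + 3 * x[, 1] + rnorm(n)
ry <- rep(TRUE, n)
ry[1:10] <- FALSE
y[!ry] <- NA
imp <- pmm_sketch(y, ry, x)
```

Because every imputation is copied from a donor, the imputed values are always values that were actually observed in `y`, which is the main practical appeal of predictive mean matching.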

Value

Numeric vector of length `sum(!ry)` with imputations.
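
Since the return value holds only the imputations, in the order of the missing entries, it is typically written back into the missing positions of `y`. The sketch below shows this with made-up numbers; `imp` is a stand-in for the function's return value, not real output.

```r
# Illustration only: placing a vector of imputations back into y.
y <- c(1.2, NA, 3.4, NA, 5.6)
ry <- !is.na(y)                 # TRUE = observed, FALSE = missing

imp <- c(3.4, 1.2)              # stand-in for the returned imputations
y_completed <- y
y_completed[!ry] <- imp         # fill the missing positions in order
```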

Note

The `mice.impute.fastpmm()` function is an experimental version of the standard `mice.impute.pmm()` function. In mice 2.22 both are equivalent. In future versions of mice, `mice.impute.fastpmm()` may be subject to additional optimizations.

Author(s)

Stef van Buuren, Karin Groothuis-Oudshoorn, 2000, 2012

References

Little, R.J.A. (1988). Missing data adjustments in large surveys (with discussion). Journal of Business & Economic Statistics, 6, 287–301.

Rubin, D.B. (1987). Multiple imputation for nonresponse in surveys. New York: Wiley.

Van Buuren, S., Brand, J.P.L., Groothuis-Oudshoorn, C.G.M., Rubin, D.B. (2006). Fully conditional specification in multivariate imputation. Journal of Statistical Computation and Simulation, 76(12), 1049–1064.

Van Buuren, S., Groothuis-Oudshoorn, K. (2011). mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 45(3), 1–67. http://www.jstatsoft.org/v45/i03/