Last data update: 2014.03.03

R: Imputation by fast predictive mean matching
mice.impute.fastpmm    R Documentation

Imputation by fast predictive mean matching

Description

Imputes univariate missing data using fast predictive mean matching.

Usage

mice.impute.fastpmm(y, ry, x, donors = 5, type = 1, ridge = 1e-05,
  version = "", ...)

Arguments

y

Numeric vector with incomplete data

ry

Response pattern of y (TRUE=observed, FALSE=missing)

x

Design matrix with length(y) rows and p columns containing complete covariates.

donors

The size of the donor pool among which a draw is made. The default is donors = 5. Setting donors = 1 always selects the closest match. Values between 3 and 10 provide the best results. Note: The default was changed from 3 to 5 in version 2.19, based on simulation work by Tim Morris.
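The effect of donors can be illustrated with a small base R sketch. The predicted values, the observed values and the helper draw_from_pool() below are hypothetical and chosen only for illustration; they are not part of mice.

```r
# Hypothetical predicted values for six observed cases and their observed y
yhat_obs <- c(2.1, 3.4, 3.9, 4.2, 5.0, 7.3)
yobs     <- c(2.0, 3.0, 4.5, 4.0, 5.5, 7.0)
yhat_mis <- 4.0   # predicted value for one missing case

# Illustrative helper: draw one imputation from the pool of closest cases
draw_from_pool <- function(donors) {
  d <- abs(yhat_obs - yhat_mis)             # distance on predicted values
  pool <- order(d)[seq_len(donors)]         # indices of the closest cases
  yobs[pool[sample.int(donors, 1)]]         # observed value of a random donor
}

draw_from_pool(1)                 # donors = 1: always the closest match, 4.5
set.seed(42)
replicate(10, draw_from_pool(3))  # donors = 3: draws from 4.5, 4.0 and 3.0
```

With donors = 1 the same observed value is returned on every call, so the imputations carry no between-draw variability from the matching step. A larger pool restores that variability at the cost of using less similar donors.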

type

Type of matching distance. The default choice type = 1 calculates the distance between the predicted value of yobs and the drawn values of ymis. Other choices are type = 0 (distance between predicted values) and type = 2 (distance between drawn values). The current version supports only type = 1.
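The three matching types differ only in which coefficient vector produces the predictions on each side of the distance. The sketch below is one reading of the description above, under the assumption that "predicted" means computed with the estimated coefficients and "drawn" means computed with the posterior draw. The helper match_predictions() and all inputs are hypothetical.

```r
# Hypothetical helper: returns the two sets of predictions that are compared
match_predictions <- function(xobs, xmis, beta_hat, beta_star, type = 1) {
  switch(as.character(type),
    "0" = list(obs = xobs %*% beta_hat,  mis = xmis %*% beta_hat),
    "1" = list(obs = xobs %*% beta_hat,  mis = xmis %*% beta_star),
    "2" = list(obs = xobs %*% beta_star, mis = xmis %*% beta_star),
    stop("type must be 0, 1 or 2"))
}

xobs <- cbind(1, c(0, 1, 2))   # three observed cases, intercept plus one covariate
xmis <- cbind(1, 1.5)          # one missing case
beta_hat  <- c(1.0, 2.0)       # estimated coefficients (illustrative values)
beta_star <- c(1.1, 1.8)       # drawn coefficients (illustrative values)

p <- match_predictions(xobs, xmis, beta_hat, beta_star, type = 1)
abs(as.vector(p$obs) - as.vector(p$mis))   # distances used for matching
```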

ridge

The ridge penalty applied in .norm.draw() to prevent problems with multicollinearity. The default is ridge = 1e-05, which means that 0.001 percent of the diagonal is added to the cross-product. Larger ridges may result in more biased estimates. For highly noisy data (e.g. many junk variables), set ridge = 1e-06 or even lower to reduce bias. For highly collinear data, set ridge = 1e-04 or higher.
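A minimal base R sketch of what adding a fraction of the diagonal to the cross-product can look like. This is an illustration written for this page, not the source of .norm.draw(); the simulated data and all variable names are assumptions.

```r
# Sketch of a ridge-stabilised least-squares fit (illustrative only)
set.seed(1)
n  <- 50
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 1e-4)   # nearly collinear with x1
x  <- cbind(1, x1, x2)           # design matrix with intercept
y  <- 1 + 2 * x1 + rnorm(n)

ridge <- 1e-05
xtx <- crossprod(x)              # cross-product matrix t(x) %*% x
pen <- ridge * diag(xtx)         # a fraction `ridge` of each diagonal element
v   <- solve(xtx + diag(pen))    # inverse of the penalised cross-product
beta_hat <- v %*% crossprod(x, y)

# Relative inflation of each diagonal element, in percent
100 * pen / diag(xtx)
```

The last line reports the inflation as a percentage of each diagonal element, which makes it easy to see how a given ridge value translates into the size of the penalty.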

version

A character variable indicating the version. Currently unused.

...

Other named arguments.

Details

Imputation of y by predictive mean matching, based on Rubin (1987, p. 168, formulas a and b). The procedure is as follows:

  1. Estimate beta and sigma by linear regression

  2. Draw beta* and sigma* from the proper posterior

  3. Compute predicted values for yobs using beta and for ymis using beta*

  4. For each ymis, find the donors observed cases (donors = 5 by default) whose predicted values are closest, randomly sample one of them, and take its observed value in y as the imputation.

  5. Ties are broken by making a random draw among ties. Note: The matching is done on predicted y, NOT on observed y.
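The steps above can be sketched in base R as follows. This is a simplified illustration under stated assumptions, not the mice implementation: the function name pmm_sketch() is hypothetical, an intercept column is added inside the function, and ties are broken by a random secondary sort key.

```r
# Illustrative predictive mean matching in base R; pmm_sketch is a hypothetical name
pmm_sketch <- function(y, ry, x, donors = 5, ridge = 1e-05) {
  x    <- cbind(1, as.matrix(x))               # add an intercept column
  xobs <- x[ry, , drop = FALSE]
  yobs <- y[ry]

  # Step 1: estimate beta and sigma by (ridge-stabilised) least squares
  xtx   <- crossprod(xobs)
  v     <- solve(xtx + diag(ridge * diag(xtx), nrow = ncol(xobs)))
  beta  <- v %*% crossprod(xobs, yobs)
  resid <- yobs - xobs %*% beta
  df    <- max(sum(ry) - ncol(xobs), 1)

  # Step 2: draw sigma* and beta* from their posterior
  sigma_star <- sqrt(sum(resid^2) / rchisq(1, df))
  beta_star  <- beta + t(chol((v + t(v)) / 2)) %*% rnorm(ncol(xobs)) * sigma_star

  # Step 3: predictions with beta for observed cases, beta* for missing cases
  yhat_obs <- as.vector(xobs %*% beta)
  yhat_mis <- as.vector(x[!ry, , drop = FALSE] %*% beta_star)

  # Steps 4 and 5: sample one of the `donors` closest observed cases;
  # the random second sort key breaks ties in distance at random
  vapply(yhat_mis, function(z) {
    d    <- abs(yhat_obs - z)
    pool <- order(d, runif(length(d)))[seq_len(min(donors, length(d)))]
    yobs[pool[sample.int(length(pool), 1)]]
  }, numeric(1))
}

# Usage on simulated data (all values are illustrative)
set.seed(123)
x  <- matrix(rnorm(200), ncol = 2)
y  <- as.vector(1 + x %*% c(2, -1) + rnorm(100))
ry <- runif(100) > 0.3
y[!ry] <- NA
imp <- pmm_sketch(y, ry, x)
length(imp) == sum(!ry)   # one imputation per missing value
all(imp %in% y[ry])       # every imputation is an observed value of y
```

Because the imputations are copied from observed cases, they always fall within the range of the observed data, which is the main practical attraction of matching on predicted values.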

Value

Numeric vector of length sum(!ry) with imputations.

Note

The mice.impute.fastpmm() function is an experimental version of the standard mice.impute.pmm() function. In mice 2.22 the two are equivalent. In future versions of mice, mice.impute.fastpmm() may be subject to additional optimizations.

Author(s)

Stef van Buuren, Karin Groothuis-Oudshoorn, 2000, 2012

References

Little, R.J.A. (1988). Missing data adjustments in large surveys (with discussion). Journal of Business and Economic Statistics, 6, 287–301.

Rubin, D.B. (1987). Multiple imputation for nonresponse in surveys. New York: Wiley.

Van Buuren, S., Brand, J.P.L., Groothuis-Oudshoorn, C.G.M., Rubin, D.B. (2006). Fully conditional specification in multivariate imputation. Journal of Statistical Computation and Simulation, 76(12), 1049–1064.

Van Buuren, S., Groothuis-Oudshoorn, K. (2011). mice: Multivariate Imputation by Chained Equations in R. Journal of Statistical Software, 45(3), 1–67. http://www.jstatsoft.org/v45/i03/

See Also

mice.impute.pmm
