Imputes univariate missing data using fast predictive mean matching
Usage
mice.impute.fastpmm(y, ry, x, donors = 5, type = 1, ridge = 1e-05,
version = "", ...)
Arguments
y
Numeric vector with incomplete data
ry
Response pattern of y (TRUE=observed,
FALSE=missing)
x
Design matrix with length(y) rows and p columns
containing complete covariates.
donors
The size of the donor pool among which a draw is made. The default is
donors = 5. Setting donors = 1 always selects the closest match. Values
between 3 and 10 typically give the best results. Note: The default was changed from
3 to 5 in version 2.19, based on simulation work by Tim Morris.
type
Type of matching distance. The default choice type = 1 calculates the distance between the predicted values of yobs, computed from the estimated beta, and the predicted values of ymis, computed from the drawn beta*. Other choices are type = 0 (distance between predicted values, both computed from beta) and type = 2 (distance between drawn values, both computed from beta*). The current version supports only type = 1.
ridge
The ridge penalty applied in .norm.draw() to prevent problems with multicollinearity. The default is ridge = 1e-05, which means that 0.001 percent of each diagonal element of the cross-product matrix is added to that element. Larger ridges may result in more biased estimates. For highly noisy data (e.g. many junk variables), set ridge = 1e-06 or even lower to reduce bias. For highly collinear data, set ridge = 1e-04 or higher.
version
A character string indicating the version. Currently unused.
...
Other named arguments.
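To make the ridge argument concrete, the base-R sketch below assumes, as the description above states, that the penalty is ridge times the diagonal of the cross-product X'X, added to that diagonal before inversion. The helper name ridge_inverse() is hypothetical and not part of mice; the internal .norm.draw() may differ in detail.

```r
# Hypothetical helper, not part of mice: ridge-stabilised inverse of X'X,
# assuming the penalty is ridge * diag(X'X) added to the diagonal.
ridge_inverse <- function(x, ridge = 1e-05) {
  xtx <- crossprod(x)                   # cross-product X'X
  pen <- ridge * diag(xtx)              # ridge = 1e-05 adds 0.001 percent of each diagonal entry
  solve(xtx + diag(pen, nrow = length(pen)))
}

set.seed(1)
x1 <- rnorm(50)
x <- cbind(1, x1, x1)                   # columns 2 and 3 identical, so X'X itself is singular
v <- ridge_inverse(x, ridge = 1e-04)    # larger ridge, as suggested for highly collinear data
all(is.finite(v))
```

The penalty is proportional to each diagonal entry, so it scales with the variables rather than adding a fixed constant.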
Details
Imputation of y by predictive mean matching, based on Rubin (1987, p.
168, formulas a and b). The procedure is as follows:
1. Estimate beta and sigma by linear regression.
2. Draw beta* and sigma* from the proper posterior.
3. Compute predicted values for yobs using beta, and for ymis using beta*.
4. For each ymis, find the donors observed cases with the closest predicted
values, randomly sample one of these, and take its observed value in y as
the imputation.
Ties are broken by making a random draw
among ties.
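The donor draw, including the random tie-break, can be sketched in base R as follows. draw_donor() is a hypothetical helper for a single missing case, written as an illustration and not the matching routine used inside mice; the random secondary sort key is one assumed way to break ties at random.

```r
# Hypothetical helper: choose one donor for a single missing case.
# yhatobs:  predicted values for the observed cases
# yhatmis1: predicted value for one missing case
# yobs:     observed y values, in the same order as yhatobs
draw_donor <- function(yhatobs, yhatmis1, yobs, donors = 5) {
  d <- abs(yhatobs - yhatmis1)                  # matching distance on predicted y
  k <- min(donors, length(d))                   # pool cannot exceed the number of observed cases
  pool <- order(d, runif(length(d)))[seq_len(k)]  # random secondary key breaks ties at random
  yobs[pool[sample.int(k, 1)]]                  # observed value of one randomly chosen donor
}

set.seed(123)
yhatobs <- c(1.0, 2.0, 3.0, 4.0, 10.0)
yobs    <- c(11, 12, 13, 14, 20)
draw_donor(yhatobs, 2.1, yobs, donors = 1)     # 12: the closest match is always chosen
draw_donor(yhatobs, 2.1, yobs, donors = 3)     # one of 11, 12, 13
draw_donor(c(1, 3), 2, c(10, 30), donors = 1)  # exact tie: 10 or 30, chosen at random
```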
Note: The matching is done on predicted y, NOT on
observed y.
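Because matching uses predicted rather than observed y, the type settings differ only in which coefficient vector produces each set of predictions. The hypothetical helper below assumes that coef holds the least-squares estimate beta and beta holds a posterior draw beta*; only type = 1 is supported by mice.impute.fastpmm() itself.

```r
# Hypothetical helper: predicted values used for matching under each type.
# coef = estimated beta, beta = drawn beta*.
match_predictions <- function(x, ry, coef, beta, type = 1) {
  xobs <- x[ry, , drop = FALSE]
  xmis <- x[!ry, , drop = FALSE]
  switch(as.character(type),
    "0" = list(obs = drop(xobs %*% coef), mis = drop(xmis %*% coef)),  # both from beta
    "1" = list(obs = drop(xobs %*% coef), mis = drop(xmis %*% beta)),  # default: mixed
    "2" = list(obs = drop(xobs %*% beta), mis = drop(xmis %*% beta)),  # both from beta*
    stop("type must be 0, 1 or 2"))
}

x <- cbind(1, c(1, 2, 3, 4, 5))
ry <- c(TRUE, TRUE, TRUE, FALSE, FALSE)
coef <- c(0, 1)      # estimated beta
beta <- c(0.5, 1)    # one drawn beta*
p <- match_predictions(x, ry, coef, beta, type = 1)
p$obs                # 1 2 3
p$mis                # 4.5 5.5
```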
Value
Numeric vector of length sum(!ry) with imputations
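Combining the steps from Details, the base-R sketch below performs type-1 predictive mean matching and returns a numeric vector of length sum(!ry). It is an illustration, not the mice source code: pmm_sketch() is a hypothetical name, and it assumes that x already contains an intercept column, that the ridge penalty is ridge * diag(X'X), and that ties are broken at random.

```r
# Hypothetical sketch of type-1 predictive mean matching (not the mice code).
# Assumes x includes an intercept column and ridge penalty = ridge * diag(X'X).
pmm_sketch <- function(y, ry, x, donors = 5, ridge = 1e-05) {
  xobs <- x[ry, , drop = FALSE]
  xmis <- x[!ry, , drop = FALSE]
  yobs <- y[ry]

  # Step 1: estimate beta and the residual sum of squares by ridge-stabilised least squares
  xtx <- crossprod(xobs)
  pen <- ridge * diag(xtx)
  v <- solve(xtx + diag(pen, nrow = length(pen)))
  coef <- drop(v %*% crossprod(xobs, yobs))
  df <- max(length(yobs) - ncol(x), 1)
  rss <- sum((yobs - xobs %*% coef)^2)

  # Step 2: draw sigma* and beta* from the posterior of the linear regression
  sigma_star <- sqrt(rss / rchisq(1, df))
  beta_star <- coef + sigma_star * drop(t(chol((v + t(v)) / 2)) %*% rnorm(ncol(x)))

  # Step 3: type-1 predictions, yobs from beta and ymis from beta*
  yhatobs <- drop(xobs %*% coef)
  yhatmis <- drop(xmis %*% beta_star)

  # Step 4: for each missing case, sample one of the `donors` closest observed cases
  k <- min(donors, length(yobs))
  vapply(yhatmis, function(m) {
    pool <- order(abs(yhatobs - m), runif(length(yhatobs)))[seq_len(k)]
    yobs[pool[sample.int(k, 1)]]
  }, numeric(1))
}

set.seed(42)
n <- 100
x1 <- rnorm(n)
y <- 2 + 3 * x1 + rnorm(n)
ry <- runif(n) > 0.3            # roughly 30 percent missing
x <- cbind(1, x1)
imp <- pmm_sketch(y, ry, x, donors = 5)
length(imp) == sum(!ry)         # TRUE
all(imp %in% y[ry])             # TRUE: every imputation is an observed value
```

The imputations can be written back into the incomplete vector with y[!ry] <- imp.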
Note
The mice.impute.fastpmm() function is an experimental
version of the standard mice.impute.pmm() function.
In mice 2.22 the two are equivalent. In future versions of
mice, the mice.impute.fastpmm() function may be
subject to additional optimizations.
Author(s)
Stef van Buuren, Karin Groothuis-Oudshoorn, 2000, 2012
References
Little, R.J.A. (1988). Missing data adjustments in large surveys
(with discussion). Journal of Business & Economic Statistics, 6, 287–301.
Rubin, D.B. (1987). Multiple imputation for nonresponse in surveys. New York:
Wiley.
Van Buuren, S., Brand, J.P.L., Groothuis-Oudshoorn, C.G.M., Rubin, D.B. (2006).
Fully conditional specification in multivariate imputation. Journal of
Statistical Computation and Simulation, 76(12), 1049–1064.
Van Buuren, S., Groothuis-Oudshoorn, K. (2011). mice: Multivariate
Imputation by Chained Equations in R. Journal of Statistical
Software, 45(3), 1–67. http://www.jstatsoft.org/v45/i03/