Last data update: 2014.03.03

impute.yai

R Documentation

Impute variables from references to targets

Description

Imputes the observed values of variables from reference observations to target observations. It also imputes a value for each reference from the other references, which is useful for validation (see `yai`). Variables that were not in the original data can be imputed by supplying them in the `ancillaryData` argument.

Usage

## S3 method for class 'yai'
impute(object,ancillaryData=NULL,method="closest",
       method.factor=method,k=NULL,vars=NULL,
       observed=TRUE,...)

Arguments

`object`	an object of class `yai`.
`ancillaryData`	a data frame of variables that may not have been used in the original call to `yai`. There must be one row for each reference observation, no missing data, and row names must match those used in the reference observations.
`method`	the method used to compute the imputed values for continuous variables, as follows: `closest`: use the single neighbor that is closest (this is the default and is always used when k=1); `mean`: the mean of the k neighbors is taken; `median`: the median of the k neighbors is taken; `dstWeighted`: a weighted mean is taken over the k neighbors, where each neighbor's weight is 1/(1+d) and d is the distance to that neighbor.
`method.factor`	the method used to compute the imputed values for factors, as follows: `closest`: use the single neighbor that is closest (this is the default and is always used when k=1); `mean` or `median`: both actually compute the mode, that is, the factor level that occurs most often among the k neighbors; `dstWeighted`: a mode where the count for each level is the sum of the weights (1/(1+d)) rather than each neighbor having a weight of 1.
`k`	the number of neighbors to use in averages; when NULL, all neighbors present are used.
`vars`	a character vector of variables to impute. When NULL, the behaviour depends on the value of `ancillaryData`: if `ancillaryData` is NULL, the Y-variables are imputed; otherwise, all variables present in `ancillaryData` are imputed.
`observed`	when TRUE, columns are created for observed values (those from the target observations) as well as imputed values (those from the reference observations).
`...`	passed to other methods, currently not used.
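
The `method` and `method.factor` options described above can be illustrated with a small base R sketch. It does not call yaImpute; the distances and values are made-up inputs, and normalizing the weighted mean by the sum of the weights is an assumption about how the 1/(1+d) weights are applied.

```r
# Hypothetical inputs: k = 3 neighbors of one target, nearest first.
d    <- c(0.2, 1.0, 3.0)            # distances to the neighbors (made up)
yNum <- c(2.0, 4.0, 9.0)            # a continuous variable on the neighbors
yFac <- factor(c("a", "b", "b"))    # a factor on the same neighbors

w <- 1 / (1 + d)                    # weights described for dstWeighted

# Continuous variable
closestNum  <- yNum[which.min(d)]       # "closest"
meanNum     <- mean(yNum)               # "mean"
medianNum   <- median(yNum)             # "median"
weightedNum <- sum(w * yNum) / sum(w)   # "dstWeighted" (normalization assumed)

# Factor
closestFac  <- as.character(yFac[which.min(d)])   # "closest"
counts      <- table(yFac)
modeFac     <- names(counts)[which.max(counts)]   # "mean" or "median": the mode
wSums       <- tapply(w, yFac, sum)               # summed weights per level
weightedFac <- names(wSums)[which.max(wSums)]     # "dstWeighted": weighted mode
```

With these inputs the plain mode and the weighted mode disagree, because the single nearest neighbor carries more weight than the two more distant ones combined.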

Value

An object of class c("impute.yai","data.frame"), with row names identifying observations and column names identifying variables. When observed=TRUE, additional columns are created for the observed values, named with a suffix of .o.

NAs fill the columns of observed values when no corresponding value is known, as is the case for Y-variables of target observations.

Scale factors for each variable are returned as an attribute (see attributes).
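
A minimal sketch of how the returned object might be inspected, assuming the yaImpute package is installed. The seed is arbitrary and only makes the sample repeatable; the attribute names are listed rather than assumed.

```r
library(yaImpute)

data(iris)
set.seed(1)                         # arbitrary seed, for repeatability only
refs <- sample(rownames(iris), 50)
x <- iris[, 1:3]
y <- iris[refs, 4:5]
mal <- yai(x = x, y = y, method = "mahalanobis")

imp <- impute(mal)                  # observed = TRUE is the default

class(imp)                          # class vector of the result

# Columns holding observed values carry the .o suffix
obsCols <- grep("\\.o$", names(imp), value = TRUE)
head(imp[, obsCols, drop = FALSE])

# List the attribute names attached to the result
names(attributes(imp))
```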

Author(s)

Nicholas L. Crookston ncrookston.fs@gmail.com
Andrew O. Finley finleya@msu.edu
Emilie Henderson emilie.henderson@oregonstate.edu

Examples

library(yaImpute)

data(iris)

# form some test data
refs <- sample(rownames(iris),50)
x <- iris[,1:3]      # Sepal.Length Sepal.Width Petal.Length
y <- iris[refs,4:5]  # Petal.Width Species

# build a yai object using mahalanobis
mal <- yai(x=x,y=y,method="mahalanobis")

# output a data frame of observed and imputed values
# of all variables and observations.

impute(mal)
malImp <- impute(mal,ancillaryData=iris)
plot(malImp)
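
The `method` and `k` arguments only matter when more than one neighbor is stored. The following sketch assumes that the `k` argument of `yai` sets the number of neighbors kept, and that `k` in `impute` may be any value up to that number; the object names `mal5` and `imp5` are illustrative.

```r
library(yaImpute)

data(iris)
set.seed(1)                         # arbitrary seed, for repeatability only
refs <- sample(rownames(iris), 50)
x <- iris[, 1:3]
y <- iris[refs, 4:5]

# Keep five neighbors per observation (assumed meaning of k in yai)
mal5 <- yai(x = x, y = y, method = "mahalanobis", k = 5)

# Average the three nearest neighbors; method.factor defaults to method,
# so the factor Species is imputed as the most frequent level.
imp5 <- impute(mal5, method = "mean", k = 3)
head(imp5)
```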