R Graphical Manual


Last data update: 2014.03.03


affinity

R Documentation

Affinity Calculation

Description

Calculates affinity scores as described in Cranmer and Gill (2013). The function implements both the original method described in the article and a correlation-weighted variant that takes into account the correlation structure of the observed data, which increases efficiency in making matches.

Usage

affinity(data, index, column = NULL, R = NULL, weighted = FALSE)

Arguments

`data`	A data frame or matrix of values for which affinity should be calculated.
`index`	A row number identifying the target observation. Affinity will be calculated between this observation and all others in the dataset.
`column`	A column number identifying the variable with missing information. Needed only for the optional correlation-weighted affinity score, which uses the correlations of all variables with this focus variable (i.e., the column).
`R`	A correlation matrix for `data`.
`weighted`	Logical indicating whether or not the correlation-weighted affinity measure should be used.
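The correlation-weighted option can be pictured with a small sketch. It is hypothetical and is not the hot.deck package's implementation: it assumes that each variable's closeness indicator (see Details) is weighted by the absolute correlation between that variable and the focus variable `column`, taken from `R`, and that variables the target does not observe get zero weight. The function name `weighted_affinity()` is illustrative only.

```r
# Hypothetical sketch of a correlation-weighted affinity; NOT the hot.deck
# package's implementation. Assumption: each variable's closeness indicator is
# weighted by the absolute correlation between that variable and the focus
# variable `column`, taken from the correlation matrix R.
weighted_affinity <- function(A, R, column, target) {
  w <- abs(R[column, ])            # |correlation| of every variable with the focus variable
  w[column] <- 0                   # the focus variable itself is the one that is missing
  w[is.na(unlist(target))] <- 0    # skip variables the target does not observe
  as.vector(A %*% w) / sum(w)      # weighted share of variables on which each row is close
}

# Toy inputs: closeness matrix A (3 other observations x 3 variables),
# a correlation matrix R, and a target that is missing variable 3.
A <- rbind(c(1, 1, 0),
           c(1, 0, 0),
           c(0, 1, 1))
R <- matrix(c( 1.0, 0.6, -0.2,
               0.6, 1.0,  0.1,
              -0.2, 0.1,  1.0), nrow = 3)
target <- c(1, 0.5, NA)
aff_w <- weighted_affinity(A, R, column = 3, target = target)
print(aff_w)
```

Under this weighting, the second observation (close only on variable 1) scores about 0.67, while the third (close on variable 2, plus the missing variable 3, which carries no weight) scores about 0.33. An unweighted share would give both 0.5. Matches on variables more strongly correlated with the missing variable count for more.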

Details

Affinity is calculated by first identifying whether two observations are sufficiently ‘close’ on each variable. Take observation 1 as the target. If observation i is close to the target observation on variable j, then A[i,j] = 1; otherwise, A[i,j] = 0. Two values of a discrete variable are close if they are equal. Two values of a continuous variable are close if they differ by no more than 1. While this may seem restrictive and arbitrary, the main package function hot.deck provides an argument, cutoffSD, that lets the user set how many standard deviations correspond to a distance of 1.
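The closeness rule above can be sketched in a few lines of R. This is a hypothetical illustration, not the package's internal code. The names `close_matrix()`, `simple_affinity()`, and `rescale_sd()` are made up for the example. Three further assumptions apply: the user supplies which variables are discrete, affinity is taken as the share of the target's observed variables on which another observation is close (our reading of Cranmer and Gill 2013), and a cutoffSD-style rescaling divides a continuous variable by cutoffSD times its standard deviation.

```r
# Hypothetical sketch of the closeness rule in Details; NOT the internal code of
# the hot.deck package. Function names here are illustrative only.

# Assumed cutoffSD-style rescaling: dividing a continuous variable by
# cutoffSD * sd(x) makes a distance of 1 equal cutoffSD standard deviations.
rescale_sd <- function(x, cutoffSD = 1) {
  x / (cutoffSD * sd(x, na.rm = TRUE))
}

# Closeness matrix: A[i, j] = 1 if the i-th other observation is close to the
# target observation on variable j, and 0 otherwise.
close_matrix <- function(data, index, discrete) {
  target <- data[index, ]
  others <- data[-index, , drop = FALSE]
  A <- matrix(0L, nrow = nrow(others), ncol = ncol(data))
  for (j in seq_len(ncol(data))) {
    if (discrete[j]) {
      hit <- others[, j] == target[[j]]            # discrete: same value
    } else {
      hit <- abs(others[, j] - target[[j]]) <= 1   # continuous: within distance 1
    }
    hit[is.na(hit)] <- FALSE   # a missing value on either side never counts as close
    A[, j] <- as.integer(hit)
  }
  A
}

# Assumed aggregation: share of the target's observed variables on which
# each other observation is close.
simple_affinity <- function(A, data, index) {
  n_observed <- sum(!is.na(data[index, ]))
  rowSums(A) / n_observed
}

d <- data.frame(x = c(1, 1, 2, 1),          # discrete
                y = c(0.5, 1.2, 0.9, NA),   # continuous
                z = c(NA, 3, 3, 4))         # continuous, missing for the target
A <- close_matrix(d, index = 1, discrete = c(TRUE, FALSE, FALSE))
aff <- simple_affinity(A, d, index = 1)
print(aff)
```

Here observation 2 matches the target on both observed variables (x equal, y within 1), so its affinity is 1. Observations 3 and 4 are close on only one of the two, giving 0.5. Applying `rescale_sd()` to y before calling `close_matrix()` would widen or narrow the continuous matching window in standard-deviation units.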

Value

A matrix of affinity scores with one row for each missing observation-variable combination and one column for each observation in the data.

Author(s)

Skyler Cranmer, Jeff Gill, Natalie Jackson, Andreas Murr and Dave Armstrong

References

Cranmer, S.J. and Gill, J.M. (2013) “We Have to Be Discrete About This: A Non-Parametric Imputation Technique for Missing Categorical Data.” British Journal of Political Science 43(2): 425-449.

Examples

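library(hot.deck)
data(D)                # example data set included with the package
out <- hot.deck(D)     # multiple hot-deck imputation of the missing values in D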