Last data update: 2014.03.03

R: Copula-Based Imputation Method
CoImp    R Documentation

Copula-Based Imputation Method

Description

Imputation method based on conditional copula functions.

Usage

CoImp(X, n.marg = 2, type.data = "continuous", smoothing = rep(0.5,n.marg),
      plot.marg = TRUE, plot.bar = TRUE, plot.legend = TRUE, args.legend =
      list(y = 110, cex = 0.8), model = list(normalCopula(0.5, dim=n.marg,
      dispstr="ex"), claytonCopula(10, dim=n.marg),gumbelCopula(10, dim=n.marg),
      frankCopula(10, dim=n.marg)), ...)

Arguments

X

a data matrix with missing values. Missing values should be denoted with NA.

n.marg

the number of variables in X.

type.data

the nature of the variables in X: "discrete" or "continuous".

smoothing

values for the nearest neighbour component of the smoothing parameter of the lp function.

plot.marg

logical: if TRUE plots the estimated marginal densities.

plot.bar

logical: if TRUE shows a bar plot of the percentages of missing and available data for each margin.

plot.legend

logical: see barplot.

args.legend

list of additional arguments to pass to legend.

model

a list of copula models to be used for the imputation, see the Details section. Each element should be one of the normal, Frank, Clayton or Gumbel copulas.

...

further parameters for fitCopula, lp and further graphical arguments.
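
As an illustration of the model argument, the sketch below builds a shorter candidate list than the default one in Usage. It assumes the copula package is installed; the dimension of 3 and the choice of two families are arbitrary, and the parameter values are simply the starting values shown in Usage.

```r
library(copula)

n.marg <- 3  # illustrative number of variables, not taken from the examples

# candidate copula models: an exchangeable normal copula and a Frank copula
model <- list(normalCopula(0.5, dim = n.marg, dispstr = "ex"),
              frankCopula(10, dim = n.marg))

# the list would then be passed on as
# CoImp(X, n.marg = n.marg, model = model)
```

Every copula in the list must have the same dimension as the number of columns in X.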

Details

CoImp is an imputation method based on conditional copula functions. It imputes missing observations according to the multivariate dependence structure of the generating process, without any assumptions on the margins. The method can be used regardless of the dimension and of the kind (monotone or non-monotone) of the missing-data pattern.

Brief description of the approach:

  1. estimate both the margins and the copula model on available data by means of the semi-parametric sequential two-step inference for margins;

  2. derive the conditional density functions of the missing variables given the non-missing ones through the corresponding conditional copulas, obtained by using Bayes' rule;

  3. impute missing values by drawing observations from the conditional density functions derived in the previous step. The Monte Carlo method used is the hit-or-miss method.
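
Step 2 can be written out as follows. The notation is an assumption introduced here for illustration and does not come from the package documentation: a d-dimensional observation is split into a missing block x_m and an observed block x_o, F_j and f_j are the marginal distribution and density functions, c is the density of the fitted copula and c_o is the density of the copula of the observed variables only.

```latex
f(x_m \mid x_o)
  \;=\;
  \frac{c\bigl(F_1(x_1), \dots, F_d(x_d)\bigr)}
       {c_o\bigl(F_j(x_j),\ j \in o\bigr)}
  \prod_{j \in m} f_j(x_j)
```

This is the ratio of the joint density to the density of the observed block, each expressed through Sklar's theorem; the marginal densities of the observed variables cancel.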

The estimation approach for the copula fit is semiparametric: a range of nonparametric margins and parametric copula models can be selected by the user.
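
The hit-or-miss draw in step 3 can be sketched in base R as below. This is a minimal univariate illustration, not the code used inside CoImp: the function hit_or_miss and its arguments are hypothetical names, and the upper bound on the density is approximated on a grid.

```r
# Hit-or-miss sampling: propose points uniformly in the box
# [lower, upper] x [0, M] and keep the x-values that fall under the density.
# dens may be unnormalised; grid.size controls the approximation of M.
hit_or_miss <- function(dens, lower, upper, n = 1, grid.size = 1000) {
  grid <- seq(lower, upper, length.out = grid.size)
  M    <- 1.05 * max(dens(grid))   # approximate upper bound with a small margin
  out  <- numeric(0)
  while (length(out) < n) {
    x   <- runif(n, lower, upper)  # candidate values
    u   <- runif(n, 0, M)          # vertical coordinates
    out <- c(out, x[u <= dens(x)]) # "hits" are accepted as draws
  }
  out[seq_len(n)]
}

# illustration with a Beta(4, 1) density, whose mean is 4/5
set.seed(1)
draws <- hit_or_miss(function(x) dbeta(x, 4, 1), 0, 1, n = 2000)
mean(draws)
```

The sample mean should land close to 0.8. The method needs a bounded support and a bounded density, and the acceptance rate falls as the box grows relative to the area under the density.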

Value

An object of S4 class "CoImp", which is a list with the following elements:

Missing.data.matrix

the original missing data matrix to be imputed.

Perc.miss

the matrix of the percentage of missing and available data.

Estimated.Model

the estimated copula model on the available data.

Estimation.Method

the estimation method used for the copula Estimated.Model.

Index.matrix.NA

matrix indices of the missing data.

Smooth.param

the smoothing parameter alpha selected on the basis of the AIC.

Imputed.data.matrix

the imputed data matrix.

Estimated.Model.Imp

the estimated copula model on the imputed data matrix.

Estimation.Method.Imp

the estimation method used for the copula Estimated.Model.Imp.
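
To make the Perc.miss and Index.matrix.NA elements concrete, the following base R sketch computes analogous quantities for a small hand-made matrix. Whether CoImp builds them in exactly this way is an assumption; the matrix and the object names are illustrative only.

```r
# toy data matrix with one missing value per column
X <- matrix(c(1.2,  NA, 3.1,
              0.4, 0.9,  NA,
               NA, 2.2, 1.7,
              5.0, 1.1, 0.3),
            ncol = 3, byrow = TRUE,
            dimnames = list(NULL, c("X1", "X2", "X3")))

# percentage of available and of missing data for each margin
perc.miss <- rbind(Data    = 100 * colMeans(!is.na(X)),
                   Missing = 100 * colMeans(is.na(X)))

# row and column indices of the missing cells
idx.na <- which(is.na(X), arr.ind = TRUE)

perc.miss
idx.na
```

The two rows of perc.miss sum to 100 in every column, which matches the layout printed by show in the Results section.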

Author(s)

Francesca Marta Lilja Di Lascio <marta.dilascio@unibz.it>,

Simone Giannerini <simone.giannerini@unibo.it>

References

Di Lascio, F.M.L., Giannerini, S. and Reale, A. (201x) "A multivariate technique based on conditional copula specification for the imputation of complex dependent data". Working paper.

Di Lascio, F.M.L., Giannerini, S. and Reale, A. (201x) "Exploring Copulas for the Imputation of Complex Dependent Data". Under review.

Bianchi, G., Di Lascio, F.M.L., Giannerini, S., Manzari, A., Reale, A. and Ruocco, G. (2009) "Exploring copulas for the imputation of missing nonlinearly dependent data". Proceedings of the VII Meeting Classification and Data Analysis Group of the Italian Statistical Society (Cladag), Editors: Salvatore Ingrassia and Roberto Rocci, Cleup, p. 429-432. ISBN: 978-88-6129-406-6.

Examples


library(CoImp)  # also attaches the copula package

# generate data from a 4-variate Frank copula with different margins

set.seed(11)
n.marg <- 4
theta  <- 5
copula <- frankCopula(theta, dim = n.marg)
mymvdc <- mvdc(copula, c("norm", "gamma", "beta", "gamma"),
               list(list(mean = 7, sd = 2), list(shape = 3, rate = 2),
                    list(shape1 = 4, shape2 = 1), list(shape = 4, rate = 3)))
n      <- 20
x.samp <- copula::rMvdc(n, mymvdc)

# randomly introduce univariate and multivariate missing values

perc.mis    <- 0.5
set.seed(11)
miss.row    <- sample(1:n, perc.mis*n, replace=TRUE)
miss.col    <- sample(1:n.marg, perc.mis*n, replace=TRUE)
miss        <- cbind(miss.row,miss.col)
x.samp.miss <- replace(x.samp,miss,NA)

# impute missing values

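# plot.marg, plot.bar and plot.legend are named rather than passed by position
imp <- CoImp(x.samp.miss, n.marg = n.marg, type.data = "continuous",
             smoothing = rep(0.6, n.marg), plot.marg = TRUE, plot.bar = TRUE,
             plot.legend = TRUE)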

# methods show and plot

show(imp)
plot(imp)

Results


> library(CoImp)
Loading required package: copula
> ### Name: CoImp
> ### Title: Copula-Based Imputation Method
> ### Aliases: CoImp
> ### Keywords: imputation copula multivariate
> 
> ### ** Examples
> 
> 
> # generate data from a 4-variate Frank copula with different margins
> 
> set.seed(11)
> n.marg <- 4
> theta  <- 5
> copula <- frankCopula(theta, dim = n.marg)
> mymvdc <- mvdc(copula, c("norm", "gamma", "beta","gamma"), list(list(mean=7, sd=2),
+ list(shape=3, rate=2), list(shape1=4, shape2=1), list(shape=4, rate=3)))
> n      <- 20
> x.samp <- copula::rMvdc(n, mymvdc)
> 
> # randomly introduce univariate and multivariate missing
> 
> perc.mis    <- 0.5
> set.seed(11)
> miss.row    <- sample(1:n, perc.mis*n, replace=TRUE)
> miss.col    <- sample(1:n.marg, perc.mis*n, replace=TRUE)
> miss        <- cbind(miss.row,miss.col)
> x.samp.miss <- replace(x.samp,miss,NA)
> 
> # impute missing values
> 
> imp <- CoImp(x.samp.miss, n.marg=n.marg, smoothing = rep(0.6,n.marg), TRUE, TRUE, TRUE,
+             type.data="continuous");
  Number of imputed rows:  1 
  Number of imputed rows:  2 
  Number of imputed rows:  3 
  Number of imputed rows:  4 
  Number of imputed rows:  5 
  Number of imputed rows:  6 
  Number of imputed rows:  7 
dev.new(): using pdf(file="Rplots103.pdf")
> 
> # methods show and plot
> 
> show(imp)
 Main output of the function CoImp 
 -------------------------------------------------------------------------- 
 Percentage of missing and available data : 
        X1 X2 X3 X4
Data    90 80 90 90
Missing 10 20 10 10
 -------------------------------------------------------------------------- 
 Imputed data matrix : 
             X1        X2        X3        X4
 [1,]  3.828826 0.8822797 0.3191040 1.2047541
 [2,]  3.671844 1.1212306 0.7395928 0.8032523
 [3,] 10.182096 1.3092018 0.9862697 2.4261920
 [4,]  7.542160 1.4730215 0.7916568 1.1890841
 [5,]  4.178310 0.9091439 0.7581093 0.4425957
 [6,]  5.077258 1.1280285 0.8627057 1.1910965
 [7,]  7.180951 0.7985699 0.7442271 1.3398152
 [8,]  4.777585 1.2025231 0.8475026 1.3549778
 [9,]  6.366217 0.7956087 0.8703417 1.0613085
[10,]  8.915754 1.8073971 0.8451348 1.1661280
[11,]  8.485119 2.3337285 0.9200857 1.0925793
[12,]  6.926006 1.1336687 0.8795206 0.7957968
[13,]  7.894479 1.7332545 0.7935383 1.5408593
[14,]  4.110186 0.3929393 0.5983372 0.9797315
[15,]  6.785904 2.6885123 0.9388732 1.5504342
[16,]  6.135088 1.1034097 0.6979510 0.8711947
[17,]  5.339481 0.6553712 0.7984800 1.4605662
[18,]  4.989604 1.0502501 0.7283188 1.0803008
[19,]  4.463594 0.7334527 0.7314590 0.8510987
[20,]  5.167510 0.4067687 0.7474457 0.4728116
 -------------------------------------------------------------------------- 
> plot(imp)
dev.new(): using pdf(file="Rplots104.pdf")