Last data update: 2014.03.03

PerfMeasure {CoImp}    R Documentation

Performance measures for evaluating the goodness of an imputed database

Description

A set of measures for evaluating the quality of the imputation method used.

Usage

PerfMeasure(db.complete, db.imputed, db.missing, n.marg = 2,
            model = list(normalCopula(0.5, dim=n.marg, dispstr="ex"),
                         claytonCopula(10, dim=n.marg),
                         gumbelCopula(10, dim=n.marg),
                         frankCopula(10, dim=n.marg)),
            ...)

Arguments

db.complete

the complete data matrix.

db.imputed

the imputed data matrix.

db.missing

the data matrix with NA data.

n.marg

the number of variables in db.complete.

model

a list of copula models to be used for the imputation. See the Details section. Each model should be one of the normal, Frank, Clayton and Gumbel copulas.

...

further parameters for fitCopula.
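
As an illustration of the model argument, the following sketch builds a shorter list of candidate copulas with constructors from the copula package, the same ones that appear in the default shown under Usage. The name my.models, the choice of families and the starting parameter values are assumptions made for this example only.

```r
library(copula)  # provides the copula constructors used below

n.marg <- 4  # should match the number of columns of the data matrices

# Candidate copula families; the starting parameter values are arbitrary.
my.models <- list(
  normalCopula(0.5, dim = n.marg, dispstr = "ex"),
  frankCopula(10, dim = n.marg)
)

# Class name of each candidate in the list
vapply(my.models, function(m) as.character(class(m)), character(1))
```

Judging from the Usage signature, such a list could be supplied as model = my.models in place of the default four-family list.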

Details

PerfMeasure computes measures for evaluating the quality of the imputation method used. It takes as input the complete, the imputed and the missing data matrices and returns five measures of performance: MARE, RB, RRMSE, and the upper and lower tail dependence indexes (stored together in TID). See the Value section for details.

Value

An object of S4 class "PerfMeasure" with the following slots:

MARE

Object of class "numeric". The mean (over the replications performed) of the absolute relative error between each imputed value and the corresponding original value.

RB

Object of class "numeric". The relative bias of the estimator for the dependence parameter.

RRMSE

Object of class "numeric". The relative root mean squared error of the estimator for the dependence parameter.

TID

Object of class "vector". Upper and lower tail dependence indexes for bivariate copulas. See tailIndex for the underlying function.
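
To make the definitions above concrete, here is a minimal base R sketch of the first three measures using their usual textbook forms. The helper names mare, rel_bias and rel_rmse and the toy data are hypothetical and not part of CoImp; the formulas PerfMeasure uses internally may differ in detail.

```r
# Hypothetical helpers, not part of CoImp: textbook versions of the measures.

# Mean absolute relative error over the cells that had to be imputed.
mare <- function(complete, imputed, missing) {
  idx <- is.na(missing)  # positions of the missing cells
  mean(abs((imputed[idx] - complete[idx]) / complete[idx]))
}

# Relative bias and relative root mean squared error of a vector of
# estimates theta_hat (one per replication) of a known parameter theta.
rel_bias <- function(theta_hat, theta) (mean(theta_hat) - theta) / theta
rel_rmse <- function(theta_hat, theta) sqrt(mean((theta_hat - theta)^2)) / abs(theta)

# Toy data, made up for illustration.
complete <- matrix(c(2, 4, 5, 10), nrow = 2)
missing  <- complete; missing[1, 1] <- NA; missing[2, 2] <- NA
imputed  <- complete; imputed[1, 1] <- 3;  imputed[2, 2] <- 8

mare(complete, imputed, missing)  # (|3 - 2|/2 + |8 - 10|/10) / 2 = 0.35
rel_bias(c(4, 6), theta = 5)      # estimates average to theta, so 0
rel_rmse(c(4, 6), theta = 5)      # sqrt(mean(c(1, 1))) / 5 = 0.2
```

Restricting the error to the cells that are NA in the missing-data matrix is a design choice of this sketch: cells that were observed are identical in the complete and imputed matrices and would only dilute the mean.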

Author(s)

Francesca Marta Lilja Di Lascio <marta.dilascio@unibz.it>,

Simone Giannerini <simone.giannerini@unibo.it>

References

Di Lascio, F.M.L., Giannerini, S. and Reale, A. (201x) "A multivariate technique based on conditional copula specification for the imputation of complex dependent data". Working paper.

Di Lascio, F.M.L., Giannerini, S. and Reale, A. (201x) "Exploring Copulas for the Imputation of Complex Dependent Data". Under review.

Bianchi, G., Di Lascio, F.M.L., Giannerini, S., Manzari, A., Reale, A. and Ruocco, G. (2009) "Exploring copulas for the imputation of missing nonlinearly dependent data". Proceedings of the VII Meeting Classification and Data Analysis Group of the Italian Statistical Society (Cladag), Editors: Salvatore Ingrassia and Roberto Rocci, Cleup, p. 429-432. ISBN: 978-88-6129-406-6.

Examples


library(CoImp)  # also loads the copula package

# generate data from a 4-variate Frank copula with different margins

set.seed(11)
n.marg <- 4
theta  <- 5
copula <- frankCopula(theta, dim = n.marg)
mymvdc <- mvdc(copula, c("norm", "gamma", "beta","gamma"), list(list(mean=7, sd=2),
 list(shape=3, rate=2), list(shape1=4, shape2=1), list(shape=4, rate=3)))
n      <- 20
x.samp <- rMvdc(n, mymvdc)

# randomly introduce univariate and multivariate missing values

perc.mis    <- 0.5
set.seed(11)
miss.row    <- sample(1:n, perc.mis*n, replace=TRUE)
miss.col    <- sample(1:n.marg, perc.mis*n, replace=TRUE)
miss        <- cbind(miss.row,miss.col)
x.samp.miss <- replace(x.samp,miss,NA)

# impute missing values

imp <- CoImp(x.samp.miss, n.marg=n.marg, smoothing = rep(0.6,n.marg), TRUE, TRUE, TRUE,
            type.data="continuous")
imp

# apply PerfMeasure to the imputed data set

pm <- PerfMeasure(db.complete=x.samp, db.missing=x.samp.miss,
                  db.imputed=imp@"Imputed.data.matrix", n.marg=4)

pm

str(pm)

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> library(CoImp)
Loading required package: copula
> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/CoImp/PerfMeasure.Rd_%03d_medium.png", width=480, height=480)
> ### Name: PerfMeasure
> ### Title: Performance measures for evaluating the goodness of an imputed
> ###   database
> ### Aliases: PerfMeasure
> ### Keywords: imputation copula multivariate
> 
> ### ** Examples
> 
> 
> # generate data from a 4-variate Frank copula with different margins
> 
> set.seed(11)
> n.marg <- 4
> theta  <- 5
> copula <- frankCopula(theta, dim = n.marg)
> mymvdc <- mvdc(copula, c("norm", "gamma", "beta","gamma"), list(list(mean=7, sd=2),
+  list(shape=3, rate=2), list(shape1=4, shape2=1), list(shape=4, rate=3)))
> n      <- 20
> x.samp <- rMvdc(n, mymvdc)
> 
> # randomly introduce univariate and multivariate missing
> 
> perc.mis    <- 0.5
> set.seed(11)
> miss.row    <- sample(1:n, perc.mis*n, replace=TRUE)
> miss.col    <- sample(1:n.marg, perc.mis*n, replace=TRUE)
> miss        <- cbind(miss.row,miss.col)
> x.samp.miss <- replace(x.samp,miss,NA)
> 
> # impute missing values
> 
> imp <- CoImp(x.samp.miss, n.marg=n.marg, smoothing = rep(0.6,n.marg), TRUE, TRUE, TRUE,
+             type.data="continuous");
  Number of imputed rows:  1 
  Number of imputed rows:  2 
  Number of imputed rows:  3 
  Number of imputed rows:  4 
  Number of imputed rows:  5 
  Number of imputed rows:  6 
  Number of imputed rows:  7 
dev.new(): using pdf(file="Rplots107.pdf")
> imp
 Main output of the function CoImp 
 -------------------------------------------------------------------------- 
 Percentage of missing and available data : 
        X1 X2 X3 X4
Data    90 80 90 90
Missing 10 20 10 10
 -------------------------------------------------------------------------- 
 Imputed data matrix : 
             X1        X2        X3        X4
 [1,]  3.828826 0.8822797 0.3191040 1.2047541
 [2,]  3.671844 1.1212306 0.7395928 0.8032523
 [3,] 10.182096 1.3092018 0.9862697 2.4261920
 [4,]  7.542160 1.4730215 0.7916568 1.1890841
 [5,]  4.178310 0.9091439 0.7581093 0.4425957
 [6,]  5.077258 1.1280285 0.8627057 1.1910965
 [7,]  7.180951 0.7985699 0.7442271 1.3398152
 [8,]  4.777585 1.2025231 0.8475026 1.3549778
 [9,]  6.366217 0.7956087 0.8703417 1.0613085
[10,]  8.915754 1.8073971 0.8451348 1.1661280
[11,]  8.485119 2.3337285 0.9200857 1.0925793
[12,]  6.926006 1.1336687 0.8795206 0.7957968
[13,]  7.894479 1.7332545 0.7935383 1.5408593
[14,]  4.110186 0.3929393 0.5983372 0.9797315
[15,]  6.785904 2.6885123 0.9388732 1.5504342
[16,]  6.135088 1.1034097 0.6979510 0.8711947
[17,]  5.339481 0.6553712 0.7984800 1.4605662
[18,]  4.989604 1.0502501 0.7283188 1.0803008
[19,]  4.463594 0.7334527 0.7314590 0.8510987
[20,]  5.167510 0.4067687 0.7474457 0.4728116
 -------------------------------------------------------------------------- 
> 
> # apply PerfMeasure to the imputed data set
> 
> pm <- PerfMeasure(db.complete=x.samp, db.missing=x.samp.miss,
+                   db.imputed=imp@"Imputed.data.matrix", n.marg=4)
> 
> pm
 Main output of the function PerfMeasure 
 -------------------------------------------------------------------------- 
 Mean absolute relative error (MARE): 
[1] 0.4824859
 -------------------------------------------------------------------------- 
 Relative bias (RB) and Relative root mean squared error (RRMSE): 
         RB       RRMSE 
-0.16533883  0.02733693 
 -------------------------------------------------------------------------- 
 Upper and lower tail indexes: 
[1] "TID not computable"
 -------------------------------------------------------------------------- 
> 
> str(pm)
Formal class 'PerfMeasure' [package "CoImp"] with 4 slots
  ..@ MARE : num 0.482
  ..@ RB   : num -0.165
  ..@ RRMSE: num 0.0273
  ..@ TID  : chr "TID not computable"
> 
> dev.off()
png 
  2 
>