Last data update: 2014.03.03

R: Influential Error Detection
sel.edit    R Documentation

Influential Error Detection

Description

Computes the score function and identifies influential errors.

Usage

       sel.edit(y, ypred, wgt=rep(1, nrow(as.matrix(y))),
                tot=colSums(ypred * wgt), t.sel=0.01)

Arguments

y

matrix or data frame containing the response variables

ypred

matrix of predicted values for y variables

wgt

optional vector of sampling weights (default=1)

tot

optional vector of reference estimates of the totals of the y variables. If omitted, it is computed as the (possibly weighted) sum of the predicted values

t.sel

optional vector of threshold values, one for each variable, for selective editing (default=0.01)
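
As an illustration of the default for tot shown in Usage, the short sketch below evaluates colSums(ypred * wgt) on a made-up matrix of predictions. The data values and the weights are assumptions chosen only to make the arithmetic easy to follow; a length-n weight vector multiplies each row of ypred by that unit's weight.

```r
# Made-up predictions for 3 units and 2 variables (illustrative values only)
ypred <- matrix(c(10, 20, 30,
                   1,  2,  3), ncol = 2)

# Hypothetical sampling weights, one per unit
wgt <- c(1, 2, 1)

# Default reference totals, as written in the Usage section
tot <- colSums(ypred * wgt)

# With unit weights the totals reduce to plain column sums
tot.unweighted <- colSums(ypred * rep(1, nrow(ypred)))

print(tot)
print(tot.unweighted)
```

When several variables are analysed, t.sel can be given as a vector with one threshold per variable, for example t.sel=c(0.01, 0.05) for two variables.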

Details

This function ranks observations (rank) by the importance of their potential errors, ordering them by decreasing value of the global score function (global.score). It also selects the units to be edited (sel) so that the expected residual error of every variable falls below a prefixed accuracy level (t.sel). The global score (global.score) is the maximum of the local scores computed for each variable (y1.score, y2.score,...). Each local score is the weighted (weights) absolute difference between the observed value (y1, y2,...) and the predicted value (y1.p, y2.p,...), standardised by the corresponding reference total estimate (tot).

Units affected by an influential error (sel=1) are selected for editing with a two-step algorithm:
1) order the observations by decreasing global.score;
2) select the first k units such that, from the (k+1)th to the last observation, the residual errors (y1.reserr, y2.reserr,...) of all variables are below t.sel.

The function also provides indicator variables (y1.sel, y2.sel,...) reporting which variables contain an influential error in a unit selected for revision.
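
The scoring and the two-step selection described above can be sketched in base R as follows. This is a minimal illustration and not the package source: the toy data are made up, and the residual error is assumed here to be the absolute value of the sum of the standardised weighted differences over the units that would be left unedited. The exact definition used by sel.edit may differ.

```r
# Toy data: 6 units, 2 variables (all values made up for illustration)
y     <- cbind(y1 = c(10,   52, 30, 41,   12, 20),
               y2 = c( 5,    6, 40,  8,    7,  9))
ypred <- cbind(y1 = c(10.5, 51, 30, 40.8, 12, 20.1),
               y2 = c( 5,    6,  8,  8,    7,  9))
n <- nrow(y); p <- ncol(y)
wgt   <- rep(1, n)                 # unit sampling weights
tot   <- colSums(ypred * wgt)      # default reference totals
t.sel <- rep(0.01, p)              # one threshold per variable

# Local scores: weighted absolute differences standardised by the totals
tot.mat <- matrix(tot, n, p, byrow = TRUE)
score   <- abs(wgt * (y - ypred)) / tot.mat

# Global score: maximum of the local scores within each unit
global.score <- apply(score, 1, max)

# Step 1: order the units by decreasing global score
ord <- order(global.score, decreasing = TRUE)
rnk <- integer(n)
rnk[ord] <- seq_len(n)

# Step 2 (ASSUMED residual error): row k + 1 of res.err holds, for each
# variable, |sum of signed standardised errors of the units ranked k+1..n|
signed  <- (wgt * (y - ypred) / tot.mat)[ord, , drop = FALSE]
res.err <- sapply(seq_len(p), function(j) {
  abs(rev(cumsum(rev(c(signed[, j], 0)))))
})

# Smallest k such that every later position is below all thresholds
below   <- res.err < matrix(t.sel, n + 1, p, byrow = TRUE)
ok      <- apply(below, 1, all)
tail.ok <- rev(cumprod(rev(ok))) == 1
k       <- which(tail.ok)[1] - 1

# Units ranked 1..k are flagged for editing
sel <- as.integer(rnk <= k)
print(cbind(global.score, rank = rnk, sel))
```

In this toy example only the third unit, whose second variable is far from its prediction, dominates the ranking; the remaining differences are small relative to the totals.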

Value

sel.edit returns a data matrix containing the following columns:

y1, y2,...

observed variables

y1.p, y2.p,...

predictions of y variables

weights

sampling weights

y1.score, y2.score,...

local scores

global.score

global score

y1.reserr, y2.reserr,...

residual errors

y1.sel, y2.sel,...

influential error flags

rank

rank according to global score

sel

1 if the observation contains an influential error, 0 otherwise
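
A short sketch of how the returned matrix might be used. The object res below is a hand-made stand-in that carries only a subset of the documented columns, with made-up values; it is not output of sel.edit.

```r
# Stand-in for a sel.edit result (hypothetical values, subset of columns)
res <- cbind(global.score = c(0.002, 0.310, 0.001, 0.045),
             y1.sel       = c(0, 1, 0, 0),
             y2.sel       = c(0, 1, 0, 1),
             rank         = c(3, 1, 4, 2),
             sel          = c(0, 1, 0, 1))

# Number of units flagged as containing an influential error
n.sel <- sum(res[, "sel"])

# Row indices of the flagged units, most important first
to.edit <- which(res[, "sel"] == 1)
to.edit <- to.edit[order(res[to.edit, "rank"])]

# For each flagged unit, which variables carry the influential error
flags <- res[to.edit, c("y1.sel", "y2.sel"), drop = FALSE]

print(n.sel)
print(to.edit)
print(flags)
```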

Author(s)

M. Teresa Buglielli <bugliell@istat.it>, Ugo Guarnera <guarnera@istat.it>

References

Di Zio, M., Guarnera, U. (2013) "A Contamination Model for Selective Editing", Journal of Official Statistics. Volume 29, Issue 4, Pages 539-555 (http://dx.doi.org/10.2478/jos-2013-0039).

Buglielli, M.T., Di Zio, M., Guarnera, U. (2010) "Use of Contamination Models for Selective Editing", European Conference on Quality in Survey Statistics Q2010, Helsinki, 4-6 May 2010.

Examples

# Example 1
# Parameter estimation with one contaminated variable and one covariate
    data(ex1.data)
    ml.par <- ml.est(y=ex1.data[,"Y1"], x=ex1.data[,"X1"])
# Detection of influential errors
    sel <- sel.edit(y=ex1.data[,"Y1"], ypred=ml.par$ypred)
    head(sel)
# Number of units flagged as containing an influential error
    sum(sel[,"sel"])
# Order the results by decreasing score (increasing rank)
    sel.ord <- sel[order(sel[,"rank"]), ]
# Add the estimation and selection results to the data
    ex1.data <- cbind(ex1.data, tau=ml.par$tau, outlier=ml.par$outlier,
                      sel[,c("rank", "sel")])
# Plot the data, marking outliers and influential errors
    sel.pairs(ex1.data[,c("X1","Y1")], outl=ml.par$outlier, sel=sel[,"sel"])

# Example 2
# Same steps applied to ex2.data, with no covariate passed to ml.est
    data(ex2.data)
    ml.par <- ml.est(y=ex2.data)
    sel <- sel.edit(y=ex2.data, ypred=ml.par$ypred)
    sel.pairs(ex2.data, outl=ml.par$outlier, sel=sel[,"sel"])

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Platform: x86_64-pc-linux-gnu (64-bit)

> library(SeleMix)
Loading required package: mvtnorm
Loading required package: Ecdat
Loading required package: Ecfun

Attaching package: 'Ecfun'

The following object is masked from 'package:base':

    sign


Attaching package: 'Ecdat'

The following object is masked from 'package:datasets':

    Orange

Loading required package: xtable
> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/SeleMix/sel.edit.Rd_%03d_medium.png", width=480, height=480)
> ### Name: sel.edit
> ### Title: Influential Error Detection
> ### Aliases: sel.edit
> 
> ### ** Examples
> 
> # Example 1
> # Parameter estimation with one contaminated variable and one covariate
>     data(ex1.data)
>     ml.par <- ml.est(y=ex1.data[,"Y1"], x=ex1.data[,"X1"])
> # Detection of influential errors    
>     sel <- sel.edit(y=ex1.data[,"Y1"], ypred=ml.par$ypred)
>     head(sel)
            y1      y1.p weights     y1.score global.score     y1.reserr y1.sel
[1,]  1.422594  1.447798       1 3.675953e-06 3.675953e-06 -1.741156e-04      0
[2,] 46.434483 45.617588       1 1.191404e-04 1.191404e-04 -2.706110e-03      0
[3,] 15.464228 15.612103       1 2.156699e-05 2.156699e-05 -1.111358e-03      0
[4,] 42.523488 41.697518       1 1.204639e-04 1.204639e-04 -2.585647e-03      0
[5,]  1.054655  1.042779       1 1.732091e-06 1.732091e-06 -3.767864e-05      0
[6,] 10.201514 10.258264       1 8.276714e-06 8.276714e-06 -5.380873e-04      0
     rank sel
[1,]  330   0
[2,]   58   0
[3,]  146   0
[4,]   57   0
[5,]  398   0
[6,]  236   0
>     sum(sel[,"sel"])
[1] 6
> # orders results for decreasing importance of score     
>     sel.ord <- sel[order(sel[,"rank"]),  ] 
> # adds columns to data
>     ex1.data <- cbind(ex1.data, tau=ml.par$tau, outlier=ml.par$outlier,
+                       sel[,c("rank", "sel")])
> # plot of data with outliers and influential errors 
>     sel.pairs(ex1.data[,c("X1","Y1")],outl=ml.par$outlier, sel=sel[,"sel"])
> # Example 2
>     data(ex2.data)
>     ml.par <- ml.est(y=ex2.data)
>     sel <- sel.edit(y=ex2.data, ypred=ml.par$ypred)	
>     sel.pairs(ex2.data,outl=ml.par$outlier, sel=sel[,"sel"])
> 
> dev.off()
null device 
          1 
>