R Graphical Manual


Last data update: 2014.03.03

R: Influential Error Detection

sel.edit

R Documentation

Influential Error Detection

Description

Computes local and global score functions and identifies the units affected by influential errors.

Usage

       sel.edit(y, ypred, wgt = rep(1, nrow(as.matrix(y))),
                tot = colSums(ypred * wgt), t.sel = 0.01)

Arguments

`y`	matrix or data frame containing the response variables
`ypred`	matrix of predicted values for the y variables
`wgt`	optional vector of sampling weights, one per unit (default=1 for every unit)
`tot`	optional vector of reference estimates of the totals of the y variables. If omitted, it is computed as the (possibly weighted) sum of the predicted values
`t.sel`	optional vector of threshold values, one for each variable, for selective editing (default=0.01)
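To make the defaults concrete, the base-R sketch below evaluates the two default expressions from the Usage line, rep(1, nrow(as.matrix(y))) and colSums(ypred * wgt), on small made-up matrices; it does not call SeleMix. It also shows that sampling weights scale whole rows (units) of ypred, because R recycles wgt down each column when its length equals the number of rows.

```r
# Sketch: how the default wgt and tot expressions in the Usage line evaluate.
# The 3-unit, 2-variable matrices are illustrative values only.
y     <- matrix(c(10, 20, 30,    # variable 1
                  5,  6,  7),    # variable 2
                ncol = 2)
ypred <- matrix(c(11, 19, 30,
                  5,  6,  8), ncol = 2)

# Default wgt: a weight of 1 for every row (unit) of y
wgt <- rep(1, nrow(as.matrix(y)))

# Default tot: column totals of the weighted predictions
tot <- colSums(ypred * wgt)
print(tot)   # 60 19

# ypred * wgt2 recycles wgt2 down each column, so row i is scaled by wgt2[i]
# (this relies on length(wgt2) == nrow(ypred))
wgt2 <- c(2, 1, 3)
tot2 <- colSums(ypred * wgt2)
print(tot2)  # 131 40
```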

Details

This function ranks the observations (rank) according to the importance of their potential errors. The ordering is based on the values of the global score function (global.score). The function also selects the units to be edited (sel) so that the expected residual error of every variable is below a predefined level of accuracy (t.sel). The global score (global.score) is the maximum of the local scores computed for each variable (y1.score, y2.score,...). The local score of a variable is the weighted (weights) absolute difference between its observed value (y1, y2,...) and its predicted value (y1.p, y2.p,...), standardised by dividing it by the reference estimate of that variable's total (tot).
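A minimal base-R sketch of the score computation just described follows. It assumes score[i, j] = wgt[i] * |y[i, j] - ypred[i, j]| / tot[j], global.score[i] = max over j of score[i, j], and rank 1 for the largest global score; the toy matrices are made up and SeleMix is not called. It shows why standardising by tot matters: unit 3's error of 1 in variable 2 (total 19) outranks the errors of 1 in variable 1 (total 60).

```r
# Sketch of local scores, global score and rank, under the assumptions above.
# The matrices are illustrative values only.
y     <- matrix(c(10, 20, 30, 5, 6, 7), ncol = 2)
ypred <- matrix(c(11, 19, 30, 5, 6, 8), ncol = 2)
wgt   <- rep(1, nrow(y))
tot   <- colSums(ypred * wgt)                     # 60 19

# Weighted absolute differences, with column j divided by tot[j]
score <- sweep(abs(y - ypred) * wgt, 2, tot, "/")

# Global score: largest local score of each unit (row)
global.score <- apply(score, 1, max)              # 1/60, 1/60, 1/19

# order() lists units by decreasing score; order(order()) turns that into ranks
rank <- order(order(global.score, decreasing = TRUE))
print(cbind(score, global.score, rank))
```

Units 1 and 2 tie here; how sel.edit itself breaks ties is not documented on this page.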

Units affected by an influential error are selected for editing (sel=1) by a two-step algorithm:
1) order the observations by decreasing global.score;
2) select the first k units, with k the smallest number such that, from the (k+1)th to the last observation, all the residual errors (y1.reserr, y2.reserr,...) of every variable are below t.sel.

The function also provides indicator variables (y1.sel, y2.sel,...) reporting which variables contain an influential error in each unit selected for revision.
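The two steps and the per-variable flags can be sketched in base R as below. This is a reconstruction from the description above, not the SeleMix source, and the function name select_units is hypothetical. The residual error of a unit is taken as the signed weighted relative error (y - ypred) * wgt / tot summed over that unit and every unit ranked after it, which is consistent with the y1.reserr values printed under Results. Comparing its absolute value with t.sel, and flagging variable j in all units ranked up to the last position where variable j still exceeds its threshold, are assumptions.

```r
# Hypothetical re-implementation of the selection rule in Details. The sign
# convention, the abs() comparison with t.sel and the per-variable flag rule
# are assumptions, not taken from SeleMix.
select_units <- function(y, ypred, wgt = rep(1, nrow(as.matrix(y))),
                         tot = colSums(as.matrix(ypred) * wgt), t.sel = 0.01) {
  y <- as.matrix(y)
  ypred <- as.matrix(ypred)
  n <- nrow(y)
  p <- ncol(y)
  t.sel <- rep_len(t.sel, p)                       # one threshold per variable

  # Signed weighted relative errors; their absolute values are the local scores
  err <- sweep((y - ypred) * wgt, 2, tot, "/")
  global.score <- apply(abs(err), 1, max)

  # Step 1: units by decreasing global score
  ord <- order(global.score, decreasing = TRUE)

  # Residual error at position r: sum of the errors at positions r, r+1, ..., n
  res_sorted <- matrix(apply(err[ord, , drop = FALSE], 2,
                             function(e) rev(cumsum(rev(e)))), nrow = n)

  # Step 2: per variable, the last position whose residual error is not below
  # t.sel; k is the largest of these, so every later position is below t.sel
  k_j <- vapply(seq_len(p), function(j) {
    bad <- which(abs(res_sorted[, j]) >= t.sel[j])
    if (length(bad) > 0) max(bad) else 0L
  }, integer(1))
  k <- max(k_j)

  rank <- integer(n)
  rank[ord] <- seq_len(n)
  reserr <- matrix(NA_real_, n, p)
  reserr[ord, ] <- res_sorted
  # Assumed flag rule: variable j is flagged in units ranked up to k_j
  var.sel <- matrix(as.integer(outer(rank, k_j, "<=")), nrow = n)
  list(global.score = global.score, reserr = reserr, rank = rank,
       var.sel = var.sel, sel = as.integer(rank <= k))
}

# Illustrative data: unit 3 has a large error in variable v1 only
y     <- cbind(v1 = c(10, 12, 40, 9, 11), v2 = c(5, 6, 5, 4, 6))
ypred <- cbind(v1 = c(10, 12, 11, 9, 11), v2 = c(5, 6, 5, 4, 6))
res <- select_units(y, ypred, t.sel = 0.05)
res$sel       # 0 0 1 0 0
res$var.sel   # only row 3 of column 1 is flagged
```

Because the errors are signed in this sketch, large errors of opposite sign can partly cancel in the residual error, so the selection is driven by their net effect on the estimated totals.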

Value

sel.edit returns a data matrix containing the following columns:

`y1, y2,...`	observed variables
`y1.p, y2.p,...`	predictions of y variables
`weights`	sampling weights
`y1.score, y2.score,...`	local scores
`global.score`	global score
`y1.reserr, y2.reserr,...`	residual errors
`y1.sel, y2.sel,...`	influential error flags
`rank`	rank according to global score
`sel`	1 if the observation contains an influential error, 0 otherwise

Author(s)

M. Teresa Buglielli <bugliell@istat.it>, Ugo Guarnera <guarnera@istat.it>

References

Di Zio, M., Guarnera, U. (2013) "A Contamination Model for Selective Editing", Journal of Official Statistics, 29(4), 539-555. http://dx.doi.org/10.2478/jos-2013-0039

Buglielli, M.T., Di Zio, M., Guarnera, U. (2010) "Use of Contamination Models for Selective Editing", European Conference on Quality in Survey Statistics Q2010, Helsinki, 4-6 May 2010.

Examples

# Example 1
# Parameter estimation with one contaminated variable and one covariate
    data(ex1.data)
    ml.par <- ml.est(y = ex1.data[, "Y1"], x = ex1.data[, "X1"])
# Detection of influential errors
    sel <- sel.edit(y = ex1.data[, "Y1"], ypred = ml.par$ypred)
    head(sel)
    sum(sel[, "sel"])
# Order the results by decreasing importance of the score
    sel.ord <- sel[order(sel[, "rank"]), ]
# Add the estimation and selection results as columns of the data
    ex1.data <- cbind(ex1.data, tau = ml.par$tau, outlier = ml.par$outlier,
                      sel[, c("rank", "sel")])
# Plot the data, highlighting outliers and influential errors
    sel.pairs(ex1.data[, c("X1", "Y1")], outl = ml.par$outlier, sel = sel[, "sel"])

# Example 2
    data(ex2.data)
    ml.par <- ml.est(y = ex2.data)
    sel <- sel.edit(y = ex2.data, ypred = ml.par$ypred)
    sel.pairs(ex2.data, outl = ml.par$outlier, sel = sel[, "sel"])

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> library(SeleMix)
Loading required package: mvtnorm
Loading required package: Ecdat
Loading required package: Ecfun

Attaching package: 'Ecfun'

The following object is masked from 'package:base':

    sign


Attaching package: 'Ecdat'

The following object is masked from 'package:datasets':

    Orange

Loading required package: xtable
> ### Name: sel.edit
> ### Title: Influential Error Detection
> ### Aliases: sel.edit
> 
> ### ** Examples
> 
> # Example 1
> # Parameter estimation with one contaminated variable and one covariate
>     data(ex1.data)
>     ml.par <- ml.est(y=ex1.data[,"Y1"], x=ex1.data[,"X1"])
> # Detection of influential errors    
>     sel <- sel.edit(y=ex1.data[,"Y1"], ypred=ml.par$ypred)
>     head(sel)
            y1      y1.p weights     y1.score global.score     y1.reserr y1.sel
[1,]  1.422594  1.447798       1 3.675953e-06 3.675953e-06 -1.741156e-04      0
[2,] 46.434483 45.617588       1 1.191404e-04 1.191404e-04 -2.706110e-03      0
[3,] 15.464228 15.612103       1 2.156699e-05 2.156699e-05 -1.111358e-03      0
[4,] 42.523488 41.697518       1 1.204639e-04 1.204639e-04 -2.585647e-03      0
[5,]  1.054655  1.042779       1 1.732091e-06 1.732091e-06 -3.767864e-05      0
[6,] 10.201514 10.258264       1 8.276714e-06 8.276714e-06 -5.380873e-04      0
     rank sel
[1,]  330   0
[2,]   58   0
[3,]  146   0
[4,]   57   0
[5,]  398   0
[6,]  236   0
>     sum(sel[,"sel"])
[1] 6
> # orders results for decreasing importance of score     
>     sel.ord <- sel[order(sel[,"rank"]),  ] 
> # adds columns to data
>     ex1.data <- cbind(ex1.data, tau=ml.par$tau, outlier=ml.par$outlier,
+                       sel[,c("rank", "sel")])
> # plot of data with outliers and influential errors 
>     sel.pairs(ex1.data[,c("X1","Y1")],outl=ml.par$outlier, sel=sel[,"sel"])
> # Example 2
>     data(ex2.data)
>     ml.par <- ml.est(y=ex2.data)
>     sel <- sel.edit(y=ex2.data, ypred=ml.par$ypred)	
>     sel.pairs(ex2.data,outl=ml.par$outlier, sel=sel[,"sel"])
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>
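The printed head(sel) output can be cross-checked against the Details section with plain arithmetic. The sketch below does not call SeleMix; it copies the values of rows 2 and 4 of that output (ranks 58 and 57) and relies on the unit weights shown in the weights column. Both rows imply the same reference total for Y1, consistent with local score = |y1 - y1.p| / tot, and the y1.reserr values at consecutive ranks 57 and 58 differ by the signed error (y1 - y1.p) / tot of the rank-57 unit, which suggests that a unit's residual error includes its own error. The variable names are illustrative.

```r
# Values copied from rows 2 and 4 of the head(sel) output above (weights = 1)
y1     <- c(row2 = 46.434483,     row4 = 42.523488)
y1.p   <- c(row2 = 45.617588,     row4 = 41.697518)
score  <- c(row2 = 1.191404e-04,  row4 = 1.204639e-04)
reserr <- c(row2 = -2.706110e-03, row4 = -2.585647e-03)  # ranks 58 and 57

# If score = |y1 - y1.p| / tot, each row implies the same total tot
implied_tot <- abs(y1 - y1.p) / score
print(implied_tot)          # two nearly identical values, about 6857

# Going from rank 58 up to rank 57, the residual error changes by the
# signed relative error of the rank-57 unit (row 4), not of row 2
step     <- unname(reserr["row4"] - reserr["row2"])
err_row4 <- unname((y1["row4"] - y1.p["row4"]) / mean(implied_tot))
err_row2 <- unname((y1["row2"] - y1.p["row2"]) / mean(implied_tot))
print(c(step = step, err_row4 = err_row4, err_row2 = err_row2))
```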