Last data update: 2014.03.03

R: Plot Method for Objects of class 'FRBmultireg'
diagplot    R Documentation

Plot Method for Objects of class 'FRBmultireg'

Description

Diagnostic plots for objects of class FRBmultireg, FRBpca and FRBhot. It shows robust distances and allows detection of multivariate outliers.

Usage

## S3 method for class 'FRBmultireg'
diagplot(x, Xdist = TRUE, ...)

## S3 method for class 'FRBpca'
diagplot(x, EIF = TRUE, ...)

## S3 method for class 'FRBhot'
diagplot(x, ...)

Arguments

x

an R object of class FRBmultireg (typically created by FRBmultiregS, FRBmultiregMM or FRBmultiregGS or by Sest_multireg, MMest_multireg or GSest_multireg) or an R object of class FRBpca (typically created by FRBpcaS or FRBpcaMM) or an R object of class FRBhot (typically created by FRBhotellingS or FRBhotellingMM)

Xdist

logical: if TRUE, the plot shows the robust distance versus the distance in the space of the explanatory variables; if FALSE, it plots the robust distance versus the index of the observation

EIF

logical: if TRUE, the plot shows the robust distance versus an influence measure for each point; if FALSE, it plots the robust distance versus the index of the observation

...

potentially more arguments to be passed

Details

The diagnostic plots are based on the robust distances of the observations. In a multivariate sample X_n = {x_1, ..., x_n}, the robust distance d_i of observation i is given by d_i^2 = (x_i - μ)' Σ^(-1) (x_i - μ), where μ and Σ are robust estimates of location and covariance. Observations with a large robust distance are considered outlying.
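The distance formula above can be sketched in a few lines of R. This is an illustration only: it assumes the MCD estimator from MASS::cov.rob as a stand-in for the S-, MM- or GS-estimates that the FRB functions compute, and the data are simulated.

```r
# Sketch: robust distances d_i = sqrt((x_i - mu)' Sigma^(-1) (x_i - mu)).
# Assumption: MCD (MASS::cov.rob) replaces the FRB estimators here.
library(MASS)

set.seed(1)
X <- matrix(rnorm(100 * 2), ncol = 2)
X[1:5, ] <- X[1:5, ] + 8            # plant five gross outliers

rob <- cov.rob(X, method = "mcd")   # robust location ($center) and scatter ($cov)

# mahalanobis() returns squared distances, so take the square root
d_rob <- sqrt(mahalanobis(X, center = rob$center, cov = rob$cov))
d_cls <- sqrt(mahalanobis(X, center = colMeans(X), cov = cov(X)))

# compare robust and classical distances for the first few observations
head(round(cbind(robust = d_rob, classical = d_cls), 2))
```

The classical distances are computed only for comparison: because the outliers inflate the sample mean and covariance, they tend to look less extreme there than under the robust estimates.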

The default diagnostic plot in the multivariate regression setting (i.e. for objects of type FRBmultireg and Xdist=TRUE) shows the residual distances (i.e. the robust distances of the multivariate residuals) based on the estimates in x, versus the distances within the space of the explanatory variables. The latter are based on robust estimates of location and scatter for the data matrix x$X (without intercept). Computing these robust estimates may take an appreciable amount of time. The estimator used corresponds to the one that was used to obtain x (with the same breakdown point, for example, and the same control parameters).

On the vertical axis, a horizontal cutoff line is drawn at the square root of the .975 quantile of the chi-squared distribution with degrees of freedom equal to the number of response variables. On the horizontal axis, a vertical cutoff line is drawn at the same quantile, but now with degrees of freedom equal to the number of covariates (not including the intercept). Points to the right of the vertical cutoff can be viewed as high-leverage points. These can be classified into so-called 'bad' or 'good' leverage points, depending on whether they lie above or below the horizontal cutoff, respectively. Points above the horizontal cutoff but to the left of the vertical cutoff are sometimes called vertical outliers. See also Van Aelst and Willems (2005), for example.
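The two cutoffs and the resulting classification can be written out as a small sketch. The helper classify_points is hypothetical and not part of FRB; the distances and the dimensions q = 3 and p = 5 are toy values chosen for illustration.

```r
# Hypothetical helper (not part of FRB): label points as the text describes.
# res_dist: residual distances; x_dist: distances in the covariate space
# q: number of response variables; p: number of covariates (no intercept)
classify_points <- function(res_dist, x_dist, q, p, level = 0.975) {
  cut_res <- sqrt(qchisq(level, df = q))  # horizontal cutoff line
  cut_x   <- sqrt(qchisq(level, df = p))  # vertical cutoff line
  high_res <- res_dist > cut_res
  high_x   <- x_dist > cut_x
  label <- ifelse(high_x & high_res, "bad leverage",
           ifelse(high_x, "good leverage",
           ifelse(high_res, "vertical outlier", "regular")))
  list(label = label, cut_res = cut_res, cut_x = cut_x)
}

# toy distances, one point per category
out <- classify_points(res_dist = c(1, 5, 1, 6),
                       x_dist   = c(1, 1, 5, 7),
                       q = 3, p = 5)
out$label
# "regular" "vertical outlier" "good leverage" "bad leverage"
```

With q = 3 and p = 5 the cutoffs are roughly 3.06 and 3.58, so only distances beyond those values are flagged.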

To avoid the additional computation time, one can choose Xdist=FALSE, in which case the residual distances are simply plotted versus the index of the observation.

The default plot in the context of PCA (i.e. for objects of type FRBpca and EIF=TRUE) is a plot proposed by Pison and Van Aelst (2004). It shows the robust distance versus a measure of the overall empirical influence of the observation on the (classical) principal components. The empirical influences are obtained by using the influence function of the eigenvectors of the empirical or classical shape estimator at the normal model, and by substituting therein the robust estimates for the population parameters. The overall influence value is then defined by averaging the squared influence over all coefficients in the eigenvectors. The vertical line on the plot is an indicative cutoff value, obtained through simulation. This last part takes a few moments of computation time.
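As a rough sketch of the overall influence measure, one can plug robust estimates into the standard influence function of the eigenvectors of the classical covariance matrix, IF(x; v_j) = sum over k != j of z_j z_k / (λ_j - λ_k) v_k with z_j = v_j'(x - μ), and average the squared coefficients. This is an assumption about the form of the computation, not the FRB implementation; the helper overall_influence is hypothetical, MCD from MASS::cov.rob stands in for the S- or MM-estimates, and the simulated cutoff is omitted.

```r
# Sketch only: overall empirical influence on classical principal components.
# Assumptions: covariance-eigenvector influence function; MCD as robust plug-in.
library(MASS)

overall_influence <- function(X, center, scatter) {
  eig <- eigen(scatter, symmetric = TRUE)
  lambda <- eig$values
  p <- ncol(X)
  Z <- sweep(X, 2, center) %*% eig$vectors   # scores z_ij = v_j'(x_i - mu)
  total <- numeric(nrow(X))
  for (j in seq_len(p)) {
    for (k in seq_len(p)[-j]) {
      # coefficient of v_k in the influence on eigenvector j
      coef_jk <- Z[, j] * Z[, k] / (lambda[j] - lambda[k])
      # the v_k are orthonormal, so squared coefficients add up to the
      # squared norm of the influence vector
      total <- total + coef_jk^2
    }
  }
  total / p^2                                # average over all p^2 coefficients
}

set.seed(2)
X <- cbind(rnorm(100, sd = 3), rnorm(100, sd = 1))
X[1:5, ] <- X[1:5, ] + 10                    # plant five outliers

rob   <- cov.rob(X, method = "mcd")
infl  <- overall_influence(X, rob$center, rob$cov)
d_rob <- sqrt(mahalanobis(X, center = rob$center, cov = rob$cov))

head(round(cbind(distance = d_rob, influence = infl), 2))
```

Plotting d_rob against infl gives the general shape of the diagnostic plot: points that are both far out and influential appear in the upper right.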

Again, to avoid the additional computation time, one can choose EIF=FALSE, in which case the robust distances are simply plotted versus the index of the observation.

For the result of the robust Hotelling test (i.e. for objects of type FRBhot), the method plots the robust distance versus the index. In the case of a two-sample test, the indices are within-sample and a vertical line separates the two groups. The distances are then computed with a separate location estimate μ for each group and a covariance estimate Σ that is common to both groups.
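The two-sample distances can be illustrated as follows. The helper two_sample_distances is hypothetical and takes the location and covariance estimates as given inputs; in practice these would come from the robust two-sample estimator, which is not reproduced here.

```r
# Hypothetical helper (not part of FRB): distances for a two-sample setting,
# with one location estimate per group and a common covariance estimate.
two_sample_distances <- function(X1, X2, mu1, mu2, Sigma) {
  d1 <- sqrt(mahalanobis(X1, center = mu1, cov = Sigma))
  d2 <- sqrt(mahalanobis(X2, center = mu2, cov = Sigma))
  data.frame(group    = rep(c(1, 2), c(nrow(X1), nrow(X2))),
             index    = c(seq_len(nrow(X1)), seq_len(nrow(X2))),  # within-sample
             distance = c(d1, d2))
}

# toy data with known estimates, so the distances are easy to verify by hand
X1 <- rbind(c(0, 0), c(3, 4))
X2 <- rbind(c(10, 10), c(10, 12))
res <- two_sample_distances(X1, X2,
                            mu1 = c(0, 0), mu2 = c(10, 10),
                            Sigma = diag(2))
res
```

With the identity matrix as Σ the distances reduce to Euclidean distances from each group's own center, here 0 and 5 in the first group and 0 and 2 in the second.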

Author(s)

Gert Willems and Ella Roelant

References

  • G. Pison and S. Van Aelst (2004). Diagnostic Plots for Robust Multivariate Methods. Journal of Computational and Graphical Statistics, 13, 310–329.

  • S. Van Aelst and G. Willems (2005). Multivariate Regression S-Estimators for Robust Estimation and Inference. Statistica Sinica, 15, 981–1001.

  • S. Van Aelst and G. Willems (2013). Fast and Robust Bootstrap for Multivariate Inference: The R Package FRB. Journal of Statistical Software, 53(3), 1–32. URL: http://www.jstatsoft.org/v53/i03/.

See Also

FRBmultiregS, FRBmultiregMM, FRBmultiregGS, FRBpcaS, FRBpcaMM, FRBhotellingS, FRBhotellingMM

Examples


library(FRB)

# for multivariate regression:
data(schooldata)
MMres <- MMest_multireg(cbind(reading, mathematics, selfesteem) ~ ., data = schooldata)
diagplot(MMres)
# a large 'bad leverage' outlier should be noticeable (observation 59)

# for PCA:
## Not run: 
data(ForgedBankNotes)
MMres <- FRBpcaMM(ForgedBankNotes)
diagplot(MMres)
## End(Not run)

# a group of 15 fairly strong outliers can be seen which apparently would have
# a large general influence on a classical PCA analysis

# for Hotelling tests (two-sample)
## Not run: 
data(hemophilia)
MMres <- FRBhotellingMM(cbind(AHFactivity,AHFantigen)~gr,data=hemophilia)
diagplot(MMres)
## End(Not run)

# the data seem practically outlier-free


Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> library(FRB)
Loading required package: corpcor
Loading required package: rrcov
Loading required package: robustbase
Scalable Robust Estimators with High Breakdown Point (version 1.3-11)

> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/FRB/diagplot.rd_%03d_medium.png", width=480, height=480)
> ### Name: diagplot
> ### Title: Plot Method for Objects of class 'FRBmultireg'
> ### Aliases: diagplot diagplot.FRBmultireg diagplot.FRBpca diagplot.FRBhot
> 
> ### ** Examples
> 
> 
> # for multivariate regression:
> data(schooldata)
> MMres <- MMest_multireg(cbind(reading,mathematics,selfesteem)~., data=schooldata)
> diagplot(MMres)
> # a large 'bad leverage' outlier should be noticeable (observation 59)
> 
> # for PCA:
> ## Not run: 
> ##D data(ForgedBankNotes)
> ##D MMres <- FRBpcaMM(ForgedBankNotes)
> ##D diagplot(MMres)
> ## End(Not run)
> 
> # a group of 15 fairly strong outliers can be seen which apparently would have
> # a large general influence on a classical PCA analysis
> 
> # for Hotelling tests (two-sample)
> ## Not run: 
> ##D data(hemophilia)
> ##D MMres <- FRBhotellingMM(cbind(AHFactivity,AHFantigen)~gr,data=hemophilia)
> ##D diagplot(MMres)
> ## End(Not run)
> 
> # the data seem practically outlier-free
> 
> dev.off()
null device 
          1 
>