R Graphical Manual

Browse All

Last data update: 2014.03.03

R: Robust Hotelling test using the MM-estimator

FRBhotellingMM

R Documentation

Robust Hotelling test using the MM-estimator

Description

Robust one-sample and two-sample Hotelling test using the MM-estimator and the Fast and Robust Bootstrap.

Usage

## S3 method for class 'formula'
FRBhotellingMM(formula, data=NULL, ...)

## Default S3 method:
FRBhotellingMM(X, Y=NULL, mu0 = 0, R = 999, conf = 0.95, 
                method = c("HeFung", "pool"),control=MMcontrol(...),
                na.action=na.omit, ...)

Arguments

`formula`	an object of class `formula`; a symbolic description of the model to be fit.
`data`	data frame from which variables specified in formula are to be taken.
`X`	a matrix or data-frame
`Y`	an optional matrix or data-frame in case of a two-sample test
`mu0`	an optional vector of data values (or a single number which will be repeated p times) indicating the true value of the mean (does not apply in case of the two-sample test). Default is the null vector `mu0=0`.
`R`	number of bootstrap samples. Default is `R=999`.
`conf`	confidence level for the simultaneous confidence intervals. Default is `conf=0.95`.
`method`	for the two-sample Hotelling test, indicates the way the common covariance matrix is estimated: `"pool"`= pooled covariance matrix, `"HeFung"`= using the multisample method of He and Fung.
`control`	a list with control parameters for tuning the MM-estimate and its computing algorithm, see `MMcontrol`().
`na.action`	a function which indicates what should happen when the data contain NAs. Defaults to `na.omit`.
`...`	allows for specifying control parameters directly instead of via `control`

Details

The classical Hotelling test for testing if the mean equals a certain value or if two means are equal is modified into a robust one through substitution of the empirical estimates by the MM-estimates of location and scatter. The MM-estimator, using Tukey's biweight function, is tuned by default to have a breakdown point of 50% and 95% location efficiency. This could be changed through the control argument if desired. The MM-estimates are obtained by first computing the S-estimates with the fast-S algorithm followed by the M-part through reweighted least squares (RWLS) iteration. See MMcontrol for some adjustable tuning parameters regarding the algorithm.

The fast and robust bootstrap is used to mimic the distribution of the test statistic under the null hypothesis. For instance, the 5% critical value for the test is given by the 95% quantile of the recalculated statistics.

Robust simultaneous confidence intervals for linear combinations of the mean (or difference in means) are developed similarly to the classical case (Johnson and Wichern, 1988, page 239). The value CI is a matrix with the confidence intervals for each element of the mean (or difference in means), with level conf. It consists of two rows, the first being the lower bound and the second the upper bound. Note that these intervals are rather conservative in the sense that the simultaneous confidence level holds for all linear combinations and here only p of these are considered (with p the dimension of the data).

For the two-sample Hotelling test we assume that the samples have an underlying distribution with the same covariance matrix. This covariance matrix can be estimated in two different ways using the pooled covariance matrix or the two-sample estimator of He and Fung (He and Fung 2000), and argument method defaults to the second option. For more details see Roelant et al. (2008).

In the two-sample version, the null hypothesis always states that the two means are equal. For the one-sample version, the default null hypothesis is that the mean equals zero, but the hypothesized value can be changed and specified through argument mu0.

Bootstrap samples are discarded if the fast and robust covariance estimate is not positive definite, such that the actual number of recalculations used can be lower than R. This number is returned as ROK.

Value

An object of class FRBhot which extends class htest and contains at least the following components:

`statistic`	the value of the robust test statistic.
`pvalue`	p-value of the robust one or two-sample Hotelling test, determined by the fast and robust bootstrap
`estimate`	the estimated mean vector or vectors depending on whether it was a one-sample test or a two-sample test.
`alternative`	a character string describing the alternative hypothesis.
`method`	a character string indicating what type of Hotelling test was performed.
`data.name`	a character string giving the name(s) of the data.
`teststat.boot`	the bootstrap recalculated values of the robust test statistic.
`CI`	bootstrap simultaneous confidence intervals for each component of the center
`conf`	a copy of the `conf` argument
`Sigma`	covariance of one-sample or common covariance matrix in the case of two samples
`w`	implicit weights corresponding to the MM-estimates (i.e. final weights in the RWLS procedure)
`outFlag`	outlier flags: 1 if the robust distance of the observation exceeds the .975 quantile of (the square root of) the chi-square distribution with degrees of freedom equal to the dimension of `X`; 0 otherwise
`ROK`	number of bootstrap samples actually used (i.e. not discarded due to non-positive definite covariance

Author(s)

Ella Roelant, Stefan Van Aelst and Gert Willems

References

X. He and W.K. Fung (2000) High breakdown estimation for multiple populations with applications to discriminant analysis. Journal of Multivariate Analysis, 72, 151–162.
R.A. Johnson, D.W. Wichern (1988) Applied Multivariate Statistical Analysis, 2nd Edition, Prentice-Hall.
E. Roelant, S. Van Aelst and G. Willems, (2008) Fast Bootstrap for Robust Hotelling Tests, COMPSTAT 2008: Proceedings in Computational Statistics (P. Brito, Ed.) Heidelberg: Physika-Verlag, 709–719.
M. Salibian-Barrera, S. Van Aelst and G. Willems (2008) Fast and robust bootstrap. Statistical Methods and Applications, 17, 41–71.
S. Van Aelst and G. Willems (2013). Fast and Robust Bootstrap for Multivariate Inference: The R Package FRB. Journal of Statistical Software, 53(3), 1–32. URL: http://www.jstatsoft.org/v53/i03/.

Examples


## One sample robust Hotelling test
data(delivery)
delivery.x <- delivery[,1:2]
FRBhotellingMM(delivery.x,R=199)

## One sample robust Hotelling test
data(ForgedBankNotes)
samplemean <- apply(ForgedBankNotes, 2, mean)
res = FRBhotellingMM(ForgedBankNotes, mu0=samplemean,R=199)
res
# Note that the test rejects the hypothesis that the true mean equals the
# sample mean; this is due to outliers in the data (i.e. the robustly estimated
# mean apparently significantly differs from the non-robust sample mean.

# Graphical display of the results:
plot(res)
# It is clear from the (scaled) simultaneous confidence limits that the rejection
# of the hypothesis is due to the differences in variables Bottom and Diagonal







## Two sample robust Hotelling test
data(hemophilia)
grp <-as.factor(hemophilia[,3])
x <- hemophilia[which(grp==levels(grp)[1]),1:2]
y <- hemophilia[which(grp==levels(grp)[2]),1:2]

#using the pooled covariance matrix to estimate the common covariance matrix
## Not run: res <- FRBhotellingMM(x,y,method="pool")

#using the estimator of He and Fung to estimate the common covariance matrix
res <- FRBhotellingMM(x,y,method="HeFung",R=199)

# or using the formula interface
## Not run: res <- FRBhotellingMM(as.matrix(hemophilia[,-3])~hemophilia[,3],method="HeFung")

# From the confidence limits it can be seen that the significant difference
# is mainly caused by the AHFactivity variable. The graphical display helps too:
plot(res)
# the red line on the histogram indicates the test statistic value in the original
# sample (it is omitted if the statistic exceeds 100)

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(FRB)
Loading required package: corpcor
Loading required package: rrcov
Loading required package: robustbase
Scalable Robust Estimators with High Breakdown Point (version 1.3-11)

> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/FRB/FRBhotellingMM.Rd_%03d_medium.png", width=480, height=480)
> ### Name: FRBhotellingMM
> ### Title: Robust Hotelling test using the MM-estimator
> ### Aliases: FRBhotellingMM FRBhotellingMM.default FRBhotellingMM.formula
> ### Keywords: htest multivariate robust
> 
> ### ** Examples
> 
> 
> ## One sample robust Hotelling test
> data(delivery)
> delivery.x <- delivery[,1:2]
> FRBhotellingMM(delivery.x,R=199)

	One sample Hotelling test based on multivariate MM-estimates (bdp =
	0.5, eff = 0.95)

data:  delivery.x
T^2_R = 84.588, p-value = 0.005814
alternative hypothesis: true mean vector is not equal to (0,0) 

sample estimates:
              n.prod distance
MM-loc vector   6.69  326.629

> 
> ## One sample robust Hotelling test
> data(ForgedBankNotes)
> samplemean <- apply(ForgedBankNotes, 2, mean)
> res = FRBhotellingMM(ForgedBankNotes, mu0=samplemean,R=199)
> res

	One sample Hotelling test based on multivariate MM-estimates (bdp =
	0.5, eff = 0.95)

data:  ForgedBankNotes
T^2_R = 128.68, p-value < 2.2e-16
alternative hypothesis: true mean vector is not equal to (214.823,130.3,130.193,10.53,11.133,139.45) 

sample estimates:
               Length    Left   Right Bottom    Top Diagonal
MM-loc vector 214.781 130.267 130.181  10.86 11.101  139.628

> # Note that the test rejects the hypothesis that the true mean equals the
> # sample mean; this is due to outliers in the data (i.e. the robustly estimated
> # mean apparently significantly differs from the non-robust sample mean.
> 
> # Graphical display of the results:
> plot(res)
> # It is clear from the (scaled) simultaneous confidence limits that the rejection
> # of the hypothesis is due to the differences in variables Bottom and Diagonal
> 
> 
> ## Two sample robust Hotelling test
> data(hemophilia)
> grp <-as.factor(hemophilia[,3])
> x <- hemophilia[which(grp==levels(grp)[1]),1:2]
> y <- hemophilia[which(grp==levels(grp)[2]),1:2]
> 
> #using the pooled covariance matrix to estimate the common covariance matrix
> ## Not run: res <- FRBhotellingMM(x,y,method="pool")
> 
> #using the estimator of He and Fung to estimate the common covariance matrix
> res <- FRBhotellingMM(x,y,method="HeFung",R=199)
> 
> # or using the formula interface
> ## Not run: res <- FRBhotellingMM(as.matrix(hemophilia[,-3])~hemophilia[,3],method="HeFung")
> 
> # From the confidence limits it can be seen that the significant difference
> # is mainly caused by the AHFactivity variable. The graphical display helps too:
> plot(res)
> # the red line on the histogram indicates the test statistic value in the original
> # sample (it is omitted if the statistic exceeds 100)
> 
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>