Last data update: 2014.03.03

R: S-Estimates for Multivariate Regression with Bootstrap...
FRBmultiregSR Documentation

S-Estimates for Multivariate Regression with Bootstrap Inference

Description

Computes S-estimates for multivariate regression together with standard errors, confidence intervals and p-values based on the Fast and Robust Bootstrap.

Usage

## S3 method for class 'formula'
FRBmultiregS(formula, data=NULL, ...)

## Default S3 method:
FRBmultiregS(X, Y, int = TRUE, R = 999, bdp = 0.5, conf = 0.95, 
                control=Scontrol(...), na.action=na.omit, ...)

Arguments

formula

an object of class formula; a symbolic description of the model to be fit.

data

data frame from which variables specified in formula are to be taken.

X

a matrix or data frame containing the explanatory variables.

Y

a matrix or data frame containing the response variables.

int

logical: if TRUE an intercept term is added to the model (unless it is already present in X)

R

number of bootstrap samples. Default is R=999.

bdp

required breakdown point for the S-estimates. Should have 0 < bdp ≤ 0.5, the default is 0.5

conf

level of the bootstrap confidence intervals. Default is conf=0.95

control

a list with control parameters for tuning the computing algorithm, see Scontrol().

na.action

a function which indicates what should happen when the data contain NAs. Defaults to na.omit.

...

allows for specifying control parameters directly instead of via control

Details

Multivariate S-estimates were introduced by Davies (1987) and can be highly robust while enjoying a reasonable Gaussian efficiency. Their use in the multivariate regression setting was discussed in Van Aelst and Willems (2005). The loss function used here is Tukey's biweight. It is tuned in order to achieve the required breakdown point bdp (any value between 0 and 0.5).

The computation is carried out by a call to Sest_multireg(), which performs the fast-S algorithm (Salibian-Barrera and Yohai 2006), see Scontrol for its tuning parameters. The result of this call is also returned as the value est.

The Fast and Robust Bootstrap (Salibian-Barrera and Zamar 2002) is used to calculate so-called basic bootstrap confidence intervals and bias corrected and accelerated (BCa) confidence intervals (Davison and Hinkley 1997, p.194 and p.204 respectively). Apart from the intervals with the requested confidence level, the function also returns p-values for each coefficient corresponding to the hypothesis that the actual coefficient is zero. The p-values are computed as 1 minus the smallest level for which the confidence intervals would include zero. Both BCa and basic bootstrap p-values in this sense are given. The bootstrap calculation is carried out by a call to Sboot_multireg(), the result of which is returned as the value bootest. Bootstrap standard errors are returned as well.

Note: Bootstrap samples which contain too few distinct observations with positive weights are discarded (a warning is given if this happens). The number of samples actually used is returned via ROK.

In the formula-interface, a multivariate response is produced via cbind. For example cbind(x4,x5) ~ x1+x2+x3. All arguments from the default method can also be passed to the formula method except for int (passing int explicitely will produce an error; the inclusion of an intercept term is determined by formula).

The returned object inherits from class mlm such that the standard coef, residuals, fitted and predict functions can be used.

Value

An object of class FRBmultireg which extends class mlm and contains at least the following components:

coefficients

MM-estimates of the regression coefficients

residuals

the residuals, that is response minus fitted values

fitted.values

the fitted values.

Sigma

S-estimate of the error covariance matrix

scale

MM-estimate of the size of the multivariate errors

weights

implicit weights corresponding to the S-estimates (i.e. final weights in the RWLS procedure at the end of the fast-S algorithm)

outFlag

outlier flags: 1 if the robust distance of the residual exceeds the .975 quantile of (the square root of) the chi-square distribution with degrees of freedom equal to the dimension of the responses; 0 otherwise

SE

bootstrap standard errors corresponding to the regression coefficients.

cov

bootstrap covariance matrix corresponding to the regression coefficients (in vectorized form)

CI.bca.lower

a matrix containing the lower bounds of the bias corrected and accelerated confidence intervals for the regression coefficients.

CI.bca.upper

a matrix containing the upper bounds of the bias corrected and accelerated confidence intervals for the regression coefficients.

CI.basic.lower

a matrix containing the lower bounds of basic bootstrap intervals for the regression coefficients.

CI.basic.upper

a matrix containing the upper bounds of basic bootstrap intervals for the regression coefficients.

p.bca

a matrix containing the p-values based on the BCa confidence intervals for the regression coefficients.

p.basic

a matrix containing the p-values based on the basic bootstrap intervals for the regression coefficients.

est

S-estimates as returned by the call to Sest_multireg()

bootest

bootstrap results for the S-estimates as returned by the call to Sboot_multireg()

conf

a copy of the conf argument

method

a list with following components: est = character string indicating that S-estimates were used, and bdp = a copy of the bdp argument

control

a copy of the control argument

X, Y

either copies of the respective arguments or the corresponding matrices produced from formula

ROK

number of bootstrap samples actually used (i.e. not discarded due to too few distinct observations with positive weight)

Author(s)

Gert Willems, Stefan Van Aelst and Ella Roelant

References

  • P.L. Davies (1987) Asymptotic behavior of S-estimates of multivariate location parameters and dispersion matrices. The Annals of Statistics, 15, 1269-1292.

  • A.C. Davison and D.V. Hinkley (1997) Bootstrap Methods and their Application. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge: Cambridge University Press.

  • M. Salibian-Barrera, S. Van Aelst and G. Willems (2008) Fast and robust bootstrap. Statistical Methods and Applications, 17, 41-71.

  • M. Salibian-Barrera and V. Yohai (2006) A fast algorithm for S-regression estimates. Journal of Computational and Graphical Statistics, 15, 414-427.

  • M. Salibian-Barrera, R.H. Zamar (2002) Bootstrapping robust estimates of regression. The Annals of Statistics, 30, 556-582.

  • S. Van Aelst and G. Willems (2005) Multivariate regression S-estimators for robust estimation and inference. Statistica Sinica, 15, 981-1001.

  • S. Van Aelst and G. Willems (2013). Fast and robust bootstrap for multivariate inference: The R package FRB. Journal of Statistical Software, 53(3), 1–32. URL: http://www.jstatsoft.org/v53/i03/.

See Also

summary.FRBmultireg, plot.FRBmultireg, Sboot_multireg, Sest_multireg, FRBmultiregMM, FRBmultiregGS, Scontrol

Examples

data(schooldata)
school.x <- data.matrix(schooldata[,1:5])
school.y <- data.matrix(schooldata[,6:8])

#computes 25% breakdown point S-estimate and 99% confidence intervals 
#based on 999 bootstrap samples:
Sres <- FRBmultiregS(school.x, school.y, R=999, bdp = 0.25, conf = 0.99)
#or, equivalently using the formula interface
## Not run: Sres <- FRBmultiregS(cbind(reading,mathematics,selfesteem)~., data=schooldata, 
        R=999, bdp = 0.25, conf = 0.99)
## End(Not run)
          
#the print method displays the coefficient estimates 
Sres

#the summary function additionally displays the bootstrap standard errors and p-values
#("BCA" method by default)
summary(Sres)

summary(Sres, confmethod="basic")
                                                              
#ask explicitely for the coefficient matrix:
Sres$coefficients
# or equivalently,
coef(Sres)
#For the error covariance matrix:
Sres$Sigma
                                                              
#plot some bootstrap histograms for the coefficient estimates 
#(with "BCA" intervals by default) 
plot(Sres, expl=c("education", "occupation"), resp=c("selfesteem","reading"))

#plot bootstrap histograms for all coefficient estimates
plot(Sres)
#probably the plot-function has made a selection of coefficients to plot here, 
#since 'all' was too many to  fit on one page, see help(plot.FRBmultireg); 
#this is platform-dependent







Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(FRB)
Loading required package: corpcor
Loading required package: rrcov
Loading required package: robustbase
Scalable Robust Estimators with High Breakdown Point (version 1.3-11)

> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/FRB/FRBmultiregS.Rd_%03d_medium.png", width=480, height=480)
> ### Name: FRBmultiregS
> ### Title: S-Estimates for Multivariate Regression with Bootstrap Inference
> ### Aliases: FRBmultiregS FRBmultiregS.default FRBmultiregS.formula
> ### Keywords: multivariate robust
> 
> ### ** Examples
> 
> data(schooldata)
> school.x <- data.matrix(schooldata[,1:5])
> school.y <- data.matrix(schooldata[,6:8])
> 
> #computes 25% breakdown point S-estimate and 99% confidence intervals 
> #based on 999 bootstrap samples:
> Sres <- FRBmultiregS(school.x, school.y, R=999, bdp = 0.25, conf = 0.99)
> #or, equivalently using the formula interface
> ## Not run: 
> ##D Sres <- FRBmultiregS(cbind(reading,mathematics,selfesteem)~., data=schooldata, 
> ##D         R=999, bdp = 0.25, conf = 0.99)
> ## End(Not run)
>           
> #the print method displays the coefficient estimates 
> Sres

Multivariate regression based on multivariate S-estimates (breakdown point = 0.25) 

Coefficients:
            reading mathematics selfesteem
(intercept)  1.9630      2.5348     0.0775
education    0.1223      0.0117    -0.0471
occupation   4.9368      6.1086     2.1510
visit        0.0721      0.0201     0.2368
counseling  -0.7878     -0.8265    -0.0833
teacher     -0.1920     -0.2937     0.0219
> 
> #the summary function additionally displays the bootstrap standard errors and p-values
> #("BCA" method by default)
> summary(Sres)

Multivariate regression based on S-estimates (breakdown point = 0.25) 


Response reading:

Residuals:
    Min      1Q  Median      3Q     Max 
-14.793  -1.881   0.243   2.353  26.866 

Coefficients:
            Estimate   Std.Error   p-value    
(intercept)   1.9630      1.0316    0.0576   .
education     0.1223      0.0772    0.0961   .
occupation    4.9368      1.2945    0.0000 ***
visit         0.0721      0.3263    0.8102    
counseling   -0.7878      0.2263    0.0000 ***
teacher      -0.1920      0.1752    0.1866    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

p-values based on BCA method!


Response mathematics:

Residuals:
    Min      1Q  Median      3Q     Max 
-10.604  -2.880  -0.588   3.176  38.704 

Coefficients:
            Estimate   Std.Error   p-value    
(intercept)   2.5348      1.1746    0.0208   *
education     0.0117      0.0897    0.9791    
occupation    6.1086      1.3368    0.0000 ***
visit         0.0201      0.3111    0.9281    
counseling   -0.8265      0.3338    0.0127   *
teacher      -0.2937      0.2060    0.0884   .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

p-values based on BCA method!


Response selfesteem:

Residuals:
    Min      1Q  Median      3Q     Max 
-2.6678 -0.6665  0.0219  0.6815  3.4824 

Coefficients:
            Estimate   Std.Error   p-value    
(intercept)   0.0775      0.3970   0.95988    
education    -0.0471      0.0404   0.22011    
occupation    2.1510      0.5254   0.00000 ***
visit         0.2368      0.0822   0.00296  **
counseling   -0.0833      0.1268   0.49649    
teacher       0.0219      0.0451   0.61759    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

p-values based on BCA method!

Robust residual scale: 2.2 

Error covariance matrix estimate:
            reading mathematics selfesteem
reading       14.49        14.4       2.55
mathematics   14.44        21.5       2.60
selfesteem     2.55         2.6       1.55
> 
> summary(Sres, confmethod="basic")

Multivariate regression based on S-estimates (breakdown point = 0.25) 


Response reading:

Residuals:
    Min      1Q  Median      3Q     Max 
-14.793  -1.881   0.243   2.353  26.866 

Coefficients:
            Estimate   Std.Error   p-value    
(intercept)   1.9630      1.0316     0.112    
education     0.1223      0.0772     0.180    
occupation    4.9368      1.2945     0.000 ***
visit         0.0721      0.3263     0.853    
counseling   -0.7878      0.2263     0.000 ***
teacher      -0.1920      0.1752     0.322    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

p-values based on Basic Bootstrap!


Response mathematics:

Residuals:
    Min      1Q  Median      3Q     Max 
-10.604  -2.880  -0.588   3.176  38.704 

Coefficients:
            Estimate   Std.Error   p-value    
(intercept)   2.5348      1.1746    0.0641   .
education     0.0117      0.0897    0.9790    
occupation    6.1086      1.3368    0.0000 ***
visit         0.0201      0.3111    0.9249    
counseling   -0.8265      0.3338    0.0020  **
teacher      -0.2937      0.2060    0.2102    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

p-values based on Basic Bootstrap!


Response selfesteem:

Residuals:
    Min      1Q  Median      3Q     Max 
-2.6678 -0.6665  0.0219  0.6815  3.4824 

Coefficients:
            Estimate   Std.Error   p-value    
(intercept)   0.0775      0.3970    0.9850    
education    -0.0471      0.0404    0.0641   .
occupation    2.1510      0.5254    0.0000 ***
visit         0.2368      0.0822    0.0160   *
counseling   -0.0833      0.1268    0.4124    
teacher       0.0219      0.0451    0.5085    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

p-values based on Basic Bootstrap!

Robust residual scale: 2.2 

Error covariance matrix estimate:
            reading mathematics selfesteem
reading       14.49        14.4       2.55
mathematics   14.44        21.5       2.60
selfesteem     2.55         2.6       1.55
>                                                               
> #ask explicitely for the coefficient matrix:
> Sres$coefficients
                reading mathematics  selfesteem
(intercept)  1.96300493  2.53480818  0.07754723
education    0.12227724  0.01170100 -0.04705534
occupation   4.93683489  6.10863080  2.15101384
visit        0.07206901  0.02007182  0.23683535
counseling  -0.78777754 -0.82646865 -0.08325992
teacher     -0.19199705 -0.29369835  0.02185443
> # or equivalently,
> coef(Sres)
                reading mathematics  selfesteem
(intercept)  1.96300493  2.53480818  0.07754723
education    0.12227724  0.01170100 -0.04705534
occupation   4.93683489  6.10863080  2.15101384
visit        0.07206901  0.02007182  0.23683535
counseling  -0.78777754 -0.82646865 -0.08325992
teacher     -0.19199705 -0.29369835  0.02185443
> #For the error covariance matrix:
> Sres$Sigma
              reading mathematics selfesteem
reading     14.492357   14.438860   2.547899
mathematics 14.438860   21.470795   2.596702
selfesteem   2.547899    2.596702   1.551366
>                                                               
> #plot some bootstrap histograms for the coefficient estimates 
> #(with "BCA" intervals by default) 
> plot(Sres, expl=c("education", "occupation"), resp=c("selfesteem","reading"))
> 
> #plot bootstrap histograms for all coefficient estimates
> plot(Sres)
Warning message:
In plot.FRBmultireg(Sres) :
  Number of plots too large to fit on the page, subset was selected: consider specifying (fewer) 
      variables in 'expl' and 'resp'; or enlarge graphics device; or set 'onepage=FALSE'
> #probably the plot-function has made a selection of coefficients to plot here, 
> #since 'all' was too many to  fit on one page, see help(plot.FRBmultireg); 
> #this is platform-dependent
> 
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>