R: High Breakdown and High Efficiency Robust Linear Regression
lmRob
R Documentation
High Breakdown and High Efficiency Robust Linear Regression
Description
Performs a robust linear regression with a high breakdown point and high
efficiency.
Usage
lmRob(formula, data, weights, subset, na.action,
model = TRUE, x = FALSE, y = FALSE, contrasts = NULL,
nrep = NULL, control = lmRob.control(...), ...)
Arguments
formula
a formula object, with the response on the left of a ~ operator, and the terms, separated by + operators, on the right.
data
a data.frame in which to interpret the variables named in the formula, or in the subset and weights arguments. If this is missing, then the variables in the formula should be on the search list. This may also be a single number to handle some special cases - see below for details.
weights
vector of observation weights; if supplied, the algorithm fits to minimize the sum of a function of the square root of the weights multiplied into the residuals. The length of weights must be the same as the number of observations. The weights must be nonnegative, and it is strongly recommended that they be strictly positive, since a zero weight is ambiguous compared with using the subset argument to exclude an observation.
subset
expression saying which subset of the rows of the data should be used in the fit. This can be a logical vector (which is replicated to have length equal to the number of observations), or a numeric vector indicating which observation numbers are to be included, or a character vector of the row names to be included. All observations are included by default.
na.action
a function to filter missing data. This is applied to the model.frame after any subset argument has been used. The default, na.fail, signals an error if any missing values are found. A possible alternative is na.exclude, which deletes observations that contain one or more missing values.
model
a logical flag: if TRUE, the model frame is returned in component model.
x
a logical flag: if TRUE, the model matrix is returned in component x.
y
a logical flag: if TRUE, the response is returned in component y.
contrasts
a list giving contrasts for some or all of the factors appearing in the model formula. The elements of the list should have the same name as the variable and should be either a contrast matrix (specifically, any full-rank matrix with as many rows as there are levels in the factor), or else a function to compute such a matrix given the number of levels.
nrep
the number of random subsamples to be drawn. If "Exhaustive" resampling is being used, the value of nrep is ignored.
control
a list of control parameters to be used in the numerical algorithms. See lmRob.control for the possible control parameters and their default settings.
...
additional arguments are passed to the control function, lmRob.control.
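The order in which subset, na.action and contrasts act can be seen with base R alone, since these arguments follow the usual model.frame and model.matrix conventions. The sketch below is only an illustration under that assumption: it uses the built-in stackloss data rather than a robust fit, and the Shift factor is a hypothetical variable added purely to demonstrate contrasts.

```r
# Base R only: model.frame() is the standard machinery behind formula interfaces.
d <- datasets::stackloss
d$Water.Temp[5] <- NA            # introduce one missing value for illustration

# subset is applied first; na.action then filters the remaining rows.
mf <- model.frame(stack.loss ~ ., data = d,
                  subset = Air.Flow < 70,
                  na.action = na.exclude)
nrow(d)     # all observations
nrow(mf)    # rows left after subsetting and dropping the row with the NA

# With na.fail, the same kind of call signals an error instead of dropping rows.
res <- tryCatch(
  model.frame(stack.loss ~ ., data = d, na.action = na.fail),
  error = function(e) conditionMessage(e)
)

# contrasts: a named list, one element per factor in the formula.
# Shift is a hypothetical factor, not part of the original data.
d$Shift <- factor(rep(c("a", "b", "c"), length.out = nrow(d)))
X <- model.matrix(stack.loss ~ Air.Flow + Shift, data = d,
                  contrasts.arg = list(Shift = "contr.sum"))
colnames(X)
```

Because subsetting happens before missing-value filtering, a row that is excluded by subset never triggers na.fail, even if it contains an NA.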
Details
By default, the lmRob function automatically chooses an appropriate algorithm to compute a final robust estimate with high breakdown point and high efficiency. The final robust estimate is computed based on an initial estimate with high breakdown point. For the initial estimation, the alternate M-S estimate is used if there are any factor variables in the predictor matrix, and an S-estimate is used otherwise. To compute the S-estimate, a random resampling or a fast procedure is used unless the data set is small, in which case exhaustive resampling is employed. See lmRob.control for how to choose between the different algorithms.
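To make the idea of a resampling-based initial estimate concrete, here is a deliberately simplified sketch in base R. It is not the algorithm used by lmRob: the helper resample_fit is hypothetical, and it uses the MAD of the residuals as a stand-in for the M-scale that a genuine S-estimate minimizes. It only shows the general pattern of drawing random subsets of p observations, fitting each exactly, and keeping the candidate with the smallest robust residual scale.

```r
# Simplified random-resampling search for a high-breakdown starting fit.
# Assumption: mad() replaces the M-scale of a real S-estimate;
# resample_fit() is a hypothetical helper, not part of any package.
resample_fit <- function(X, y, nrep = 500, seed = 1) {
  set.seed(seed)
  n <- nrow(X)
  p <- ncol(X)
  best <- list(scale = Inf, coef = NULL)
  for (i in seq_len(nrep)) {
    idx <- sample.int(n, p)                       # random subset of size p
    beta <- tryCatch(solve(X[idx, , drop = FALSE], y[idx]),
                     error = function(e) NULL)    # skip singular subsets
    if (is.null(beta)) next
    s <- mad(y - drop(X %*% beta))                # robust residual scale
    if (is.finite(s) && s < best$scale) {
      best <- list(scale = s, coef = drop(beta))
    }
  }
  best
}

X <- cbind(1, as.matrix(datasets::stackloss[, 1:3]))
y <- datasets::stackloss$stack.loss

fit1 <- resample_fit(X, y, seed = 1)
fit2 <- resample_fit(X, y, seed = 1)   # same seed: identical result
fit3 <- resample_fit(X, y, seed = 2)   # different seed: may differ slightly
```

An exhaustive variant would loop over every subset of size p (for example via combn) instead of sampling, which removes the dependence on the seed but is only feasible for small data sets.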
Value
a list describing the regression. Note that the solution returned here is an approximation to the true solution based upon a random algorithm (except when "Exhaustive" resampling is chosen). Hence you will get (slightly) different answers each time if you make the same call with a different seed. See lmRob.control for how to set the seed, and see lmRob.object for a complete description of the object returned.
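A minimal sketch of checking reproducibility, assuming the robust package is installed and that lmRob.control accepts a seed argument (consult lmRob.control to confirm the name and default). The code is guarded so that it does nothing when the package is unavailable.

```r
same <- NA   # stays NA when the robust package is not installed

if (requireNamespace("robust", quietly = TRUE)) {
  library(robust)
  data(stack.dat)

  # Assumption: 'seed' is the control parameter that fixes the random resampling.
  fit.a <- lmRob(Loss ~ ., data = stack.dat,
                 control = lmRob.control(seed = 100))
  fit.b <- lmRob(Loss ~ ., data = stack.dat,
                 control = lmRob.control(seed = 100))

  # Two calls with the same seed should give the same coefficients.
  same <- isTRUE(all.equal(coef(fit.a), coef(fit.b)))
}
same
```

Fixing the seed in this way is useful when results need to be reproduced exactly, for example in reports or regression tests.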
References
Gervini, D., and Yohai, V. J. (1999).
A class of robust and fully efficient regression estimates;
mimeo, Universidad de Buenos Aires.
Marazzi, A. (1993).
Algorithms, routines, and S functions for robust statistics.
Wadsworth & Brooks/Cole, Pacific Grove, CA.
Maronna, R. A., and Yohai, V. J. (2000).
Robust regression with both continuous and categorical predictors.
Journal of Statistical Planning and Inference 89, 197–214.
Pena, D., and Yohai, V. (1999).
A Fast Procedure for Outlier Diagnostics in Large Regression Problems.
Journal of the American Statistical Association 94, 434–445.
Yohai, V. (1988).
High breakdown-point and high efficiency estimates for regression.
Annals of Statistics 15, 642–665.
Yohai, V., Stahel, W. A., and Zamar, R. H. (1991).
A procedure for robust estimation and inference in linear regression; in
Stahel, W. A. and Weisberg, S. W., Eds.,
Directions in robust statistics and diagnostics, Part II.
Springer-Verlag.
See Also
lmRob.control,
lmRob.object.
Examples
library(robust)
data(stack.dat)
stack.rob <- lmRob(Loss ~ ., data = stack.dat)