lmrob.control

R Documentation

Tuning Parameters for lmrob() and Auxiliaries

Description

Tuning parameters for lmrob, the MM-type regression estimator and the associated S-, M- and D-estimators. Using setting="KS2011" sets the defaults as suggested by Koller and Stahel (2011) and analogously for "KS2014".

Usage

lmrob.control(setting, seed = NULL, nResample = 500,
              tuning.chi = NULL, bb = 0.5, tuning.psi = NULL,
              max.it = 50, groups = 5, n.group = 400,
              k.fast.s = 1, best.r.s = 2,
              k.max = 200, maxit.scale = 200, k.m_s = 20,
              refine.tol = 1e-7, rel.tol = 1e-7, solve.tol = 1e-7,
              trace.lev = 0,
              mts = 1000, subsampling = c("nonsingular", "simple"),
              compute.rd = FALSE, method = "MM", psi = "bisquare",
              numpoints = 10, cov = NULL,
              split.type = c("f", "fi", "fii"), fast.s.large.n = 2000,
              eps.outlier = function(nobs) 0.1 / nobs,
              eps.x = function(maxx) .Machine$double.eps^(.75)*maxx,
              compute.outlier.stats = method,
              warn.limit.reject = 0.5,
              warn.limit.meanrw = 0.5, ...)

.Mchi.tuning.defaults
.Mchi.tuning.default(psi)
.Mpsi.tuning.defaults
.Mpsi.tuning.default(psi)

Arguments

`setting`	a string specifying alternative default values. Leave empty for the defaults or use `"KS2011"` or `"KS2014"` for the defaults suggested by Koller and Stahel (2011, 2014). See Details.
`seed`	`NULL` or an integer vector compatible with `.Random.seed`: the seed to be used for random re-sampling used in obtaining candidates for the initial S-estimator. The current value of `.Random.seed` will be preserved if `seed` is set, i.e. non-`NULL`; otherwise, as by default, `.Random.seed` will be used and modified as usual from calls to `runif()` etc.
`nResample`	number of re-sampling candidates to be used to find the initial S-estimator. Currently defaults to 500 which works well in most situations (see references).
`tuning.chi`	tuning constant vector for the S-estimator. If `NULL`, as by default, sensible defaults are set (depending on `psi`) to yield a 50% breakdown estimator. See Details.
`bb`	expected value under the normal model of the “chi” (or rather ρ) function with tuning constant equal to `tuning.chi`. This is used to compute the S-estimator.
`tuning.psi`	tuning constant vector for the redescending M-estimator. If `NULL`, as by default, this is set (depending on `psi`) to yield an estimator with asymptotic efficiency of 95% for normal errors. See Details.
`max.it`	integer specifying the maximum number of IRWLS iterations.
`groups`	(for the fast-S algorithm): Number of random subsets to use when the data set is large.
`n.group`	(for the fast-S algorithm): Size of each of the `groups` above. Note that this must be at least p, the number of regression coefficients.
`k.fast.s`	(for the fast-S algorithm): Number of local improvement steps (“I-steps”) for each re-sampling candidate.
`k.m_s`	(for the M-S algorithm): specifies after how many unsuccessful refinement steps the algorithm stops.
`best.r.s`	(for the fast-S algorithm): Number of best candidates to be iterated further (i.e., “refined”); is denoted t in Salibian-Barrera & Yohai (2006).
`k.max`	(for the fast-S algorithm): maximal number of refinement steps for the “fully” iterated best candidates.
`maxit.scale`	integer specifying the maximum number of C level `find_scale()` iterations.
`refine.tol`	(for the fast-S algorithm): relative convergence tolerance for the fully iterated best candidates.
`rel.tol`	(for the RWLS iterations of the MM algorithm): relative convergence tolerance for the parameter vector.
`solve.tol`	(for the S algorithm): relative tolerance for inversion. Hence, this corresponds to `solve.default()`'s `tol`.
`trace.lev`	integer indicating if the progress of the MM-algorithm should be traced (increasingly); default `trace.lev = 0` does no tracing.
`mts`	maximum number of samples to try in subsampling algorithm.
`subsampling`	type of subsampling to be used, a string: `"simple"` for simple subsampling (default prior to version 0.9), `"nonsingular"` for nonsingular subsampling. See also `lmrob.S`.
`compute.rd`	logical indicating if robust distances (based on the MCD robust covariance estimator `covMcd`) are to be computed for the robust diagnostic plots. This may take some time to finish, particularly for large data sets, and can lead to singularity problems when there are `factor` explanatory variables (with many levels, or levels with “few” observations). Hence, it is `FALSE` by default.
`method`	string specifying the estimator-chain. `MM` is interpreted as `SM`. See Details of `lmrob` for a description of the possible values.
`psi`	string specifying the type of ψ-function used. See Details of `lmrob`. Defaults to `"bisquare"` for S and MM-estimates, otherwise `"lqq"`.
`numpoints`	number of points used in Gauss quadrature.
`cov`	function or string with function name to be used to calculate covariance matrix estimate. The default is `if(method %in% c('SM', 'MM')) ".vcov.avar1" else ".vcov.w"`. See Details of `lmrob`.
`split.type`	determines how categorical and continuous variables are split. See `splitFrame`.
`fast.s.large.n`	minimum number of observations required to switch from ordinary “fast S” algorithm to an efficient “large n” strategy.
`eps.outlier`	limit on the robustness weight below which an observation is considered to be an outlier. Either a numeric(1) or a function that takes the number of observations as an argument. Used in `summary.lmrob` and `outlierStats`.
`eps.x`	limit on the absolute value of the elements of the design matrix below which an element is considered zero. Either a numeric(1) or a function that takes the maximum absolute value in the design matrix as an argument.
`compute.outlier.stats`	vector of `character` strings, each valid to be used as `method` argument. Used to specify for which estimators outlier statistics (and warnings) should be produced. Set to empty string if none are required.
`warn.limit.reject`	limit on the ratio (number of rejected observations) / (number of observations) within a factor level; a warning is produced when the ratio reaches or exceeds (>=) this limit. Set to `NULL` to disable the warning.
`warn.limit.meanrw`	limit on the mean robustness weight per factor level; a warning is produced when the mean is at or below (<=) this limit. Set to `NULL` to disable the warning.
`...`	further arguments to be added as `list` components to the result, e.g., those to be used in `.vcov.w()`.
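
As a rough illustration of the two function-valued defaults, the sketch below copies `eps.outlier` and `eps.x` from the Usage section and evaluates them in base R. The weight vector `rw` and the matrix `X` are made-up values, and the comparisons only mimic the argument descriptions above; they are not taken from the package internals.

```r
# Default threshold functions, copied from the Usage section
eps.outlier <- function(nobs) 0.1 / nobs
eps.x       <- function(maxx) .Machine$double.eps^(.75) * maxx

# Hypothetical robustness weights for n = 8 observations
rw  <- c(0.98, 0.91, 1.00, 0.004, 0.87, 0.95, 0.00, 0.99)
thr <- eps.outlier(length(rw))       # 0.1 / 8 = 0.0125
is_outlier <- rw < thr               # weight below the limit => outlier
which(is_outlier)                    # observations 4 and 7

# Hypothetical design matrix with one numerically negligible entry
X <- cbind(1, c(250, 1e-13, 40))
zero_thr <- eps.x(max(abs(X)))       # scales with the largest |entry|
abs(X) < zero_thr                    # TRUE only for the 1e-13 entry
```

Because both defaults are functions, the thresholds adapt to the sample size and to the scale of the design matrix; a plain numeric(1) can be supplied instead when a fixed limit is preferred.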

Details

The option setting="KS2011" alters the default arguments. They are changed to method = 'SMDM', psi = 'lqq', max.it = 500, k.max = 2000, cov = '.vcov.w'. The defaults of all the remaining arguments are not changed.

The option setting="KS2014" builds upon setting="KS2011" and additionally sets best.r.s = 20, k.fast.s = 2, nResample = 1000. This setting should produce more stable estimates for designs with factors.
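
A short sketch of how the two settings might be inspected side by side. It assumes the robustbase package, which provides lmrob.control(), is installed; the object names are chosen for illustration only.

```r
library(robustbase)  # assumed to be installed; provides lmrob.control()

ks2011 <- lmrob.control("KS2011")
ks2014 <- lmrob.control("KS2014")

# Components that the text above describes as changed by the settings
changed <- c("method", "psi", "max.it", "k.max", "cov",
             "best.r.s", "k.fast.s", "nResample")

str(ks2011[changed])
str(ks2014[changed])

# Explicit arguments can be combined with a setting
ctrl <- lmrob.control("KS2014", max.it = 1000)
ctrl$max.it
```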

By default, and in .Mpsi.tuning.default() and .Mchi.tuning.default(), tuning.chi and tuning.psi are set to yield an MM-estimate with break-down point 0.5 and efficiency of 95% at the normal.

For example, .Mpsi.tuning.default(psi) returns these defaults directly; it is equivalent to, but more efficient than, the formerly widely used lmrob.control(psi = psi)$tuning.psi.

These defaults are:

`psi`	`tuning.chi`	`tuning.psi`
`bisquare`	`1.54764`	`4.685061`
`welsh`	`0.5773502`	`2.11`
`ggw`	`c(-0.5, 1.5, NA, 0.5)`	`c(-0.5, 1.5, 0.95, NA)`
`lqq`	`c(-0.5, 1.5, NA, 0.5)`	`c(-0.5, 1.5, 0.95, NA)`
`optimal`	`0.4047`	`1.060158`
`hampel`	`c(1.5, 3.5, 8)*0.2119163`	`c(1.5, 3.5, 8)*0.9014`
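
The bisquare row can be checked numerically in base R. The sketch below assumes the usual bisquare definitions, ρ(x) = 1 − (1 − (x/c)²)³ for |x| ≤ c (and 1 otherwise) and ψ(x) = x(1 − (x/c)²)² for |x| ≤ c (and 0 otherwise), together with the standard asymptotic efficiency (E ψ′)² / E ψ² of an M-estimator at the normal. The helper names are made up for this illustration and are not part of the package.

```r
# Bisquare pieces in terms of u = x / cc, valid for |u| <= 1
rho_u  <- function(u) 1 - (1 - u^2)^3
psi_u  <- function(u, cc) cc * u * (1 - u^2)^2     # psi(x) with x = cc * u
dpsi_u <- function(u) (1 - u^2) * (1 - 5 * u^2)    # derivative of psi w.r.t. x

# E[rho(Z)] for Z ~ N(0, 1); rho equals 1 outside [-cc, cc]
bb_bisq <- function(cc) {
  inside <- integrate(function(x) rho_u(x / cc) * dnorm(x), -cc, cc)$value
  inside + 2 * pnorm(-cc)
}

# Asymptotic efficiency at the normal; psi and psi' vanish outside [-cc, cc]
eff_bisq <- function(cc) {
  e_dpsi <- integrate(function(x) dpsi_u(x / cc) * dnorm(x), -cc, cc)$value
  e_psi2 <- integrate(function(x) psi_u(x / cc, cc)^2 * dnorm(x), -cc, cc)$value
  e_dpsi^2 / e_psi2
}

bb_bisq(1.54764)     # close to 0.5, matching the default bb
eff_bisq(4.685061)   # close to 0.95
```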

The values for the tuning constant for the ggw psi function are hard coded. The constants vector has four elements: minimal slope, b (controlling the bend at the maximum of the curve), efficiency, break-down point. Use NA for an unspecified value, see examples in the tables.

The constants for the "hampel" psi function are chosen to have a redescending slope of -1/3. Constants for a slope of -1/2 would be

`psi`	`tuning.chi`	`tuning.psi`
`"hampel"`	`c(2, 4, 8) * 0.1981319`	`c(2, 4, 8) * 0.690794`
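
The quoted slopes follow directly from the three constants. Assuming the usual three-part Hampel ψ with constants (a, b, r), which is linear up to a, constant at a up to b, and then falls linearly to zero at r, the descending slope is −a / (r − b). A minimal base R sketch with a made-up helper name:

```r
# Slope of the redescending part of a three-part Hampel psi (assumed form)
hampel_slope <- function(k) {
  stopifnot(length(k) == 3)
  a <- k[1]; b <- k[2]; r <- k[3]
  -a / (r - b)
}

hampel_slope(c(1.5, 3.5, 8) * 0.9014)      # -1/3 (default tuning.psi)
hampel_slope(c(1.5, 3.5, 8) * 0.2119163)   # -1/3 (default tuning.chi)
hampel_slope(c(2, 4, 8) * 0.690794)        # -1/2 (alternative constants)
```

The common multiplier cancels in the ratio, so it changes the efficiency or breakdown point but not the slope.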

Alternative coefficients for an efficiency of 85% at the normal are given in the table below.

`psi`	`tuning.psi`
`bisquare`	`3.443689`
`welsh`	`1.456`
`ggw`, `lqq`	`c(-0.5, 1.5, 0.85, NA)`
`optimal`	`0.8684`
`hampel` (-1/3)	`c(1.5, 3.5, 8) * 0.5704545`
`hampel` (-1/2)	`c(2, 4, 8) * 0.4769578`
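
For the Welsh family the efficiency has a closed form, which gives a quick plausibility check of the two Welsh constants (2.11 and 1.456). The sketch assumes the parametrization ψ(x) = x·exp(−x²/(2c²)); under the standard normal this gives E ψ′ = (1 + 1/c²)^(−3/2) and E ψ² = (1 + 2/c²)^(−3/2), hence an efficiency of (1 + 2/c²)^(3/2) / (1 + 1/c²)³. Both the parametrization and the helper name are assumptions made for this illustration.

```r
# Closed-form asymptotic efficiency of the Welsh psi at the normal,
# assuming psi(x) = x * exp(-x^2 / (2 * cc^2))
eff_welsh <- function(cc) {
  t <- 1 / cc^2
  (1 + 2 * t)^1.5 / (1 + t)^3
}

round(eff_welsh(c(2.11, 1.456)), 3)   # 0.95 0.85
```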

Value

.Mchi.tuning.default(psi) and .Mpsi.tuning.default(psi) return a short numeric vector of tuning constants which are defaults for the corresponding psi-function, see the Details. They are based on the named lists .Mchi.tuning.defaults and .Mpsi.tuning.defaults, respectively.

lmrob.control() returns a named list with over twenty components, corresponding to the arguments, where tuning.psi and tuning.chi are typically computed as .Mpsi.tuning.default(psi) and .Mchi.tuning.default(psi), respectively.

Author(s)

Matias Salibian-Barrera, Martin Maechler and Manuel Koller

References

Koller, M. and Stahel, W.A. (2011) Sharpening Wald-type inference in robust regression for small samples. Computational Statistics & Data Analysis 55(8), 2504–2515.

Koller, M. and Stahel, W.A. (2014) Nonsingular subsampling for regression S-estimators with categorical predictors. Under review.

Examples

## Load the package that provides lmrob() and lmrob.control():
library(robustbase)

## Show the default settings:
str(lmrob.control())

## Artificial data for a  simple  "robust t test":
set.seed(17)
y <- y0 <- rnorm(200)
y[sample(200,20)] <- 100*rnorm(20)
gr <- as.factor(rbinom(200, 1, prob = 1/8))
lmrob(y0 ~ 0+gr)

## Use  Koller & Stahel(2011)'s recommendation but a larger  'max.it':
str(ctrl <- lmrob.control("KS2011", max.it = 1000))

str(.Mpsi.tuning.defaults)
stopifnot(identical(.Mpsi.tuning.defaults,
                   sapply(names(.Mpsi.tuning.defaults),
                          .Mpsi.tuning.default)))