Tuning parameters for lmrob, the MM-type regression
estimator and the associated S-, M- and D-estimators. Using
setting="KS2011" sets the defaults as suggested by
Koller and Stahel (2011) and analogously for "KS2014".
setting
a string specifying alternative default values. Leave
empty for the defaults or use "KS2011" or "KS2014"
for the defaults suggested by Koller and Stahel (2011, 2014).
See Details.
seed
NULL or an integer vector compatible with
.Random.seed: the seed to be used for random
re-sampling used in obtaining candidates for the initial
S-estimator. The current value of .Random.seed will be
preserved if seed is set, i.e. non-NULL;
otherwise, as by default, .Random.seed will be used and
modified as usual from calls to runif() etc.
nResample
number of re-sampling candidates to be
used to find the initial S-estimator. Currently defaults to 500
which works well in most situations (see references).
tuning.chi
tuning constant vector for the S-estimator. If
NULL, as by default, sensible defaults are set (depending on
psi) to yield a 50% breakdown estimator. See Details.
bb
expected value under the normal model of the
“chi” (or rather, rho) function with tuning
constant equal to tuning.chi. This is used to compute the
S-estimator.
tuning.psi
tuning constant vector for the redescending
M-estimator. If NULL, as by default, this is set (depending
on psi) to yield an estimator with asymptotic efficiency of
95% for normal errors. See Details.
max.it
integer specifying the maximum number of IRWLS iterations.
groups
(for the fast-S algorithm): Number of
random subsets to use when the data set is large.
n.group
(for the fast-S algorithm): Size of each of the
groups above. Note that this must be at least p.
k.fast.s
(for the fast-S algorithm): Number of
local improvement steps (“I-steps”) for each
re-sampling candidate.
k.m_s
(for the M-S algorithm): specifies after how many
unsuccessful refinement steps the algorithm stops.
best.r.s
(for the fast-S algorithm): Number of
best candidates to be iterated further (i.e.,
“refined”); this is denoted t in
Salibian-Barrera & Yohai (2006).
k.max
(for the fast-S algorithm): maximal number of
refinement steps for the “fully” iterated best candidates.
maxit.scale
integer specifying the maximum number of C level
find_scale() iterations.
refine.tol
(for the fast-S algorithm): relative convergence
tolerance for the fully iterated best candidates.
rel.tol
(for the RWLS iterations of the MM algorithm): relative
convergence tolerance for the parameter vector.
solve.tol
(for the S algorithm): relative
tolerance for inversion. Hence, this corresponds to
solve.default()'s tol.
trace.lev
integer indicating if the progress of the MM-algorithm
should be traced (increasingly); default trace.lev = 0 does
no tracing.
mts
maximum number of samples to try in the subsampling
algorithm.
subsampling
type of subsampling to be used, a string:
"simple" for simple subsampling (default prior to version 0.9),
"nonsingular" for nonsingular subsampling. See also
lmrob.S.
compute.rd
logical indicating if robust distances (based on
the MCD robust covariance estimator covMcd) are to be
computed for the robust diagnostic plots. This may take some
time to finish, particularly for large data sets, and can lead to
singularity problems when there are factor explanatory
variables (with many levels, or levels with “few”
observations). Hence, this is FALSE by default.
method
string specifying the estimator-chain. MM
is interpreted as SM. See Details of
lmrob for a description of the possible values.
psi
string specifying the type of ψ-function
used. See Details of lmrob. Defaults to
"bisquare" for S and MM-estimates, otherwise "lqq".
numpoints
number of points used in Gauss quadrature.
cov
function or string with function name to be used to
calculate covariance matrix estimate. The default is
if(method %in% c('SM', 'MM')) ".vcov.avar1" else ".vcov.w".
See Details of lmrob.
split.type
determines how categorical and continuous variables
are split. See splitFrame.
fast.s.large.n
minimum number of observations required to
switch from ordinary “fast S” algorithm to an efficient
“large n” strategy.
eps.outlier
limit on the robustness weight below which an observation
is considered to be an outlier.
Either a numeric(1) or a function that takes the number of observations as
an argument. Used in summary.lmrob and
outlierStats.
eps.x
limit on the absolute value of the elements of the design matrix
below which an element is considered zero.
Either a numeric(1) or a function that takes the maximum absolute value in
the design matrix as an argument.
compute.outlier.stats
vector of character
strings, each valid to be used as method argument. Used to
specify for which estimators outlier statistics (and warnings)
should be produced. Set to empty string if none are required.
warn.limit.reject
limit of the ratio
(# rejected) / (# obs in level)
at or above (>=) which a warning is produced.
Set to NULL to disable warning.
warn.limit.meanrw
limit of the mean robustness per factor level
below which (<=) a warning is produced.
Set to NULL to disable warning.
...
further arguments to be added as list
components to the result, e.g., those to be used in .vcov.w().
Details
The option setting="KS2011" alters the default
arguments. They are changed to method = 'SMDM', psi = 'lqq',
max.it = 500, k.max = 2000, cov = '.vcov.w'.
The defaults of all the remaining arguments are not changed.
The option setting="KS2014" builds upon setting="KS2011".
It additionally sets best.r.s = 20, k.fast.s = 2, and
nResample = 1000. This setting should produce more stable estimates
for designs with factors.
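
As a quick check of the two settings described above, the following sketch tabulates the arguments they change next to the plain defaults. It assumes the robustbase package is installed; the helper name show_changed is made up for this illustration and is not part of the package.

```r
library(robustbase)

## Arguments that, according to the Details above, are changed by the settings
changed <- c("method", "psi", "max.it", "k.max", "cov",
             "best.r.s", "k.fast.s", "nResample")

ctrl_def  <- lmrob.control()
ctrl_2011 <- lmrob.control("KS2011")
ctrl_2014 <- lmrob.control("KS2014")

## Hypothetical helper: one column per control object, one row per argument
show_changed <- function(...) {
  ctrls <- list(...)
  sapply(ctrls, function(ctrl)
    sapply(changed, function(nm) paste(format(ctrl[[nm]]), collapse = " ")))
}

show_changed(default = ctrl_def, KS2011 = ctrl_2011, KS2014 = ctrl_2014)
```

Arguments given explicitly, as in lmrob.control("KS2011", max.it = 1000), take precedence over the values implied by the setting.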
By default, and in .Mpsi.tuning.default() and .Mchi.tuning.default(),
tuning.chi and tuning.psi are set to yield an
MM-estimate with break-down point 0.5 and efficiency of 95% at
the normal.
To get these defaults, e.g., .Mpsi.tuning.default(psi) is
equivalent to but more efficient than the formerly widely used
lmrob.control(psi = psi)$tuning.psi.
These defaults are:
psi         tuning.chi                    tuning.psi
bisquare    1.54764                       4.685061
welsh       0.5773502                     2.11
ggw         c(-0.5, 1.5, NA, 0.5)         c(-0.5, 1.5, 0.95, NA)
lqq         c(-0.5, 1.5, NA, 0.5)         c(-0.5, 1.5, 0.95, NA)
optimal     0.4047                        1.060158
hampel      c(1.5, 3.5, 8)*0.2119163      c(1.5, 3.5, 8)*0.9014
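
The bisquare row of the table can be checked numerically in base R. The sketch below assumes the usual Tukey bisquare definitions (rho scaled to a maximum of 1, and psi(x) = x (1 - (x/c)^2)^2 on [-c, c]); the function names are chosen for this illustration only. It computes the expected value of rho under the standard normal (the quantity bb) for tuning.chi, and the asymptotic efficiency (E psi')^2 / E psi^2 for tuning.psi.

```r
## Tukey bisquare rho, scaled so that rho(x) = 1 for |x| >= cc
rho_bisq <- function(x, cc) {
  u <- pmin(abs(x) / cc, 1)
  1 - (1 - u^2)^3
}
## Bisquare psi and its derivative (both are zero outside [-cc, cc])
psi_bisq  <- function(x, cc)
  ifelse(abs(x) <= cc, x * (1 - (x / cc)^2)^2, 0)
dpsi_bisq <- function(x, cc)
  ifelse(abs(x) <= cc, (1 - (x / cc)^2) * (1 - 5 * (x / cc)^2), 0)

## E[rho(Z)] for Z ~ N(0, 1): integrate on [0, cc], add the tail where rho = 1
cc_chi <- 1.54764
bb <- 2 * integrate(function(z) rho_bisq(z, cc_chi) * dnorm(z), 0, cc_chi)$value +
      2 * pnorm(cc_chi, lower.tail = FALSE)

## Asymptotic efficiency at the normal: (E[psi'(Z)])^2 / E[psi(Z)^2]
cc_psi <- 4.685061
E_dpsi <- integrate(function(z) dpsi_bisq(z, cc_psi) * dnorm(z), -cc_psi, cc_psi)$value
E_psi2 <- integrate(function(z) psi_bisq(z, cc_psi)^2 * dnorm(z), -cc_psi, cc_psi)$value
eff <- E_dpsi^2 / E_psi2

round(c(bb = bb, efficiency = eff), 3)
```

The two values should come out close to 0.5 and 0.95, i.e. the 50% breakdown point and 95% efficiency stated above.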
The values for the tuning constant for the ggw psi function are
hard coded. The constants vector has four elements: minimal slope, b
(controlling the bend at the maximum of the curve), efficiency,
break-down point. Use NA for an unspecified value, see examples
in the tables.
The constants for the "hampel" psi function are chosen to have a
redescending slope of -1/3. Constants for a slope of -1/2
would be
psi         tuning.chi                    tuning.psi
"hampel"    c(2, 4, 8) * 0.1981319        c(2, 4, 8) * 0.690794
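
The slopes quoted for the "hampel" constants follow from simple arithmetic. Assuming the standard three-part Hampel psi with knots (a, b, r), psi decreases linearly from a at |x| = b to 0 at |x| = r, so the redescending slope is -a / (r - b). The helper name below is hypothetical.

```r
## Slope of the redescending part of Hampel's psi with knots k = c(a, b, r)
hampel_slope <- function(k) -k[1] / (k[3] - k[2])

slope_default <- hampel_slope(c(1.5, 3.5, 8) * 0.9014)   # default tuning.psi
slope_alt     <- hampel_slope(c(2, 4, 8) * 0.690794)     # alternative tuning.psi

c(default = slope_default, alternative = slope_alt)
```

The common scale factor cancels in the ratio, which is why the slope depends only on the shape vector c(1.5, 3.5, 8) or c(2, 4, 8).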
Alternative coefficients for an efficiency of 85%
at the normal are given in the table below.
psi              tuning.psi
bisquare         3.443689
welsh            1.456
ggw, lqq         c(-0.5, 1.5, 0.85, NA)
optimal          0.8684
hampel (-1/3)    c(1.5, 3.5, 8) * 0.5704545
hampel (-1/2)    c(2, 4, 8) * 0.4769578
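
Constants for other target efficiencies can be found by inverting the efficiency function numerically. The following base R sketch does this for the bisquare psi only, again assuming the standard definition psi(x) = x (1 - (x/c)^2)^2 on [-c, c]; bisq_efficiency and the search interval c(2, 6) are choices made for this illustration, not part of the package.

```r
## Asymptotic efficiency at the normal of the bisquare M-estimator
## with tuning constant cc: (E[psi'(Z)])^2 / E[psi(Z)^2], Z ~ N(0, 1)
bisq_efficiency <- function(cc) {
  psi  <- function(z) z * (1 - (z / cc)^2)^2
  dpsi <- function(z) (1 - (z / cc)^2) * (1 - 5 * (z / cc)^2)
  E_dpsi <- integrate(function(z) dpsi(z) * dnorm(z), -cc, cc)$value
  E_psi2 <- integrate(function(z) psi(z)^2 * dnorm(z), -cc, cc)$value
  E_dpsi^2 / E_psi2
}

## Efficiency implied by the tabulated constant
eff_85 <- bisq_efficiency(3.443689)

## Solve for the constant that gives 85% efficiency
cc_85 <- uniroot(function(cc) bisq_efficiency(cc) - 0.85, c(2, 6))$root

c(efficiency = eff_85, constant = cc_85)
```

The recovered constant should agree with the tabulated value up to the tolerance of uniroot(); pass a smaller tol for more digits.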
Value
.Mchi.tuning.default(psi) and .Mpsi.tuning.default(psi)
return a short numeric vector of tuning constants which
are defaults for the corresponding psi-function, see the Details.
They are based on the named lists
.Mchi.tuning.defaults and .Mpsi.tuning.defaults,
respectively.
lmrob.control() returns a named list with over
twenty components, corresponding to the arguments, where
tuning.psi and tuning.chi are typically computed as
.Mpsi.tuning.default(psi) or .Mchi.tuning.default(psi),
respectively.
Author(s)
Matias Salibian-Barrera, Martin Maechler and Manuel Koller
References
Koller, M. and Stahel, W.A. (2011)
Sharpening Wald-type inference in robust regression for small samples.
Computational Statistics & Data Analysis 55(8), 2504–2515.
Koller, M. and Stahel, W.A. (2014)
Nonsingular subsampling for regression S-estimators with categorical
predictors. Under review.
See Also
lmrob, also for references and examples.
Examples
## Show the default settings:
str(lmrob.control())
## Artificial data for a simple "robust t test":
set.seed(17)
y <- y0 <- rnorm(200)
y[sample(200,20)] <- 100*rnorm(20)
gr <- as.factor(rbinom(200, 1, prob = 1/8))
lmrob(y0 ~ 0+gr)
## Use the Koller & Stahel (2011) recommendation but with a larger 'max.it':
str(ctrl <- lmrob.control("KS2011", max.it = 1000))
str(.Mpsi.tuning.defaults)
stopifnot(identical(.Mpsi.tuning.defaults,
                    sapply(names(.Mpsi.tuning.defaults),
                           .Mpsi.tuning.default)))