a positive integer giving the number of resamples required;
nsamp may not be reached if too many of the p-subsamples,
chosen out of the observed vectors, are in a hyperplane.
If nsamp = 0 all possible subsamples are taken.
If nsamp is omitted, it is calculated to provide a breakdown point
of eps with probability prob.
maxres
a positive integer specifying the maximum number of
resamples to be performed including those that are discarded due to linearly
dependent subsamples. If maxres is omitted it will be set to 2 times nsamp.
tune
a numeric value between 0 and 1 giving the fraction of the data to receive non-zero weight.
Defaults to 0.95.
prob
a numeric value between 0 and 1 specifying the probability of high breakdown point;
used to compute nsamp when nsamp is omitted. Defaults to 0.99.
eps
a numeric value between 0 and 0.5 specifying the breakdown point; used to compute
nsamp when nsamp is omitted. Defaults to 0.5.
seed
starting value for the random number generator. Default is seed = NULL.
trace
whether to print intermediate results. Default is trace = FALSE.
control
a control object (S4) of class CovControlSde-class
containing estimation options, the same as those provided in the function
specification. If the control object is supplied, the parameters from it
will be used. If parameters are also passed in the invocation statement, they will
override the corresponding elements of the control object.
Details
The projection-based Stahel-Donoho estimator possesses very good statistical properties,
but it can be very slow if the number of variables is too large. It is recommended to use
this estimator if n <= 1000 and p <= 10 or n <= 5000 and p <= 5.
The number of subsamples required is calculated to provide a breakdown point of
eps with probability prob and can reach values larger than
the largest integer value; in that case it is limited to .Machine$integer.max.
You can also provide nsamp in the call, e.g. nsamp=1000, but
this will not guarantee the required breakdown point of the estimator.
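As a rough illustration of how such a subsample count can be derived, the sketch below uses the standard resampling argument: if a fraction eps of the data is contaminated, a random subsample of size p is outlier-free with probability (1 - eps)^p, so N subsamples contain at least one clean subsample with probability 1 - (1 - (1 - eps)^p)^N. The helper name n_subsamples and the subsample size p are assumptions made for illustration; the exact formula used inside CovSde may differ (for example in the subsample size).

```r
# Hypothetical helper (not part of rrcov): smallest N such that at least one
# of N random p-subsamples is outlier-free with probability >= prob,
# assuming a contamination fraction eps.
n_subsamples <- function(p, eps = 0.5, prob = 0.99) {
  clean <- (1 - eps)^p                  # P(one p-subsample has no outliers)
  n <- ceiling(log(1 - prob) / log1p(-clean))
  # The count grows quickly with p, so cap it at the largest integer value
  min(n, .Machine$integer.max)
}

n_subsamples(3)     # modest count for three variables
n_subsamples(40)    # reaches the cap .Machine$integer.max
```

The cap mirrors the limit described above; a fixed, smaller nsamp runs faster but no longer comes with this probabilistic guarantee.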
For larger data sets it is better to use CovMcd or CovOgk.
If you use CovRobust, the estimator will be selected automatically
according to the size of the data set.
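The size guideline given in this section can be written as a small check. This is only a sketch of that rule of thumb; sde_recommended is a hypothetical helper, not an rrcov function, and it does not reproduce the selection logic of CovRobust.

```r
# Hypothetical helper (not part of rrcov): TRUE when the data dimensions fall
# within the sizes recommended in this section for the Stahel-Donoho estimator.
sde_recommended <- function(x) {
  n <- nrow(x)
  p <- ncol(x)
  (n <= 1000 && p <= 10) || (n <= 5000 && p <= 5)
}

small <- matrix(0, nrow = 75, ncol = 3)      # a small data set
large <- matrix(0, nrow = 6000, ncol = 4)    # too many observations
sde_recommended(small)   # within the recommended range
sde_recommended(large)   # outside it; consider CovMcd or CovOgk
```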
Value
An S4 object of class CovSde-class which is a subclass of the
virtual class CovRobust-class.
Note
The Fortran code for the Stahel-Donoho method was taken almost without changes from
package robust, which in turn took it from the Insightful Robust Library
(thanks to Kjell Konis).
References
R. A. Maronna and V. J. Yohai (1995). The Behavior of the Stahel-Donoho Robust Multivariate
Estimator. Journal of the American Statistical Association, 90 (429), 330–341.
R. A. Maronna, D. Martin and V. Yohai (2006). Robust Statistics: Theory and Methods.
Wiley, New York.
V. Todorov and P. Filzmoser (2009). An Object Oriented Framework for Robust
Multivariate Analysis. Journal of Statistical Software, 32(3), 1–47.
URL http://www.jstatsoft.org/v32/i03/.
Examples
data(hbk)
hbk.x <- data.matrix(hbk[, 1:3])
CovSde(hbk.x)
## c1, c2 and c3 specify nsamp = 2000 in equivalent ways;
## c0 uses the value of nsamp computed from eps and prob
c0 <- CovSde(hbk.x)
c1 <- CovSde(hbk.x, nsamp=2000)
c2 <- CovSde(hbk.x, control = CovControlSde(nsamp=2000))
c3 <- CovSde(hbk.x, control = new("CovControlSde", nsamp=2000))
## direct specification overrides control one:
c4 <- CovSde(hbk.x, nsamp=100,
             control = CovControlSde(nsamp=2000))
c1
summary(c1)
plot(c1)
## Use the function CovRobust() - if no estimation method is
## specified, for small data sets CovSde() will be called
cr <- CovRobust(hbk.x)
cr