the correlation shrinkage intensity (range 0-1).
If lambda is not specified (the default) it is estimated
using an analytic formula from Sch"afer and Strimmer (2005)
- see details below.
For lambda=0 the empirical correlations are recovered.
lambda.var
the variance shrinkage intensity (range 0-1).
If lambda.var is not specified (the default) it is estimated
using an analytic formula from Opgen-Rhein and Strimmer (2007)
- see details below.
For lambda.var=0 the empirical variances are recovered.
w
optional: weights for each data point - if not specified uniform weights are assumed
(w = rep(1/n, n) with n = nrow(x)).
verbose
output some status messages while computing (default: TRUE)
Details
var.shrink computes the empirical variance of each considered random variable,
and shrinks them towards their median. The shrinkage intensity is estimated using
estimate.lambda.var (Opgen-Rhein and Strimmer 2007).
Similarly cor.shrink computes a shrinkage estimate of the correlation matrix by
shrinking the empirical correlations towards the identity matrix.
In this case the shrinkage intensity is computed using estimate.lambda
(Sch"afer and Strimmer 2005).
In comparison with the standard empirical estimates
(var, cov, and cor) the shrinkage estimates exhibit
a number of favorable properties. For instance,
they are typically much more efficient, i.e. they show (sometimes dramatically) better
mean squared error,
the estimated covariance and correlation matrices are always positive definite
and well conditioned (so that there are no numerical problems when computing their inverse),
they are inexpensive to compute, and
they are fully automatic and do not require any
tuning parameters (as the shrinkage intensity is analytically estimated from the data), and
they assume nothing about the underlying distributions, except for the existence of
the first two moments.
These properties also carry over to derived quantities, such as partial variances and
partial correlations (pvar.shrink and pcor.shrink).
As an extra benefit, the shrinkage estimators have a form that can be very efficiently inverted,
especially if the number of variables is large and the sample size is small. Thus, instead of
inverting the matrix output by cov.shrink and cor.shrink please use the functions
invcov.shrink and invcor.shrink, respectively.
Value
var.shrink returns a vector with estimated variances.
cov.shrink returns a covariance matrix.
cor.shrink returns the corresponding correlation matrix.
Opgen-Rhein, R., and K. Strimmer. 2007. Accurate ranking of
differentially expressed genes by a distribution-free shrinkage
approach. Statist. Appl. Genet. Mol. Biol. 6:9.
(http://www.bepress.com/sagmb/vol6/iss1/art9/)
Sch"afer, J., and K. Strimmer. 2005. A shrinkage approach to large-scale
covariance estimation and implications for functional genomics.
Statist. Appl. Genet. Mol. Biol. 4:32.
(http://www.bepress.com/sagmb/vol4/iss1/art32/)
See Also
invcov.shrink, pcor.shrink, cor2pcor
Examples
# load corpcor library
library("corpcor")
# small n, large p
p = 100
n = 20
# generate random pxp covariance matrix
sigma = matrix(rnorm(p*p),ncol=p)
sigma = crossprod(sigma)+ diag(rep(0.1, p))
# simulate multinormal data of sample size n
sigsvd = svd(sigma)
Y = t(sigsvd$v %*% (t(sigsvd$u) * sqrt(sigsvd$d)))
X = matrix(rnorm(n * ncol(sigma)), nrow = n) %*% Y
# estimate covariance matrix
s1 = cov(X)
s2 = cov.shrink(X)
# squared error
sum((s1-sigma)^2)
sum((s2-sigma)^2)
# compare positive definiteness
is.positive.definite(sigma)
is.positive.definite(s1)
is.positive.definite(s2)
# compare ranks and condition
rank.condition(sigma)
rank.condition(s1)
rank.condition(s2)
# compare eigenvalues
e0 = eigen(sigma, symmetric=TRUE)$values
e1 = eigen(s1, symmetric=TRUE)$values
e2 = eigen(s2, symmetric=TRUE)$values
m = max(e0, e1, e2)
yl = c(0, m)
par(mfrow=c(1,3))
plot(e1, main="empirical")
plot(e2, ylim=yl, main="full shrinkage")
plot(e0, ylim=yl, main="true")
par(mfrow=c(1,1))