Computes distance covariance and distance correlation statistics,
which are multivariate measures of dependence.
Usage
dcov(x, y, index = 1.0)
dcor(x, y, index = 1.0)
DCOR(x, y, index = 1.0)
Arguments
x
data or distances of first sample
y
data or distances of second sample
index
exponent on Euclidean distance, in (0,2]
Details
dcov and dcor (or DCOR) compute the distance
covariance and distance correlation statistics.
DCOR is a self-contained R function returning a list of
statistics. dcor execution is faster than DCOR
(see examples).
The sample sizes (number of rows) of the two samples must
agree, and samples must not contain missing values. Arguments
x, y can optionally be dist objects;
otherwise these arguments are treated as data.
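The input rules above can be illustrated with a small base-R sketch. The helpers as_distance_matrix and check_samples are hypothetical names introduced here for illustration; they are not part of the package, and the package's own checks may differ.

```r
# Hypothetical helper (not part of the package): coerce one sample to an
# n x n matrix of pairwise Euclidean distances.
as_distance_matrix <- function(x) {
  if (inherits(x, "dist")) {
    d <- as.matrix(x)            # already distances: expand to a full matrix
  } else {
    x <- as.matrix(x)            # a vector becomes a one-column matrix
    if (anyNA(x)) stop("missing values are not allowed")
    d <- as.matrix(dist(x))      # rows are treated as observations
  }
  if (anyNA(d)) stop("missing values are not allowed")
  d
}

# Hypothetical helper: both samples must have the same number of rows.
check_samples <- function(x, y) {
  a <- as_distance_matrix(x)
  b <- as_distance_matrix(y)
  if (nrow(a) != nrow(b)) stop("sample sizes must agree")
  list(a = a, b = b)
}

x <- iris[1:50, 1:4]
y <- iris[51:100, 1:4]
d <- check_samples(x, dist(y))   # data and dist inputs can be mixed
dim(d$a)
```

The explicit anyNA check on the raw data matters in this sketch because dist() does not stop on missing values by itself.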
Distance correlation is a new measure of dependence between random
vectors introduced by Szekely, Rizzo, and Bakirov (2007).
For all distributions with finite first moments, distance
correlation R generalizes the idea of correlation in two
fundamental ways:
(1) R(X,Y) is defined for X and Y in arbitrary dimension.
(2) R(X,Y)=0 characterizes independence of X and
Y.
Distance correlation satisfies 0 ≤ R ≤ 1, and
R = 0 only if X and Y are independent. Distance
covariance V provides a new approach to the problem of
testing the joint independence of random vectors. The formal
definitions of the population coefficients V and
R are given in (SRB 2007). The definitions of the
empirical coefficients are as follows.
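The empirical distance covariance V_n(X,Y)
with index 1 is
the nonnegative number defined by
V_n^2(X,Y) = \frac{1}{n^2} \sum_{k,l=1}^{n} A_{kl} B_{kl},
where
A_{kl} = a_{kl} - \bar{a}_{k.} - \bar{a}_{.l} + \bar{a}_{..},
B_{kl} = b_{kl} - \bar{b}_{k.} - \bar{b}_{.l} + \bar{b}_{..}.
Here a_{kl} = ||X_k - X_l|| and b_{kl} = ||Y_k - Y_l|| are the pairwise
Euclidean distances, k, l = 1,...,n, and a dot subscript denotes the
mean taken over the index it replaces.
Similarly, V_n(X) is the nonnegative number defined by
V_n^2(X) = V_n^2(X,X) = \frac{1}{n^2} \sum_{k,l=1}^{n} A_{kl}^2.
The empirical distance correlation R_n(X,Y) is the square root of
R_n^2(X,Y) = \frac{V_n^2(X,Y)}{\sqrt{V_n^2(X) V_n^2(Y)}}.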
See dcov.test for a test of multivariate independence
based on the distance covariance statistic.
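As an illustration of the empirical definition in SRB (2007), the following base-R sketch computes the sample distance covariance as the square root of the mean of products of double-centered distance matrices. The names double_center and dcov_sketch are hypothetical and are not the package's implementation. Applying the index as a power of the Euclidean distances follows the description of the index argument.

```r
# Double-center a distance matrix:
# A_kl = a_kl - (row mean k) - (column mean l) + (grand mean).
double_center <- function(d) {
  d - outer(rowMeans(d), colMeans(d), "+") + mean(d)
}

# Hypothetical sketch of the sample distance covariance V_n(X, Y).
# `index` is the exponent applied to the Euclidean distances.
dcov_sketch <- function(x, y, index = 1.0) {
  a <- as.matrix(dist(x))^index
  b <- as.matrix(dist(y))^index
  A <- double_center(a)
  B <- double_center(b)
  # V_n^2 is nonnegative in exact arithmetic; max() guards against a
  # tiny negative value caused by floating-point rounding.
  sqrt(max(mean(A * B), 0))
}

x <- iris[1:50, 1:4]
y <- iris[51:100, 1:4]
dcov_sketch(x, y)
dcov_sketch(x, y, index = 1.5)
```

If dcov follows the same definition, the two should agree up to rounding, but the sketch is meant to show the arithmetic and does not replace the compiled implementation.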
Value
dcov returns the sample distance covariance and
dcor returns the sample distance correlation.
DCOR returns a list with elements
dCov
sample distance covariance
dCor
sample distance correlation
dVarX
distance variance of x sample
dVarY
distance variance of y sample
Note
Two methods of computing the statistics are provided. DCOR
is a stand-alone R function that returns a list of statistics.
dcov and dcor provide R interfaces to the C
implementation, which is usually faster. dcov and dcor
call an internal function .dcov.
Note that it is inefficient to compute dCor as the square root of
dcov(x,y)/sqrt(dcov(x,x)*dcov(y,y)),
because the individual calls to dcov involve unnecessary
repetition of calculations.
For this reason, both .dcov and DCOR compute and
return all four statistics.
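The idea of computing all four statistics in one pass can be sketched in base R as follows. The function dcor_all is a hypothetical name used only for illustration, not the package's DCOR or .dcov, and it assumes each statistic is reported as a nonnegative square root. Each distance matrix is centered once and then reused for every statistic.

```r
# Hypothetical sketch: compute all four statistics from one pair of
# double-centered distance matrices.
dcor_all <- function(x, y, index = 1.0) {
  center <- function(d) d - outer(rowMeans(d), colMeans(d), "+") + mean(d)
  A <- center(as.matrix(dist(x))^index)   # centered once, reused below
  B <- center(as.matrix(dist(y))^index)
  dCov  <- sqrt(max(mean(A * B), 0))      # guard against rounding below zero
  dVarX <- sqrt(mean(A * A))
  dVarY <- sqrt(mean(B * B))
  V <- sqrt(dVarX * dVarY)
  dCor <- if (V > 0) dCov / V else 0      # defined as 0 when a variance is 0
  list(dCov = dCov, dCor = dCor, dVarX = dVarX, dVarY = dVarY)
}

x <- iris[1:50, 1:4]
y <- iris[51:100, 1:4]
str(dcor_all(x, y))
```

Separate calls for the covariance and the two variances would rebuild and re-center the same distance matrices several times, which is the repetition this approach avoids.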
References
Szekely, G.J., Rizzo, M.L., and Bakirov, N.K. (2007),
Measuring and Testing Dependence by Correlation of Distances,
Annals of Statistics, Vol. 35, No. 6, pp. 2769-2794.
http://dx.doi.org/10.1214/009053607000000505
Szekely, G.J. and Rizzo, M.L. (2009),
Brownian Distance Covariance,
Annals of Applied Statistics,
Vol. 3, No. 4, pp. 1236-1265.
http://dx.doi.org/10.1214/09-AOAS312
Szekely, G.J. and Rizzo, M.L. (2009),
Rejoinder: Brownian Distance Covariance,
Annals of Applied Statistics, Vol. 3, No. 4, pp. 1303-1308.
See Also
dcov.test, dcor.ttest
Examples
x <- iris[1:50, 1:4]
y <- iris[51:100, 1:4]
dcov(x, y)
dcov(dist(x), dist(y)) #same thing
## C implementation
dcov(x, y, 1.5)
dcor(x, y, 1.5)
.dcov(dist(x), dist(y), 1.5)
## R implementation
DCOR(x, y, 1.5)
## Not run:
## compare speed of R version and C version
set.seed(111)
## R version
system.time(replicate(1000, DCOR(x, y)))
set.seed(111)
## C version
system.time(replicate(1000, .dcov(x, y)))
## End(Not run)