logical: if TRUE, first argument is a distance matrix
method
use original (default) or distance components (discoB, discoF)
R
number of bootstrap replicates
ix
a permutation of the row indices of x
Details
The k-sample multivariate E-test of equal distributions
is performed. The statistic is computed from the original
pooled samples, stacked in matrix x where each row
is a multivariate observation, or the corresponding distance matrix. The
first sizes[1] rows of x are the first sample, the next
sizes[2] rows of x are the second sample, etc.
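The row layout described above can be illustrated with a small base R sketch. The data are simulated and the names s1, s2 and blocks are hypothetical; only x and sizes correspond to arguments named in this page.

```r
# Two hypothetical samples with 3 variables each (simulated for illustration)
set.seed(1)
s1 <- matrix(rnorm(15), nrow = 5)            # first sample: 5 observations
s2 <- matrix(rnorm(12, mean = 1), nrow = 4)  # second sample: 4 observations

x <- rbind(s1, s2)                # pooled data, one observation per row
sizes <- c(nrow(s1), nrow(s2))    # sample sizes, in the same order as the rows

# Row indices of each sample implied by sizes
blocks <- split(seq_len(sum(sizes)), rep(seq_along(sizes), sizes))

# The same data in distance-matrix form
d <- dist(x)
```

Here blocks[[1]] holds the rows of the first sample and blocks[[2]] the rows of the second, which is the partition that sizes encodes.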
The test is implemented by nonparametric bootstrap, an approximate
permutation test with R replicates.
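As a rough illustration of an approximate permutation test with R replicates, the following base R sketch handles the two-sample case. It is not the package's implementation: e_stat and perm_test are hypothetical helpers, the statistic is the two-sample e-distance described in this section, and the p-value convention (1 + count)/(R + 1) is an assumption.

```r
# Hypothetical helper: two-sample e-distance from a full distance matrix d,
# where idx1 and idx2 are the row indices of the two samples.
e_stat <- function(d, idx1, idx2) {
  n1 <- length(idx1); n2 <- length(idx2)
  n1 * n2 / (n1 + n2) *
    (2 * mean(d[idx1, idx2]) - mean(d[idx1, idx1]) - mean(d[idx2, idx2]))
}

# Hypothetical helper: approximate permutation test for two samples
perm_test <- function(x, sizes, R = 199) {
  stopifnot(length(sizes) == 2, sum(sizes) == nrow(x))
  d <- as.matrix(dist(x))            # distances are computed once
  n <- sum(sizes)
  i1 <- seq_len(sizes[1])
  i2 <- sizes[1] + seq_len(sizes[2])
  observed <- e_stat(d, i1, i2)
  reps <- replicate(R, {
    ix <- sample(n)                  # random permutation of the row indices
    e_stat(d, ix[i1], ix[i2])
  })
  # assumed convention: count the observed statistic among the replicates
  p <- (1 + sum(reps >= observed)) / (R + 1)
  list(statistic = observed, p.value = p)
}

set.seed(42)
x <- rbind(matrix(rnorm(40), ncol = 2),
           matrix(rnorm(40, mean = 1), ncol = 2))
res <- perm_test(x, sizes = c(20, 20), R = 199)
```

Because only the row labels are permuted, the distance matrix never has to be recomputed between replicates.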
The function eqdist.e returns the test statistic only; it simply
passes the arguments through to eqdist.etest with R = 0.
The value returned by eqdist.e is the k-sample multivariate E-statistic for
testing equal distributions. It is computed from the same input as the test:
the pooled samples stacked by rows in matrix x, or the distance matrix of the
pooled data, with the first sizes[1] rows forming the first sample, the next
sizes[2] rows the second sample, etc.
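The two-sample E-statistic proposed by
Szekely and Rizzo (2004)
is the e-distance e(S_i,S_j), defined for two samples S_i, S_j
of size n_i, n_j by
e(S_i,S_j) = n_i n_j/(n_i+n_j) [2M_(ij) - M_(ii) - M_(jj)],
where
M_(ij) = 1/(n_i n_j) sum[p=1:n_i, q=1:n_j] ||X_(ip) - X_(jq)||,
|| || denotes Euclidean norm, and
X_(ip) denotes the p-th observation in the i-th sample.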
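The e-distance can be sketched in a few lines of base R. This is an illustration only, not the package's code: mean_dist and e_dist are hypothetical names, and M_(ij) is taken to be the mean Euclidean distance between observations of sample i and sample j, weighted by n_i n_j/(n_i+n_j).

```r
# Mean pairwise Euclidean distance between the rows of a and the rows of b
mean_dist <- function(a, b) {
  na <- nrow(a); nb <- nrow(b)
  d <- as.matrix(dist(rbind(a, b)))
  mean(d[seq_len(na), na + seq_len(nb), drop = FALSE])
}

# e-distance between two samples given as matrices (rows = observations)
e_dist <- function(a, b) {
  na <- nrow(a); nb <- nrow(b)
  na * nb / (na + nb) *
    (2 * mean_dist(a, b) - mean_dist(a, a) - mean_dist(b, b))
}

# Simulated samples with a location shift (hypothetical data)
set.seed(7)
s1 <- matrix(rnorm(60), ncol = 3)
s2 <- matrix(rnorm(60, mean = 0.5), ncol = 3)
e12 <- e_dist(s1, s2)
```

The within-sample means include the zero distances of each observation to itself, which matches averaging over all n_i^2 ordered pairs.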
The original (default method) k-sample
E-statistic is defined by summing the pairwise e-distances over
all k(k-1)/2 pairs
of samples:
E = sum[i<j] e(S_i,S_j).
Large values of E are significant.
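The sum over pairs can be sketched directly from a pooled data matrix and the vector of sample sizes. The function k_sample_E is a hypothetical base R helper written for illustration; it assumes the e-distance weights n_i n_j/(n_i+n_j) and is not the package's implementation.

```r
# k-sample statistic from pooled data x (rows = observations) and sample sizes
k_sample_E <- function(x, sizes) {
  stopifnot(sum(sizes) == nrow(x))
  d <- as.matrix(dist(x))
  g <- rep(seq_along(sizes), sizes)   # sample label of each row
  k <- length(sizes)
  # M[i, j]: mean distance between observations of sample i and sample j
  M <- matrix(0, k, k)
  for (i in seq_len(k)) for (j in seq_len(k)) {
    M[i, j] <- mean(d[g == i, g == j])
  }
  total <- 0
  for (i in seq_len(k - 1)) for (j in (i + 1):k) {
    w <- sizes[i] * sizes[j] / (sizes[i] + sizes[j])
    total <- total + w * (2 * M[i, j] - M[i, i] - M[j, j])
  }
  total
}

# Three iris species, 50 rows each, stacked in order
E_iris <- k_sample_E(as.matrix(iris[, 1:4]), c(50, 50, 50))
```

With k = 3 the double loop visits the 3 pairs (1,2), (1,3) and (2,3), in line with the k(k-1)/2 count.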
The discoB method computes the between-sample disco statistic.
For a one-way analysis, it is related to the original statistic as follows.
In the above equation, the weights n_i n_j/(n_i+n_j)
are replaced with
(n_i + n_j)/(2N) n_i n_j/(n_i+n_j) = n_i n_j/(2N)
where N is the total number of observations: N=n_1+...+n_k.
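The simplification of the weights can be checked numerically. The sample sizes below are hypothetical, and the variable names are chosen only for this sketch.

```r
sizes <- c(20, 30, 50)             # hypothetical sample sizes
N <- sum(sizes)                    # total number of observations

pairs <- combn(length(sizes), 2)   # each column is a pair (i, j) with i < j
ni <- sizes[pairs[1, ]]
nj <- sizes[pairs[2, ]]

w_original <- ni * nj / (ni + nj)              # weights in the original statistic
w_discoB   <- (ni + nj) / (2 * N) * w_original # rescaled weights

# The rescaled weights reduce to n_i n_j / (2N) for every pair
same <- isTRUE(all.equal(w_discoB, ni * nj / (2 * N)))
```

The factor (n_i + n_j) cancels, so each pair is weighted by n_i n_j/(2N) regardless of how the remaining observations are split.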
The discoF method is based on the disco F ratio, while the discoB
method is based on the between-sample component.
Also see disco and disco.between functions.
Value
A list with class htest containing
method
description of test
statistic
observed value of the test statistic
p.value
approximate p-value of the test
data.name
description of data
eqdist.e returns the test statistic only.
Note
The pairwise e-distances between samples can be conveniently
computed by the edist function, which returns a dist object.
References
Szekely, G. J. and Rizzo, M. L. (2004) Testing for Equal
Distributions in High Dimension, InterStat, November (5).
Rizzo, M. L. and Szekely, G. J. (2010) DISCO Analysis: A Nonparametric
Extension of Analysis of Variance, Annals of Applied Statistics,
Vol. 4, No. 2, 1034-1055.
http://dx.doi.org/10.1214/09-AOAS245
Szekely, G. J. (2000) Technical Report 03-05: E-statistics: Energy of
Statistical Samples, Department of Mathematics and Statistics, Bowling
Green State University.
Examples
library(energy)
data(iris)
## test if the 3 varieties of iris data (d=4) have equal distributions
eqdist.etest(iris[,1:4], c(50,50,50), R = 199)
## example that uses method="discoB"
x <- matrix(rnorm(100), nrow=20)
y <- matrix(rnorm(100), nrow=20)
X <- rbind(x, y)
d <- dist(X)
# should match edist default statistic
set.seed(1234)
eqdist.etest(d, sizes=c(20, 20), distance=TRUE, R = 199)
# comparison with edist (sizes must sum to the 40 rows of X)
edist(d, sizes=c(20, 20), distance=TRUE)
# between-sample disco statistic via method="discoB"
set.seed(1234)
eqdist.etest(d, sizes=c(20, 20), distance=TRUE, method="discoB", R = 199)
# for comparison
g <- as.factor(rep(1:2, c(20, 20)))
set.seed(1234)
disco(d, factors=g, distance=TRUE, R=199)
# should match the method="discoB" statistic above
set.seed(1234)
disco.between(d, factors=g, distance=TRUE, R=199)