This function combines several independent Anderson-Darling k-sample tests
into one overall test of the hypothesis that the independent samples within
each block come from a common unspecified distribution, while the common
distributions may vary from block to block. Both versions of the
Anderson-Darling test statistic are provided.
Either a sequence of several lists, say L_1, …, L_M (M > 1),
where list L_i contains k_i > 1 sample vectors of respective
sizes n_{i1}, …, n_{ik_i}
(n_{ij} > 4 is recommended for reasonable asymptotic P-value calculation)
and N_i = n_{i1} + … + n_{ik_i} is the pooled sample size for block i;
or a list of such lists;
or a formula, like y ~ g | b, where y is a numeric response vector,
g is a factor with levels indicating different treatments and
b is a factor indicating different blocks; y, g and b are of equal length.
For each block level, y is split into separate samples
according to the g levels. The same g level may occur in different blocks.
The variable names may refer to variables in an optional data frame
supplied via the data = argument.
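The formula input y ~ g | b amounts to splitting y first by block and then by treatment within each block. The following minimal base-R sketch shows that reshaping on the data of the Examples section, rebuilt here as dat so the snippet is self-contained. It is only an illustration of the list-of-lists shape, not necessarily how ad.test.combined does it internally.

```r
# Rebuild the example data: response A, treatment G, block B
x1 <- list(c(1, 3, 2, 5, 7), c(2, 8, 1, 6, 9, 4), c(12, 5, 7, 9, 11))
x2 <- list(c(51, 43, 31, 53, 21, 75), c(23, 45, 61, 17, 60))
dat <- data.frame(
  A = c(unlist(x1), unlist(x2)),
  G = factor(c(rep(1, 5), rep(2, 6), rep(3, 5), rep(1, 6), rep(2, 5))),
  B = factor(c(rep(1, 16), rep(2, 11)))
)

# Split by block first, then split the response by treatment within each
# block; drop = TRUE discards treatment levels that are absent from a block
blocks <- lapply(split(dat, dat$B), function(d) split(d$A, d$G, drop = TRUE))

# blocks now has the same shape as list(x1, x2)
sizes <- lapply(blocks, lengths)   # sample sizes n_{ij} per block
N <- sapply(sizes, sum)            # pooled sizes N_i
small <- any(unlist(sizes) < 5)    # TRUE would correspond to the n_{ij} < 5 warning
```

Treatment level 1 appears in both blocks, which the formula interface allows.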
data
= an optional data frame providing the variables in formula input
method
= c("asymptotic","simulated","exact"), where
"asymptotic" uses only an asymptotic P-value approximation, reasonable
for P in [0.00001, 0.99999], and linearly extrapolated via
log(P/(1-P)) outside
that range. See ad.pval for details.
The adequacy of the asymptotic P-value calculation may be checked using
pp.kSamples.
"simulated" uses simulation to get Nsim simulated AD statistics
for each block of samples, adding them componentwise across blocks to get Nsim
combined values. These are compared with the observed combined value to obtain the
estimated P-value.
"exact" uses full enumeration of the test statistic values
for all sample splits of the pooled samples within each block.
The test statistic vectors for the first 2 blocks are added
(each component against each component, as in the R outer(x,y,"+") command)
to get the convolution enumeration for the combined test statistic. The resulting
vector is convolved with the next block vector in the same fashion, and so on.
Full enumeration is feasible only for small problems and is attempted only when Nsim
is at least the (conservatively maximal) length
of the final distribution vector. Otherwise, the method reverts to
simulation using the provided Nsim.
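The outer-sum convolution used by method = "exact" can be illustrated with made-up numbers. In the following sketch, null.list holds hypothetical fully enumerated AD values for three tiny blocks, each entry standing for one equally likely split of that block's pooled sample. It shows the folding and the Nsim length check only; it is not the package's internal code.

```r
# Hypothetical fully enumerated AD null values for three tiny blocks; each
# entry stands for one equally likely split of that block's pooled sample
null.list <- list(c(0.5, 1.2, 2.0), c(0.8, 1.1), c(0.3, 0.9))
Nsim <- 10000

# Conservative (maximal) length of the final convolution vector
max.len <- prod(lengths(null.list))
use.exact <- Nsim >= max.len  # if FALSE, the method would revert to simulation

# Fold blocks pairwise: every component against every component, as in
# outer(x, y, "+"), flattened before the next block is added
comb.null <- Reduce(function(x, y) as.vector(outer(x, y, "+")), null.list)

ad.obs <- 3.5                          # hypothetical observed AD_comb
p.exact <- mean(comb.null >= ad.obs)   # share of combined values >= observed
```

Every split within a block is equally likely and the blocks are independent. Each of the 3 x 2 x 2 = 12 combined values is therefore equally likely, and the exact P-value is the fraction of them at or above the observed value (here 2 of 12).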
dist
= FALSE (default) or TRUE. If TRUE, the
simulated or fully enumerated convolution vectors
null.dist1 and null.dist2 are returned for the respective
test statistic versions; otherwise NULL is returned for each.
Nsim
= 10000 (default), number of simulation splits to use within
each block of samples. It is only used when method = "simulated"
or when method ="exact" reverts to method = "simulated",
as previously explained. Simulations are independent across blocks,
using Nsim for each block. Nsim is limited by 1e7.
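The "simulated" method and the per-block use of Nsim can be sketched in base R. The sketch assumes tie-free data, for which the Scholz and Stephens (1987) k-sample statistic has the closed form coded in the hypothetical helper ad.kN(). ad.test.combined itself handles ties and both statistic versions, so this illustrates the idea rather than reproducing its implementation. Random splits are drawn by permuting the sample labels over each block's pooled values, which keeps the sample sizes (and any tie pattern) fixed.

```r
# Hypothetical helper ad.kN(): Scholz-Stephens (1987) k-sample statistic for
# tie-free data,
#   A2 = (1/N) * sum_i (1/n_i) * sum_{j=1}^{N-1} (N*M_ij - j*n_i)^2 / (j*(N-j)),
# where M_ij counts the sample-i values <= the j-th smallest pooled value.
ad.kN <- function(x, g) {
  N <- length(x)
  j <- seq_len(N - 1)
  z <- sort(x)[j]
  total <- 0
  for (lev in unique(g)) {
    xi <- x[g == lev]
    Mij <- vapply(z, function(zj) sum(xi <= zj), integer(1))
    total <- total + sum((N * Mij - j * length(xi))^2 / (j * (N - j))) / length(xi)
  }
  total / N
}

# One block: Nsim random splits of the pooled values (labels permuted,
# sample sizes kept), returning Nsim simulated statistics
sim.block <- function(samples, Nsim) {
  x <- unlist(samples)
  g <- rep(seq_along(samples), lengths(samples))
  replicate(Nsim, ad.kN(x, sample(g)))
}

# Tie-free blocks: the first is made up, the second is x2 from the Examples
blocks <- list(
  list(c(1.2, 3.4, 2.2, 5.1, 7.3), c(2.9, 8.3, 1.7, 6.6, 9.5)),
  list(c(51, 43, 31, 53, 21, 75), c(23, 45, 61, 17, 60))
)
set.seed(2627)
Nsim <- 2000

# Observed AD_comb = AD_1 + ... + AD_M
obs <- sum(sapply(blocks, function(s) ad.kN(unlist(s), rep(seq_along(s), lengths(s)))))
# Independent simulations per block, added componentwise across blocks
sim.comb <- Reduce(`+`, lapply(blocks, sim.block, Nsim = Nsim))
p.sim <- mean(sim.comb >= obs)
```

Each block receives its own Nsim independent splits, and the i-th combined value is the sum of the i-th simulated values from all blocks.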
Details
If AD_i is the Anderson-Darling criterion for the i-th block of
k_i samples, its standardized test statistic is
T_i = (AD_i - μ_i)/σ_i, where μ_i and σ_i are the
mean and standard deviation of AD_i. This statistic
is used to test the hypothesis that the samples in the i-th block all come
from the same but unspecified continuous distribution function F_i(x).
The combined Anderson-Darling criterion is
AD_{comb} = AD_1 + … + AD_M, and
T_{comb} = (AD_{comb} - μ_c)/σ_c is its standardized form,
where μ_c = μ_1 + … + μ_M and σ_c = √(σ_1^2 + … + σ_M^2)
are the mean and standard deviation of AD_{comb}
(the variances add because the blocks are independent).
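The following worked version of these formulas uses hypothetical block values AD_i, μ_i and σ_i, chosen only to make the arithmetic easy to follow. T_{comb} standardizes the summed criterion; it is not the sum of the T_i.

```r
# Hypothetical per-block criteria AD_i with means mu_i and standard deviations sig_i
AD  <- c(2.1, 3.4)
mu  <- c(2.0, 1.0)
sig <- c(0.9, 0.7)

T.i    <- (AD - mu) / sig          # standardized block statistics T_i
AD.c   <- sum(AD)                  # AD_comb
mu.c   <- sum(mu)                  # mu_c
sig.c  <- sqrt(sum(sig^2))         # sig_c: variances add across independent blocks
T.comb <- (AD.c - mu.c) / sig.c    # standardized combined statistic
```

Here AD_{comb} = 5.5, μ_c = 3 and σ_c = √1.3, so T_{comb} = 2.5/√1.3.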
The statistic T_{comb} is used to simultaneously
test whether the samples
in each block come from the same continuous distribution function
F_i(x), i=1,…,M.
The unspecified common distribution function F_i(x) may change
from block to block. Following the reference article, two versions
of the test statistic and of their combinations are provided.
The number k_i of independent samples may also change from block to block.
NA values are removed and the user is alerted with the total NA count.
It is up to the user to judge whether the removal of NAs is appropriate.
The continuity assumption can be dispensed with if we deal with
independent random samples, or if randomization was used in allocating
subjects to samples or treatments, independently from block to block, and if we view
the simulated or exact P-values conditionally, given the tie patterns
within each block. Of course, under such randomization any conclusions
are valid only with respect to the blocks of subjects that were randomly allocated.
The asymptotic P-value calculation assumes distribution continuity. No adjustment
for lack thereof is known at this point. The same comment holds for the means
and standard deviations of respective statistics.
Value
A list of class kSamples with components
test.name
="Anderson-Darling"
M
number of blocks of samples being compared
n.samples
list of M vectors, each vector giving the sample sizes for
each block of samples being compared
nt
= (N_1,…,N_M)
n.ties
vector giving the number of ties in each of the M
comparison blocks
ad.list
list of M matrices giving the ad results
for ad.test applied to the samples in each of
the M blocks
mu
vector of means of the AD statistic for the M blocks
sig
vector of standard deviations of the AD statistic for the M blocks
ad.c
2 x 3 matrix (2 x 4 when a simulated or exact P-value is computed) containing
AD_{comb}, T_{comb}, the asymptotic P-value
and, where applicable, the simulated or exact P-value, for each version of the combined test statistic:
version 1 in row 1 and version 2 in row 2
mu.c
mean of AD_{comb}
sig.c
standard deviation of AD_{comb}
warning
logical indicator, warning = TRUE when at least one
n_{ij} < 5
null.dist1
simulated or enumerated null distribution of version 1
of AD_{comb}
null.dist2
simulated or enumerated null distribution of version 2
of AD_{comb}
method
the method used.
Nsim
the number of simulations used for each block of samples.
Note
This test is useful in analyzing treatment effects in randomized
(incomplete) block experiments and in examining performance
equivalence of several laboratories when presented with different
test materials for comparison.
References
Scholz, F. W. and Stephens, M. A. (1987), K-sample Anderson-Darling Tests,
Journal of the American Statistical Association,
Vol. 82, No. 399, pp. 918–924.
See Also
ad.test, ad.pval
Examples
library(kSamples)
## Create two lists of sample vectors.
x1 <- list( c(1, 3, 2, 5, 7), c(2, 8, 1, 6, 9, 4), c(12, 5, 7, 9, 11) )
x2 <- list( c(51, 43, 31, 53, 21, 75), c(23, 45, 61, 17, 60) )
# and a corresponding data frame datx1x2
x1x2 <- c(unlist(x1),unlist(x2))
gx1x2 <- as.factor(c(rep(1,5),rep(2,6),rep(3,5),rep(1,6),rep(2,5)))
bx1x2 <- as.factor(c(rep(1,16),rep(2,11)))
datx1x2 <- data.frame(A = x1x2, G = gx1x2, B = bx1x2)
## Run ad.test.combined.
set.seed(2627)
ad.test.combined(x1, x2, method = "simulated", Nsim = 1000)
# or with same seed
# ad.test.combined(list(x1, x2), method = "simulated", Nsim = 1000)
# ad.test.combined(A~G|B,data=datx1x2,method="simulated",Nsim=1000)