R: Kruskal-Wallis Test for the 2 x t Contingency Table
contingency2xt
R Documentation
Description
This function uses the Kruskal-Wallis criterion to test
the hypothesis of no association between the counts
for two responses
"A" and "B" across t categories.
Arguments
Avec
vector of length t giving the counts A_1, …, A_t
for response "A" in the t categories, with
m = A_1 + … + A_t.
Bvec
vector of length t giving the counts B_1, …, B_t
for response "B" in the t categories, with
n = B_1 + … + B_t = N - m, where N is the total number of observations (the sum of all 2t counts).
method
= c("asymptotic","simulated","exact"), where
"asymptotic" uses only an asymptotic chi-square approximation
with t-1 degrees of freedom to approximate the P-value.
The asymptotic P-value is always computed, whichever method is chosen.
"simulated" uses Nsim simulated counts for Avec and
Bvec with the observed marginal totals m, n and d = Avec+Bvec
to estimate the P-value.
"exact" enumerates all counts for Avec and Bvec with
the observed marginal totals to get an exact P-value. It is used only
when Nsim is at least as large as the number of full enumerations,
choose(m+t-1,t-1);
otherwise, method reverts to "simulated" using the given Nsim.
dist
FALSE (default) or TRUE. If dist = TRUE, the distribution of the
simulated or fully enumerated Kruskal-Wallis test statistics is
returned as null.dist; if dist = FALSE, null.dist is NULL.
The choice dist = TRUE also limits Nsim <- min(Nsim,1e8).
tab0
TRUE (default) or FALSE. If tab0 = TRUE, the null distribution
is returned in two-column matrix form when
method = "simulated". When tab0 = FALSE, the simulated null distribution
is returned as a vector of all simulated values of the test statistic.
Nsim
=10000 (default), the number of simulated Avec splits to use.
It is only used when method = "simulated",
or when method = "exact" reverts to method = "simulated", as explained above.
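The fallback rule for method = "exact" can be sketched in Python. The package itself is R with C code; this is only an illustration, and the helper names n_enumerations, compositions and effective_method are hypothetical. By stars and bars, each choice of t-1 bar positions among m+t-1 slots gives one split m = A_1 + … + A_t with A_i ≥ 0, which is where choose(m+t-1,t-1) comes from. Like that count, the sketch ignores the column totals d_i, and it does not model the dist = TRUE cap on Nsim.

```python
# Sketch only: hypothetical helpers, not part of the kSamples package.
import math
from itertools import combinations


def n_enumerations(m, t):
    # Stars and bars: number of ways to write m = A_1 + ... + A_t with A_i >= 0.
    return math.comb(m + t - 1, t - 1)


def compositions(m, t):
    # Each choice of t-1 "bar" positions among m+t-1 slots yields one split;
    # the gaps between consecutive bars are the counts A_1, ..., A_t.
    for bars in combinations(range(m + t - 1), t - 1):
        cuts = (-1,) + bars + (m + t - 1,)
        yield tuple(cuts[i + 1] - cuts[i] - 1 for i in range(t))


def effective_method(method, m, t, nsim=10000):
    # "exact" is kept only if Nsim covers every enumeration; otherwise the
    # documented fallback to "simulated" (with the same Nsim) applies.
    if method == "exact" and nsim < n_enumerations(m, t):
        return "simulated"
    return method
```

For example, with m = 6 and t = 3 there are choose(8,2) = 28 splits, well under the default Nsim = 10000, so "exact" is kept; with m = 100 and t = 10 the count choose(109,9) far exceeds 10000 and the method reverts to "simulated".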
Details
For this data scenario the Kruskal-Wallis criterion is
K.star = N(N-1)/(mn) * (∑ A_i^2/d_i - m^2/N),
where the sum runs over i = 1, …, t and d_i = A_i + B_i. It results from treating "A" responses
as 1 and "B" responses as 2 and using midranks, as explained in Lehmann (2006), Section 5.3.
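The closed form for K.star can be checked numerically. The Python sketch below (hypothetical helper names; exact rational arithmetic via fractions.Fraction) computes K.star from the formula above and, independently, the tie-corrected Kruskal-Wallis statistic from midranks, with the t categories as samples and "A"/"B" scored 1/2; the two agree. It also adds an asymptotic P-value from the chi-square distribution with t-1 degrees of freedom, using closed-form tail expressions for integer degrees of freedom; this is an assumption about how the tail may be computed, not the package's code. It assumes every d_i > 0 and m, n > 0.

```python
# Sketch only: hypothetical helpers, not the kSamples implementation.
import math
from fractions import Fraction


def k_star(avec, bvec):
    # Closed form K.star = N(N-1)/(mn) * (sum A_i^2/d_i - m^2/N); assumes d_i > 0.
    d = [a + b for a, b in zip(avec, bvec)]
    m, n = sum(avec), sum(bvec)
    N = m + n
    s = sum(Fraction(a * a, di) for a, di in zip(avec, d))
    return Fraction(N * (N - 1), m * n) * (s - Fraction(m * m, N))


def kw_midranks(avec, bvec):
    # Tie-corrected Kruskal-Wallis statistic computed directly from midranks,
    # with the t categories as samples, "A" scored 1 and "B" scored 2.
    m, n = sum(avec), sum(bvec)
    N = m + n
    r_a = Fraction(m + 1, 2)       # midrank shared by every "A" response
    r_b = m + Fraction(n + 1, 2)   # midrank shared by every "B" response
    r_bar = Fraction(N + 1, 2)     # overall mean rank
    # Between-sample sum of squares: sum_i d_i * (mean rank_i - r_bar)^2.
    between = sum((a * r_a + b * r_b - (a + b) * r_bar) ** 2 / (a + b)
                  for a, b in zip(avec, bvec))
    # Total sum of squares of all N midranks about r_bar.
    total = m * (r_a - r_bar) ** 2 + n * (r_b - r_bar) ** 2
    return (N - 1) * between / total


def chi2_upper_tail(x, df):
    # P(X >= x) for a chi-square variable with integer df >= 1 (closed forms).
    x = float(x)
    half = x / 2.0
    if df % 2 == 0:
        # exp(-x/2) * sum_{j < df/2} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus terms sqrt(2x/pi) e^{-x/2} x^{j-1} / (1*3*...*(2j-1)).
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total
```

For Avec = (3, 1, 2) and Bvec = (1, 4, 2), both functions return 177/70 (about 2.529), and with t - 1 = 2 degrees of freedom the asymptotic P-value is exp(-177/140).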
For small sample sizes, exact null distribution
calculations are possible. They are based on Algorithm C (Chase's sequence) in Knuth (2011),
which enumerates all possible splits of m into counts
A_1, …, A_t with
m = A_1 + … + A_t;
the statistic K.star is then calculated for each such split.
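A small-scale version of the exact calculation can be sketched in Python. It swaps Knuth's Chase's-sequence enumeration for a plain recursion that only visits splits with 0 ≤ A_i ≤ d_i, weights each split by its multivariate hypergeometric probability Π choose(d_i, A_i) / choose(N, m) under fixed margins, and takes the P-value as the null probability of K.star values at least as large as the observed one. Function names are hypothetical and the code is not the package's C implementation.

```python
# Sketch only: plain recursion instead of Chase's sequence; hypothetical names.
import math
from fractions import Fraction


def k_star(split, d):
    # K.star = N(N-1)/(mn) * (sum A_i^2/d_i - m^2/N); assumes d_i > 0, m, n > 0.
    m, N = sum(split), sum(d)
    n = N - m
    s = sum(Fraction(a * a, di) for a, di in zip(split, d))
    return Fraction(N * (N - 1), m * n) * (s - Fraction(m * m, N))


def splits(m, d):
    # Every (A_1, ..., A_t) with A_1 + ... + A_t = m and 0 <= A_i <= d_i.
    if len(d) == 1:
        if m <= d[0]:
            yield (m,)
        return
    for a in range(min(m, d[0]) + 1):
        for rest in splits(m - a, d[1:]):
            yield (a,) + rest


def exact_null(avec, bvec):
    # Map each distinct K.star value to its exact null probability, in order.
    d = [a + b for a, b in zip(avec, bvec)]
    m, N = sum(avec), sum(d)
    denom = math.comb(N, m)
    dist = {}
    for split in splits(m, d):
        # Multivariate hypergeometric weight of this split given the margins.
        w = math.prod(math.comb(di, a) for di, a in zip(d, split))
        k = k_star(split, d)
        dist[k] = dist.get(k, 0) + Fraction(w, denom)
    return dict(sorted(dist.items()))


def exact_pvalue(avec, bvec):
    # Null probability of K.star values at least as large as the observed one.
    d = [a + b for a, b in zip(avec, bvec)]
    observed = k_star(avec, d)
    return sum(p for k, p in exact_null(avec, bvec).items() if k >= observed)
```

With Avec = (2, 0) and Bvec = (0, 2), the three possible splits (0, 2), (1, 1), (2, 0) have probabilities 1/6, 4/6, 1/6 and K.star values 3, 0, 3, so the exact P-value is 1/3.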
Simulation of A_1,…, A_t uses the probability model (5.35) in Lehmann (2006)
to successively generate hypergeometric counts A_1,…, A_t.
Both these processes, enumeration and simulation, are done in C.
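The successive hypergeometric scheme can be sketched in Python as follows. Our reading, stated here as an assumption rather than a transcription of model (5.35) in Lehmann (2006) or of the package's C code, is: draw A_1 as the number of "A" responses among d_1 observations taken without replacement from all N, then A_2 from the N - d_1 that remain, and so on, so that A_t is whatever is left. Helper names are hypothetical, and each hypergeometric draw is done naively with random.sample.

```python
# Sketch only: assumed reading of the successive scheme; hypothetical names.
import random
from fractions import Fraction


def k_star(split, d):
    # K.star = N(N-1)/(mn) * (sum A_i^2/d_i - m^2/N); assumes d_i > 0, m, n > 0.
    m, N = sum(split), sum(d)
    n = N - m
    s = sum(Fraction(a * a, di) for a, di in zip(split, d))
    return Fraction(N * (N - 1), m * n) * (s - Fraction(m * m, N))


def hypergeometric(rng, n_good, n_total, n_draw):
    # Number of "good" items among n_draw drawn without replacement from
    # n_total items, where items 0 .. n_good-1 count as "good".
    return sum(1 for x in rng.sample(range(n_total), n_draw) if x < n_good)


def simulate_split(rng, m, d):
    # Successively draw A_1, ..., A_t from the observations not yet assigned.
    m_left, total_left = m, sum(d)
    split = []
    for di in d:
        a = hypergeometric(rng, m_left, total_left, di)
        split.append(a)
        m_left -= a
        total_left -= di
    return split


def simulated_pvalue(avec, bvec, nsim=10000, seed=None):
    # Fraction of simulated K.star values at least as large as the observed one.
    rng = random.Random(seed)
    d = [a + b for a, b in zip(avec, bvec)]
    m = sum(avec)
    observed = k_star(avec, d)
    hits = sum(1 for _ in range(nsim)
               if k_star(simulate_split(rng, m, d), d) >= observed)
    return hits / nsim
```

Applied to Avec = (2, 0), Bvec = (0, 2) with a large nsim, simulated_pvalue should land close to the exact value 1/3.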
Value
A list of class kSamples with components
test.name
"2 x t Contingency Table"
t
number of classification categories
KW.cont
vector of length 2 (or 3) giving the observed KW statistic, its asymptotic
P-value and, when method is "simulated" or "exact", its simulated or exact P-value
null.dist
simulated or enumerated null distribution
of the test statistic. It is given as an M by 2 matrix,
where the first column (named KW) gives the M unique ordered
values of the Kruskal-Wallis
statistic and the second column (named prob) gives the corresponding (simulated or exact)
probabilities.
This format of null.dist is returned when method = "exact"
and dist = TRUE, or when method = "simulated",
dist = TRUE and tab0 = TRUE are specified.
For method = "simulated", dist = TRUE and
tab0 = FALSE, null.dist is returned as the vector of
all simulated test statistic values. This form is used by contingency2xt.comb
in simulation mode.
null.dist = NULL is returned
when dist = FALSE or when method = "asymptotic".
method
the method used.
Nsim
the number of simulations.
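To illustrate how the two null.dist formats relate, the Python sketch below turns a vector of simulated statistic values (the tab0 = FALSE form) into rows of unique ordered values and relative frequencies (the KW / prob layout), then reads an upper-tail P-value off the table. It is not the package's code; the function names are hypothetical, and it assumes tied statistic values are exact duplicates (values differing only by floating-point rounding would become separate rows).

```python
# Sketch only: hypothetical helpers mirroring the (KW, prob) layout.
from collections import Counter


def tabulate_null(stats):
    # Unique ordered statistic values paired with their relative frequencies.
    counts = Counter(stats)
    total = len(stats)
    return [(kw, counts[kw] / total) for kw in sorted(counts)]


def upper_tail(table, observed):
    # Probability of a statistic value at least as large as the observed one.
    return sum(prob for kw, prob in table if kw >= observed)
```

Keeping only the table rather than every simulated value is what makes the tab0 = TRUE form compact when Nsim is large and the statistic takes few distinct values.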
Warning
method = "exact" should be used with caution:
computation time is proportional to the number of enumerations. In most cases
dist = TRUE should not be used, in particular
when the returned distribution objects
would become too large for R's workspace.
References
Knuth, D.E. (2011), The Art of Computer Programming, Volume 4A,
Combinatorial Algorithms, Part 1, Addison-Wesley.
Kruskal, W.H. (1952), A Nonparametric Test for the Several Sample Problem,
The Annals of Mathematical Statistics,
Vol. 23, No. 4, 525–540.
Kruskal, W.H. and Wallis, W.A. (1952), Use of Ranks in One-Criterion Variance Analysis,
Journal of the American Statistical Association,
Vol. 47, No. 260, 583–621.
Lehmann, E.L. (2006), Nonparametrics: Statistical Methods Based on Ranks,
Revised First Edition,
Springer, New York.