Last data update: 2014.03.03

ContingencyTests

R Documentation

Tests of Independence in Two- or Three-Way Contingency Tables

Description

Testing the independence of two nominal or ordered factors.

Usage

## S3 method for class 'formula'
chisq_test(formula, data, subset = NULL, weights = NULL, ...)
## S3 method for class 'table'
chisq_test(object, ...)
## S3 method for class 'IndependenceProblem'
chisq_test(object, ...)

## S3 method for class 'formula'
cmh_test(formula, data, subset = NULL, weights = NULL, ...)
## S3 method for class 'table'
cmh_test(object, ...)
## S3 method for class 'IndependenceProblem'
cmh_test(object, ...)

## S3 method for class 'formula'
lbl_test(formula, data, subset = NULL, weights = NULL, ...)
## S3 method for class 'table'
lbl_test(object, ...)
## S3 method for class 'IndependenceProblem'
lbl_test(object, distribution = c("asymptotic", "approximate", "none"), ...)

Arguments

`formula`	a formula of the form `y ~ x | block` where `y` and `x` are factors and `block` is an optional factor for stratification.
`data`	an optional data frame containing the variables in the model formula.
`subset`	an optional vector specifying a subset of observations to be used. Defaults to `NULL`.
`weights`	an optional formula of the form `~ w` defining integer-valued case weights for each observation. Defaults to `NULL`, implying equal weight for all observations.
`object`	an object inheriting from classes `"table"` or `"IndependenceProblem"`.
`distribution`	a character string specifying how the conditional null distribution of the test statistic is obtained: approximated by its asymptotic distribution (`"asymptotic"`, default) or via Monte Carlo resampling (`"approximate"`). Alternatively, the functions `asymptotic` or `approximate` can be used. Specifying `"none"` suppresses computation of the null distribution.
`...`	further arguments to be passed to `independence_test`.
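
The formula interface with case weights can be sketched as follows. This is a minimal illustration, assuming the functions documented here are provided by the coin package (the page itself does not name the package) and that the counts are held in a data frame with one row per cell. The column names `Var1`, `Var2` and `Freq` are simply what `as.data.frame()` produces for an unnamed table.

```r
library(coin)  # assumption: chisq_test() and statistic() come from the coin package

## Davis (1986) 2 x 2 table, stored as one row per cell plus a count column
davis <- matrix(c(3, 6, 2, 19), nrow = 2, byrow = TRUE)
cells <- as.data.frame(as.table(davis))  # columns: Var1, Var2, Freq

## Formula interface: 'Freq' supplies the integer-valued case weights
ct_formula <- chisq_test(Var1 ~ Var2, data = cells, weights = ~ Freq)

## Table interface on the same counts, for comparison
ct_table <- chisq_test(as.table(davis))

c(formula = unname(statistic(ct_formula)),
  table   = unname(statistic(ct_table)))
```

Both calls describe the same data, so the two statistics should coincide; the weighted data frame is just a compact alternative to expanding the table to one row per observation.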

Details

chisq_test, cmh_test and lbl_test provide the Pearson chi-squared test, the generalized Cochran-Mantel-Haenszel test and the linear-by-linear association test, respectively. A general description of these methods is given by Agresti (2002).

The null hypothesis of independence, or conditional independence given block, between y and x is tested.

If y and/or x are ordered factors, the default scores, 1:nlevels(y) and 1:nlevels(x) respectively, can be altered using the scores argument (see independence_test); this argument can also be used to coerce nominal factors to class "ordered". (lbl_test coerces to class "ordered" under any circumstances.) If both y and x are ordered factors, a linear-by-linear association test is computed and the direction of the alternative hypothesis can be specified using the alternative argument. For the Pearson chi-squared test, this extension was given by Yates (1948) who also discussed the situation when either the response or the covariate is an ordered factor; see also Cochran (1954) and Armitage (1955) for the particular case when y is a binary factor and x is ordered. The Mantel-Haenszel statistic was similarly extended by Mantel (1963) and Landis, Heyman and Koch (1978).
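
For a single, unstratified table, the squared linear-by-linear statistic can be written as M^2 = (n - 1) r^2, where r is the Pearson correlation between the row and column scores taken over all n observations (Agresti, 2002). The base R sketch below computes it directly from the counts, using the Cochran (1954) data from the examples. `lbl_stat` is a hypothetical helper written for illustration only, and the 0/1 column scores are an arbitrary choice, since the statistic does not change under linear rescaling of the scores.

```r
## Hypothetical helper: linear-by-linear statistic for one two-way table
lbl_stat <- function(tab,
                     row_scores = seq_len(nrow(tab)),
                     col_scores = seq_len(ncol(tab))) {
    n <- sum(tab)
    u <- row_scores[row(tab)]  # row score attached to each cell
    v <- col_scores[col(tab)]  # column score attached to each cell
    w <- as.vector(tab)        # cell counts, same (column-major) order
    ubar <- sum(w * u) / n
    vbar <- sum(w * v) / n
    s_uv <- sum(w * (u - ubar) * (v - vbar))
    s_uu <- sum(w * (u - ubar)^2)
    s_vv <- sum(w * (v - vbar)^2)
    r <- s_uv / sqrt(s_uu * s_vv)
    c(r = r, M2 = (n - 1) * r^2)
}

## Cochran (1954, Tab. 6): rows are 'Change', columns are 'Infiltration'
cochran <- matrix(c(11,  7,
                    27, 15,
                    42, 16,
                    53, 13,
                    11,  1),
                  byrow = TRUE, ncol = 2)

res <- lbl_stat(cochran,
                row_scores = c(3, 2, 1, 0, -1),
                col_scores = c(0, 1))
res
```

The classical papers sometimes use n rather than n - 1 as the multiplier, so values quoted from them can differ slightly from (n - 1) r^2.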

The conditional null distribution of the test statistic is used to obtain p-values; by default, an asymptotic approximation of the exact distribution is used (distribution = "asymptotic"). Alternatively, the distribution can be approximated via Monte Carlo resampling (distribution = "approximate") or, for univariate two-sample problems, computed exactly (distribution = "exact"). See asymptotic, approximate and exact for details.
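
The idea behind the Monte Carlo approximation can be sketched in base R: keep both margins fixed by permuting one factor against the other, recompute the statistic each time, and report the share of permuted statistics at least as large as the observed one. This is an illustration under simple assumptions (no blocks, no weights, Pearson statistic), not the implementation used by `approximate`. `pearson_x2`, `B` and the seed are choices made for this sketch.

```r
## Pearson X^2 for a two-way table of counts
pearson_x2 <- function(tab) {
    expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
    sum((tab - expected)^2 / expected)
}

## Davis (1986) data, expanded to one row/column label per observation
davis <- matrix(c(3, 6, 2, 19), nrow = 2, byrow = TRUE)
counts <- as.vector(davis)
x <- factor(rep(row(davis), times = counts))
y <- factor(rep(col(davis), times = counts))

set.seed(1)   # arbitrary seed, for reproducibility only
B <- 2000     # number of Monte Carlo replicates

x2_obs  <- pearson_x2(table(x, y))
x2_perm <- replicate(B, pearson_x2(table(x, sample(y))))

## The small tolerance guards against floating-point ties
p_mc <- mean(x2_perm >= x2_obs - 1e-8)
c(statistic = x2_obs, p.value = p_mc)
```

Because `sample(y)` only reorders the labels, every permuted table has the same margins as the observed one, which is what makes the resulting distribution conditional.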

Value

An object inheriting from class "IndependenceTest".

Note

The exact versions of the Pearson chi-squared test and the generalized Cochran-Mantel-Haenszel test do not necessarily result in the same p-value as Fisher's exact test (Davis, 1986).
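
Why the two exact tests can disagree is easiest to see by enumerating the conditional distribution of a 2 x 2 table. With both margins fixed, the table is determined by its top-left cell, which follows a hypergeometric distribution. The exact Pearson test sums the probabilities of all tables whose chi-squared statistic is at least as large as the observed one, whereas the two-sided `fisher.test` sums the tables that are no more probable than the observed one. The base R sketch below does this for the Davis (1986) data; it illustrates the principle and is not the algorithm used by `chisq_test`.

```r
## Pearson X^2 for a two-way table of counts
pearson_x2 <- function(tab) {
    expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
    sum((tab - expected)^2 / expected)
}

davis <- matrix(c(3, 6, 2, 19), nrow = 2, byrow = TRUE)
r1 <- sum(davis[1, ])   # first row total
r2 <- sum(davis[2, ])   # second row total
c1 <- sum(davis[, 1])   # first column total

## All admissible values of the top-left cell and their probabilities
a    <- max(0, c1 - r2):min(r1, c1)
prob <- dhyper(a, m = r1, n = r2, k = c1)

## X^2 for each admissible table (matrix() fills column by column)
x2 <- vapply(a, function(ai) {
    pearson_x2(matrix(c(ai, c1 - ai, r1 - ai, r2 - c1 + ai), nrow = 2))
}, numeric(1))

obs <- a == davis[1, 1]

## Exact Pearson test: order tables by X^2
p_pearson <- sum(prob[x2 >= x2[obs] - 1e-8])          # about 0.286

## Fisher's exact test: order tables by their probability
p_fisher <- sum(prob[prob <= prob[obs] * (1 + 1e-7)]) # about 0.143

c(pearson = p_pearson, fisher = p_fisher)
```

Here the table with a zero in the top-left cell has the same X^2 as the observed table but a higher probability, so it counts as extreme for the Pearson ordering and not for Fisher's.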

References

Agresti, A. (2002). Categorical Data Analysis, Second Edition. Hoboken, New Jersey: John Wiley & Sons.

Armitage, P. (1955). Tests for linear trends in proportions and frequencies. Biometrics 11(3), 375–386.

Cochran, W.G. (1954). Some methods for strengthening the common χ^2 tests. Biometrics 10(4), 417–451.

Davis, L. J. (1986). Exact tests for 2 x 2 contingency tables. The American Statistician 40(2), 139–141.

Landis, J. R., Heyman, E. R. and Koch, G. G. (1978). Average partial association in three-way contingency tables: a review and discussion of alternative tests. International Statistical Review 46(3), 237–254.

Mantel, N. (1963). Chi-square tests with one degree of freedom: extensions of the Mantel-Haenszel procedure. Journal of the American Statistical Association 58(303), 690–700.

Yates, F. (1948). The analysis of contingency tables with groupings based on quantitative characters. Biometrika 35(1/2), 176–181.

Examples

## Assumption: the test functions used below are provided by the 'coin' package
library(coin)

## Example data
## Davis (1986, p. 140)
davis <- matrix(
    c(3,  6,
      2, 19),
    nrow = 2, byrow = TRUE
)

## Asymptotic Pearson chi-squared test
chisq_test(as.table(davis))

## Approximative (Monte Carlo) Pearson chi-squared test
ct <- chisq_test(as.table(davis),
                 distribution = approximate(B = 10000))
pvalue(ct)          # standard p-value
midpvalue(ct)       # mid-p-value
pvalue_interval(ct) # p-value interval

## Exact Pearson chi-squared test (Davis, 1986)
## Note: disagrees with Fisher's exact test
ct <- chisq_test(as.table(davis),
                 distribution = "exact")
pvalue(ct)          # standard p-value
midpvalue(ct)       # mid-p-value
pvalue_interval(ct) # p-value interval
fisher.test(davis)


## Laryngeal cancer data
## Agresti (2002, p. 107, Tab. 3.13)
cancer <- matrix(
    c(21, 2,
      15, 3),
    nrow = 2, byrow = TRUE,
    dimnames = list(
        "Treatment" = c("Surgery", "Radiation"),
           "Cancer" = c("Controlled", "Not Controlled")
    )
)

## Exact Pearson chi-squared test (Agresti, 2002, p. 108, Tab. 3.14)
## Note: agrees with Fisher's exact test
(ct <- chisq_test(as.table(cancer),
                  distribution = "exact"))
midpvalue(ct)       # mid-p-value
pvalue_interval(ct) # p-value interval
fisher.test(cancer)


## Homework conditions and teacher's rating
## Yates (1948, Tab. 1)
yates <- matrix(
    c(141, 67, 114, 79, 39,
      131, 66, 143, 72, 35,
       36, 14,  38, 28, 16),
    byrow = TRUE, ncol = 5,
    dimnames = list(
           "Rating" = c("A", "B", "C"),
        "Condition" = c("A", "B", "C", "D", "E")
    )
)

## Asymptotic Pearson chi-squared test (Yates, 1948, p. 176)
chisq_test(as.table(yates))

## Asymptotic Pearson-Yates chi-squared test (Yates, 1948, pp. 180-181)
## Note: 'Rating' and 'Condition' as ordinal
(ct <- chisq_test(as.table(yates),
                  alternative = "less",
                  scores = list("Rating" = c(-1, 0, 1),
                                "Condition" = c(2, 1, 0, -1, -2))))
statistic(ct)^2 # chi^2 = 2.332

## Asymptotic Pearson-Yates chi-squared test (Yates, 1948, p. 181)
## Note: 'Rating' as ordinal
chisq_test(as.table(yates),
           scores = list("Rating" = c(-1, 0, 1))) # Q = 3.825


## Change in clinical condition and degree of infiltration
## Cochran (1954, Tab. 6)
cochran <- matrix(
    c(11,  7,
      27, 15,
      42, 16,
      53, 13,
      11,  1),
    byrow = TRUE, ncol = 2,
    dimnames = list(
              "Change" = c("Marked", "Moderate", "Slight",
                           "Stationary", "Worse"),
        "Infiltration" = c("0-7", "8-15")
    )
)

## Asymptotic Pearson chi-squared test (Cochran, 1954, p. 435)
chisq_test(as.table(cochran)) # X^2 = 6.88

## Asymptotic Cochran-Armitage test (Cochran, 1954, p. 436)
## Note: 'Change' as ordinal
(ct <- chisq_test(as.table(cochran),
                  scores = list("Change" = c(3, 2, 1, 0, -1))))
statistic(ct)^2 # X^2 = 6.66


## Change in size of ulcer crater for two treatment groups
## Armitage (1955, Tab. 2)
armitage <- matrix(
    c( 6, 4, 10, 12,
      11, 8,  8,  5),
    byrow = TRUE, ncol = 4,
    dimnames = list(
        "Treatment" = c("A", "B"),
           "Crater" = c("Larger", "< 2/3 healed",
                        "=> 2/3 healed", "Healed")
    )
)

## Approximative (Monte Carlo) Pearson chi-squared test (Armitage, 1955, p. 379)
chisq_test(as.table(armitage),
           distribution = approximate(B = 10000)) # chi^2 = 5.91

## Approximative (Monte Carlo) Cochran-Armitage test (Armitage, 1955, p. 379)
(ct <- chisq_test(as.table(armitage),
                  distribution = approximate(B = 10000),
                  scores = list("Crater" = c(-1.5, -0.5, 0.5, 1.5))))
statistic(ct)^2 # chi_0^2 = 5.26


## Relationship between job satisfaction and income stratified by gender
## Agresti (2002, p. 288, Tab. 7.8)

## Asymptotic generalized Cochran-Mantel-Haenszel test (Agresti, 2002, p. 297)
cmh_test(jobsatisfaction) # CMH = 10.2001

## Asymptotic generalized Cochran-Mantel-Haenszel test (Agresti, 2002, p. 297)
## Note: 'Job.Satisfaction' as ordinal
cmh_test(jobsatisfaction,
         scores = list("Job.Satisfaction" = c(1, 3, 4, 5))) # L^2 = 9.0342

## Asymptotic linear-by-linear association test (Agresti, 2002, p. 297)
## Note: 'Job.Satisfaction' and 'Income' as ordinal
(lt <- lbl_test(jobsatisfaction,
                scores = list("Job.Satisfaction" = c(1, 3, 4, 5),
                              "Income" = c(3, 10, 20, 35))))
statistic(lt)^2 # M^2 = 6.1563