A two-dimensional contingency table in matrix form
alternative
Indicates the alternative hypothesis: must be either "less", "two.sided", or "greater"
npNumbers
Number: The number of nuisance parameters considered
beta
Number: Sets the confidence level, 100(1-beta)%, of the interval to which the nuisance parameter is restricted. Only used if interval=TRUE
interval
Logical: Indicates if a confidence interval on the nuisance parameter should be computed
method
Indicates the method for finding tables as or more extreme than the observed table:
must be either "Z-pooled", "Z-unpooled", "Santner and Snell", "Boschloo", "CSM", "CSM modified", or "CSM approximate".
CSM tests cannot be calculated for multinomial models
model
The model being used: must be either "Binomial" or "Multinomial"
cond.row
Logical: Indicates if row margins are fixed in the binomial models. Only used if model="Binomial"
to.plot
Logical: Indicates if plot of p-value vs. nuisance parameter should be generated. Only used if model="Binomial"
ref.pvalue
Logical: Indicates if p-value should be refined by maximizing the p-value function after the nuisance parameter is selected. Only used if model="Binomial"
Details
Unconditional exact tests can be used for binomial or multinomial models. The binomial model assumes the row
or column margins (but not both) are known in advance, while the multinomial model assumes only the total sample size is known beforehand.
Conditional tests have both row and column margins fixed. The null hypothesis is that the rows and columns are independent.
Under the binomial model, the user will need to input which margin is fixed (default is rows).
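Let X denote a generic 2x2 table with fixed sample sizes n_1 and n_2, let X_0 denote the
observed table, and let T(X) represent the test statistic function. The null hypothesis can be written as p_1 = p_2 \equiv p,
where p is the common success probability (the nuisance parameter).
The p-value function with rows fixed sums, over all tables as or more extreme than X_0, the product of two independent binomials:

P(X|p) = \sum_{T(X) \geq T(X_0)} \binom{n_1}{x_{11}} \binom{n_2}{x_{21}} p^{x_{11}+x_{21}} (1-p)^{N-x_{11}-x_{21}},

where N = n_1 + n_2.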
The multinomial model is similar except the summand has a multinomial distribution with two nuisance parameters.
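To make the binomial-model computation concrete, here is a minimal Python sketch, not the package's implementation. It assumes tables are identified by the pair (x11, x21), that larger values of the statistic count as more extreme, and that the supremum is approximated on an evenly spaced grid (the role played by npNumbers). The function names and the example statistic (difference in proportions) are illustrative choices.

```python
from math import comb


def binom_pmf(x, n, p):
    """Probability of x successes in n trials with success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def pvalue_function(p, n1, n2, extreme_tables):
    """Sum of the product of two independent binomials over the extreme tables."""
    return sum(binom_pmf(a, n1, p) * binom_pmf(b, n2, p) for a, b in extreme_tables)


def unconditional_pvalue(x11, x21, n1, n2, statistic, n_grid=101):
    """Approximate the supremum of the p-value function over a grid on [0, 1].

    Returns (p-value, nuisance parameter at which the maximum occurs).
    """
    t_obs = statistic(x11, x21, n1, n2)
    # Small tolerance so that tables tied with the observed one are kept
    # despite floating-point rounding (an assumption of this sketch).
    extreme = [
        (a, b)
        for a in range(n1 + 1)
        for b in range(n2 + 1)
        if statistic(a, b, n1, n2) >= t_obs - 1e-12
    ]
    grid = [i / (n_grid - 1) for i in range(n_grid)]
    values = [pvalue_function(p, n1, n2, extreme) for p in grid]
    best = max(range(n_grid), key=values.__getitem__)
    return values[best], grid[best]


def diff_in_proportions(x11, x21, n1, n2):
    """Illustrative statistic: second-row proportion minus first-row proportion."""
    return x21 / n2 - x11 / n1


if __name__ == "__main__":
    print(unconditional_pvalue(0, 3, 3, 3, diff_in_proportions))
```

A finer grid gives a closer approximation to the supremum at the cost of more evaluations of the sum.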
There are several possible test statistics to determine the 'as or more extreme' tables seen in the index of summation.
The method variable lets the user choose the test statistic being used. A brief description for each test statistic is given below (see References for more details):
Let \hat{p}_1 = x_{11}/n_1, \hat{p}_2 = x_{21}/n_2, and \hat{p} = (x_{11}+x_{21})/(n_1+n_2).
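The formulas for the Z-unpooled, Z-pooled, and Santner and Snell statistics are not reproduced in this text. As a hedged reminder, the standard textbook forms, written with the estimates defined above, are sketched below. The sign convention (second row minus first row) is an assumption and may differ from the one the package uses; see References for the authoritative definitions.

```latex
% Z-unpooled: each proportion contributes its own variance estimate
T_{\text{unpooled}}(X) = \frac{\hat{p}_2 - \hat{p}_1}
  {\sqrt{\dfrac{\hat{p}_1(1-\hat{p}_1)}{n_1} + \dfrac{\hat{p}_2(1-\hat{p}_2)}{n_2}}}

% Z-pooled: variance estimated from the pooled proportion
T_{\text{pooled}}(X) = \frac{\hat{p}_2 - \hat{p}_1}
  {\sqrt{\hat{p}(1-\hat{p})\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}

% Santner and Snell: raw difference in proportions
T_{\text{SS}}(X) = \hat{p}_2 - \hat{p}_1
```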
Boschloo:
Uses the p-value from Fisher's exact test as the test statistic.
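As an illustration of using Fisher's p-value as a test statistic, the following Python sketch computes a one-sided Fisher p-value for every possible table with fixed row totals. It is an assumption-laden sketch rather than the package's efficient routine: the direction (small x11 counts as extreme) and the function names are illustrative, and smaller values of the statistic mean more extreme tables.

```python
from math import comb


def fisher_one_sided(x11, x21, n1, n2):
    """One-sided Fisher p-value: P(X11 <= x11) given all margins (hypergeometric)."""
    s = x11 + x21                      # first-column total, fixed by conditioning
    total = comb(n1 + n2, s)           # number of ways to place the s successes
    lowest = max(0, s - n2)            # smallest x11 compatible with the margins
    ways = sum(comb(n1, k) * comb(n2, s - k) for k in range(lowest, x11 + 1))
    return ways / total


def boschloo_statistics(n1, n2):
    """Fisher p-value for each table (x11, x21); smaller means more extreme."""
    return {
        (a, b): fisher_one_sided(a, b, n1, n2)
        for a in range(n1 + 1)
        for b in range(n2 + 1)
    }


if __name__ == "__main__":
    stats = boschloo_statistics(3, 3)
    print(stats[(0, 3)], stats[(3, 0)])
```

Because the statistic must be evaluated for every one of the (n1+1)(n2+1) tables, its cost grows quickly with the sample sizes.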
CSM:
Starts with the most extreme table and adds other 'as or more extreme' tables one step at a time
by maximizing the summand of the p-value function. This approach can be computationally intensive.
CSM modified:
Starts with all tables that must be more extreme and adds other 'as or more extreme' tables one step at a time
by maximizing the summand of the p-value function. This approach can be computationally intensive.
CSM approximate:
Maximizes the summand of the p-value function for each possible table. Thus, the test statistic is the p-value function without the summation.
This approach is less computationally intensive than the CSM test because the maximization is not repeated at each step.
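The CSM approximate statistic for the binomial model can be sketched in a few lines of Python. For a single table the summand is proportional to p^s (1-p)^(N-s), with s = x11 + x21 and N = n1 + n2, which is maximized at p = s/N, so no numerical search is needed. The function names are illustrative, and how the package orders tables by this value is not shown here.

```python
from math import comb


def summand(x11, x21, n1, n2, p):
    """Product of the two binomial probabilities for one table at a given p."""
    s = x11 + x21
    n = n1 + n2
    return comb(n1, x11) * comb(n2, x21) * p**s * (1 - p) ** (n - s)


def max_summand(x11, x21, n1, n2):
    """Summand maximized over p; the maximum is attained at p = s / N."""
    p_best = (x11 + x21) / (n1 + n2)
    return summand(x11, x21, n1, n2, p_best)


if __name__ == "__main__":
    for table in [(0, 3), (1, 2), (0, 0)]:
        print(table, max_summand(*table, 3, 3))
```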
The supremum over the common success probability p is taken over all values between 0 and 1. Another approach, proposed by Berger and Boos, is
to take the supremum over a Clopper-Pearson confidence interval for p. This approach adds a small penalty, β, to the p-value to ensure a level-α test,
but prevents unlikely values of p from inflating the p-value. The p-value becomes:

\sup_{p \in C_\beta} P(X|p) + \beta,

where P(X|p) is the p-value function and C_β is the 100(1-β)% confidence interval of p.

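The Berger and Boos adjustment can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the Clopper-Pearson limits are found by bisection on the binomial tail probabilities (rather than with beta quantiles), the interval is built from pooled counts x out of n, the supremum is approximated on a grid, and `pvalue_function` is any user-supplied function of p. All names are hypothetical.

```python
from math import comb


def binom_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(x + 1))


def solve_increasing(f, target, iters=100):
    """Bisection on [0, 1] for an increasing function f with f(p) = target."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, beta):
    """Exact 100(1-beta)% interval for a binomial proportion."""
    # Lower limit solves P(X >= x | p) = beta / 2 (taken as 0 when x = 0).
    lower = 0.0 if x == 0 else solve_increasing(
        lambda p: 1 - binom_cdf(x - 1, n, p), beta / 2)
    # Upper limit solves P(X <= x | p) = beta / 2 (taken as 1 when x = n).
    upper = 1.0 if x == n else solve_increasing(
        lambda p: 1 - binom_cdf(x, n, p), 1 - beta / 2)
    return lower, upper


def berger_boos_pvalue(pvalue_function, x, n, beta, n_grid=101):
    """Grid maximum of pvalue_function over the interval, plus the penalty beta."""
    lower, upper = clopper_pearson(x, n, beta)
    grid = [lower + (upper - lower) * i / (n_grid - 1) for i in range(n_grid)]
    return max(pvalue_function(p) for p in grid) + beta


if __name__ == "__main__":
    print(clopper_pearson(5, 10, 0.001))
```

A small beta keeps the penalty negligible while still excluding values of p that the data make implausible.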
There are many ways to define the two-sided p-value; this code uses the fisher.test() approach by summing the
probabilities for both sides of the table.
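For reference, the two-sided rule used by R's fisher.test() in the conditional (hypergeometric) setting sums the probabilities of all tables that are no more likely than the observed one, with a small relative tolerance. The Python sketch below illustrates that rule only; how it is carried over to the unconditional tests here is not shown, and the function name is illustrative.

```python
from math import comb


def fisher_two_sided(x11, x21, n1, n2, rel_tol=1e-7):
    """Two-sided Fisher p-value: total probability of tables no more likely
    than the observed one, given all margins."""
    s = x11 + x21
    total = comb(n1 + n2, s)
    lowest, highest = max(0, s - n2), min(n1, s)
    probs = {
        k: comb(n1, k) * comb(n2, s - k) / total
        for k in range(lowest, highest + 1)
    }
    # The tolerance keeps tables whose probability ties with the observed one.
    cutoff = probs[x11] * (1 + rel_tol)
    return sum(pr for pr in probs.values() if pr <= cutoff)


if __name__ == "__main__":
    print(fisher_two_sided(0, 3, 3, 3))
```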
Value
p.value
The computed p-value
test.statistic
The observed test statistic
np
The nuisance parameter that maximizes the p-value. For multinomial models, both nuisance parameters
are given
np.range
The range of nuisance parameters considered. For multinomial models, both nuisance parameter
ranges are given
Warning
Multinomial models and CSM tests may take a very long time, even for sample sizes less than 100.
Note
See the formulas at http://cran.r-project.org/web/packages/Exact/Exact.pdf. CSM tests and multinomial models are much more computationally intensive than the other options. I have spent more time making the
computations for the binomial models efficient; future work will be devoted to improving the multinomial models.
Boschloo's test also takes longer because Fisher's p-value is calculated for every possible table; however, a custom function that
calculates Fisher's p-value efficiently is used. Increasing the number of nuisance parameters considered and refining the p-value
will also increase the computation time.