A data frame or matrix, with m rows corresponding to variables
(hypotheses) and
n columns to observations. In the case of gene expression data, rows
correspond to genes and columns to mRNA samples. The data can
be read using read.table.
classlabel
A vector of integers corresponding to observation (column)
class labels. For k classes, the labels must be integers
between 0 and k-1. For the blockf test option,
observations may be divided into
n/k blocks of k observations each. The observations are
ordered by block, and within each block, they are labeled using the
integers 0 to k-1.
test
A character string specifying the statistic to be
used to test the null hypothesis of no association between the
variables and the class labels.
If test="t", the tests are based on two-sample Welch t-statistics
(unequal variances).
If test="t.equalvar", the tests are based on two-sample
t-statistics with equal variance for the two samples. The
square of the t-statistic is equal to an F-statistic for k=2.
If test="wilcoxon", the tests are based on standardized rank sum Wilcoxon statistics.
If test="f", the tests are based on F-statistics.
If test="pairt", the tests are based on paired t-statistics. The
square of the paired t-statistic is equal to a block F-statistic for k=2.
If test="blockf", the tests are based on F-statistics which
adjust for block differences
(cf. two-way analysis of variance).
side
A character string specifying the type of rejection region.
If side="abs", two-tailed tests, the null hypothesis is rejected for large absolute values of the test statistic.
If side="upper", one-tailed tests, the null hypothesis is rejected for large values of the test statistic.
If side="lower", one-tailed tests, the null hypothesis is rejected for small values of the test statistic.
fixed.seed.sampling
If fixed.seed.sampling="y", a
fixed seed sampling procedure is used, which may double the
computing time, but will not use extra memory to store the
permutations. If fixed.seed.sampling="n", permutations will
be stored in memory. For the blockf test, the option n was not implemented as it requires too much memory.
B
The number of permutations. For a complete
enumeration, B should be 0 (zero) or any number not less than
the total number of permutations.
na
Code for missing values (the default is .mt.naNUM=--93074815.62).
Entries with missing values will be ignored in the computation,
i.e., test statistics will be based on a smaller sample size. This
feature has not yet fully implemented.
nonpara
If nonpara="y", nonparametric test statistics are computed based on ranked data.
If nonpara="n", the original data are used.
Details
These functions compute permutation adjusted p-values for the step-down maxT and minP multiple testing procedures, which provide strong control of the family-wise Type I error rate (FWER). The adjusted p-values for the minP procedure are defined in equation (2.10) p. 66 of Westfall & Young (1993), and the maxT procedure is discussed p. 50 and 114. The permutation algorithms for estimating the adjusted p-values are given in Ge et al. (In preparation). The procedures are for the simultaneous test of m null hypotheses, namely, the null hypotheses of no association between the m variables corresponding to the rows of the data frame X and the class labels classlabel. For gene expression data, the null hypotheses correspond to no differential gene expression across mRNA samples.
Value
A data frame with components
index
Vector of row indices, between 1 and nrow(X), where rows are sorted first according to
their adjusted p-values, next their unadjusted p-values, and finally their test statistics.
teststat
Vector of test statistics, ordered according to index. To get the test statistics in the original data order, use teststat[order(index)].
rawp
Vector of raw (unadjusted) p-values, ordered according to index.
adjp
Vector of adjusted p-values, ordered according to index.
plower
For mt.minP function only, vector of "adjusted p-values", where ties in the permutation distribution of the successive minima of raw p-values with the observed p-values are counted only once. Note that procedures based on plower do not control the FWER. Comparison of plower and adjp gives an idea of the discreteness of the permutation distribution. Values in plower are ordered according to index.
S. Dudoit, J. P. Shaffer, and J. C. Boldrick (Submitted). Multiple hypothesis testing in microarray experiments.
Y. Ge, S. Dudoit, and T. P. Speed. Resampling-based multiple testing for microarray data hypothesis, Technical Report #633 of UCB Stat. http://www.stat.berkeley.edu/~gyc
P. H. Westfall and S. S. Young (1993). Resampling-based
multiple testing: Examples and methods for p-value adjustment. John Wiley & Sons.
# Gene expression data from Golub et al. (1999)
# To reduce computation time and for illustrative purposes, we condider only
# the first 100 genes and use the default of B=10,000 permutations.
# In general, one would need a much larger number of permutations
# for microarray data.
data(golub)
smallgd<-golub[1:100,]
classlabel<-golub.cl
# Permutation unadjusted p-values and adjusted p-values
# for maxT and minP procedures with Welch t-statistics
resT<-mt.maxT(smallgd,classlabel)
resP<-mt.minP(smallgd,classlabel)
rawp<-resT$rawp[order(resT$index)]
teststat<-resT$teststat[order(resT$index)]
# Plot results and compare to Bonferroni procedure
bonf<-mt.rawp2adjp(rawp, proc=c("Bonferroni"))
allp<-cbind(rawp, bonf$adjp[order(bonf$index),2], resT$adjp[order(resT$index)],resP$adjp[order(resP$index)])
mt.plot(allp, teststat, plottype="rvsa", proc=c("rawp","Bonferroni","maxT","minP"),leg=c(0.7,50),lty=1,col=1:4,lwd=2)
mt.plot(allp, teststat, plottype="pvsr", proc=c("rawp","Bonferroni","maxT","minP"),leg=c(60,0.2),lty=1,col=1:4,lwd=2)
mt.plot(allp, teststat, plottype="pvst", proc=c("rawp","Bonferroni","maxT","minP"),leg=c(-6,0.6),pch=16,col=1:4)
# Permutation adjusted p-values for minP procedure with F-statistics (like equal variance t-statistics)
mt.minP(smallgd,classlabel,test="f",fixed.seed.sampling="n")
# Note that the test statistics used in the examples below are not appropriate
# for the Golub et al. data. The sole purpose of these examples is to
# demonstrate the use of the mt.maxT and mt.minP functions.
# Permutation adjusted p-values for maxT procedure with paired t-statistics
classlabel<-rep(c(0,1),19)
mt.maxT(smallgd,classlabel,test="pairt")
# Permutation adjusted p-values for maxT procedure with block F-statistics
classlabel<-rep(0:18,2)
mt.maxT(smallgd,classlabel,test="blockf",side="upper")