Performs an Empirical Bayes Analysis of Microarrays (EBAM). It is possible to perform
one and two class analyses using either a modified t-statistic or a (standardized)
Wilcoxon rank statistic, and a multiclass analysis using a modified F-statistic.
Moreover, this function provides a EBAM procedure for categorical data such as SNP data
and the possibility to employ an user-written score function.
either a matrix, a data frame or an ExpressionSet object, or the output of find.a0,
i.e. an object of class FindA0. Can also be a list (if method = chisq.ebam or
method = trend.ebam). For the latter case, see chisq.ebam.
If x is not a FindA0 object, then each row of x (or
exprs(x), respectively) must correspond to a variable (e.g., a gene or a SNP), and each column to
a sample.
cl
a specification of the class labels of the samples. Ignored if x is a FindA0 object.
Needs not to be specified if x is a list.
Typically, cl is specified by a vector of length ncol(x).
In the two class paired case, cl can also
be a matrix with ncol(x) rows and 2 columns. If x is
an ExpressionSet object, cl can also be a character string naming the column
of pData(x) that contains the class labels of the samples.
In the one-class case, cl should be a vector of 1's.
In the two class unpaired case, cl should be a vector containing 0's
(specifying the samples of, e.g., the control group) and 1's (specifying,
e.g., the case group).
In the two class paired case, cl can be either a numeric vector or a numeric matrix.
If it is a vector, then cl has to consist of the integers between -1 and
-n/2 (e.g., before treatment group) and between 1 and n/2 (e.g.,
after treatment group), where n is the length of cl and k
is paired with -k, k=1,…,n/2. If cl is a matrix, one
column should contain -1's and 1's specifying, e.g., the before and the after
treatment samples, respectively, and the other column should contain integer
between 1 and n/2 specifying the n/2 pairs of observations.
In the multiclass case and if method = chisq.ebam or method = trend.ebam,
cl should be a vector containing integers between 1 and g, where g is
the number of groups. In the two latter cases, cl needs not to be specified, if
x is a list. For details, see chisq.ebam.
For examples of how cl can be specified, see the manual of siggenes.
method
a character string or name specifying the method or function that should be
used in the computation of the expression score z.
If method = z.ebam,
a modified t- or F-statistic, respectively, will be computed as proposed by Efron et al. (2001).
If method = wilc.ebam, a (standardized) Wilcoxon sum / signed rank statistic will
be used as expression score.
For an analysis of categorical data such as SNP data,
method can be set to chisq.ebam. In this case, Pearson's Chi-squared statistic
is computed for each row.
If the variables are ordinal and a trend test should be applied
(e.g., in the two-class case, the Cochran-Armitage trend test), method = trend.ebam
can be employed.
It is also possible to employ an user-written function for computing an user-specified
expression score. For details, see the vignette of siggenes.
delta
a numeric vector consisting of probabilities for which the number of differentially
expressed genes and the FDR should be computed, where a gene is called differentially expressed
if its posterior probability is larger than Delta.
which.a0
an integer between 1 and the length of quan.a0 of find.a0. If
NULL, the suggested choice of find.a0 is used. Ignored if x is a matrix, data
frame or ExpressionSet object.
control
further arguments for controlling the EBAM analysis. For these arguments,
see ebamControl.
gene.names
a vector of length nrow(x) specifying the names of the variables. By default,
the row names of the matrix / data frame comprised by x are used.
...
further arguments of the specific EBAM methods. If method = z.ebam, see
z.ebam. If method = wilc.ebam, see wilc.ebam. If
method = chisq.ebam, see chisq.ebam.
Efron, B., Tibshirani, R., Storey, J.D. and Tusher, V. (2001). Empirical Bayes Analysis
of a Microarray Experiment. JASA, 96, 1151-1160.
Schwender, H., Krause, A., and Ickstadt, K. (2006). Identifying Interesting Genes with siggenes.
RNews, 6(5), 45-50.
Storey, J.D. and Tibshirani, R. (2003). Statistical Significance for Genome-Wide
Studies. Proceedings of the National Academy of Sciences, 100, 9440-9445.
## Not run:
# Load the data of Golub et al. (1999) contained in the package multtest.
data(golub)
# golub.cl contains the class labels.
golub.cl
# Perform an EBAM analysis for the two class unpaired case assuming
# unequal variances. Specify the fudge factor a0 by the suggested
# choice of find.a0
find.out <- find.a0(golub, golub.cl, rand = 123)
ebam.out <- ebam(find.out)
ebam.out
# Since a0 = 0 leads to the largest number of genes (i.e. the suggested
# choice of a0), the following leads to the same results as the above
# analysis (but only if the random number generator, i.e. rand, is set
# to the same number).
ebam.out2 <- ebam(golub, golub.cl, a0 = 0, fast = TRUE, rand = 123)
ebam.out2
# If fast is set to TRUE in ebam, a crude estimate of the number of
# falsely called genes is used (see the help file for z.ebam). This
# estimate is always employed in find.a0.
# The exact number is used in ebam when performing
ebam.out3 <- ebam(golub, golub.cl, a0 = 0, rand = 123)
ebam.out3
# Since this is the recommended way, we use ebam.out3 at the end of
# the Examples section for further analyses.
# Perform an EBAM analysis for the two class unpaired case assuming
# equal group variances. Set a0 = 0, and use B = 50 permutations
# of the class labels.
ebam.out4 <- ebam(golub, golub.cl, a0 = 0, var.equal = TRUE, B = 50,
rand = 123)
ebam.out4
# Perform an EBAM analysis for the two class unpaired cased assuming
# unequal group variances. Use the median (i.e. the 50% quantile)
# of the standard deviations of the genes as fudge factor a0. And
# obtain the number of genes and the FDR if a gene is called
# differentially when its posterior probability is larger than
# 0.95.
ebam.out5 <- ebam(golub, golub.cl, quan.a0 = 0.5, delta = 0.95,
rand = 123)
ebam.out5
# For the third analysis, obtain the number of differentially
# expressed genes and the FDR if a gene is called differentially
# expressed if its posterior probability is larger than 0.8, 0.85,
# 0.9, 0.95.
print(ebam.out3, c(0.8, 0.85, 0.9, 0.95))
# Generate a plot of the posterior probabilities for delta = 0.9.
plot(ebam.out3, 0.9)
# Obtain the list of genes called differentially expressed if their
# posterior probability is larger than 0.99, and gene-specific
# statistics for these variables such as their z-value and their
# local FDR.
summary(ebam.out3, 0.99)
## End(Not run)