R Graphical Manual

Browse All

Last data update: 2014.03.03

R: Empirical Bayes Analysis of Microarrays

ebam	R Documentation

Empirical Bayes Analysis of Microarrays

Description

Performs an Empirical Bayes Analysis of Microarrays (EBAM). It is possible to perform one and two class analyses using either a modified t-statistic or a (standardized) Wilcoxon rank statistic, and a multiclass analysis using a modified F-statistic. Moreover, this function provides a EBAM procedure for categorical data such as SNP data and the possibility to employ an user-written score function.

Usage

  ebam(x, cl, method = z.ebam, delta = 0.9, which.a0 = NULL, 
       control = ebamControl(), gene.names = dimnames(x)[[1]],
       ...)

Arguments

`x`	either a matrix, a data frame or an ExpressionSet object, or the output of `find.a0`, i.e. an object of class `FindA0`. Can also be a list (if `method = chisq.ebam` or `method = trend.ebam`). For the latter case, see `chisq.ebam`. If `x` is not a FindA0 object, then each row of `x` (or `exprs(x)`, respectively) must correspond to a variable (e.g., a gene or a SNP), and each column to a sample.
`cl`	a specification of the class labels of the samples. Ignored if `x` is a `FindA0` object. Needs not to be specified if `x` is a list. Typically, `cl` is specified by a vector of length `ncol(x)`. In the two class paired case, `cl` can also be a matrix with `ncol(x)` rows and 2 columns. If `x` is an ExpressionSet object, `cl` can also be a character string naming the column of `pData(x)` that contains the class labels of the samples. In the one-class case, `cl` should be a vector of 1's. In the two class unpaired case, `cl` should be a vector containing 0's (specifying the samples of, e.g., the control group) and 1's (specifying, e.g., the case group). In the two class paired case, `cl` can be either a numeric vector or a numeric matrix. If it is a vector, then `cl` has to consist of the integers between -1 and -n/2 (e.g., before treatment group) and between 1 and n/2 (e.g., after treatment group), where n is the length of `cl` and k is paired with -k, k=1,…,n/2. If `cl` is a matrix, one column should contain -1's and 1's specifying, e.g., the before and the after treatment samples, respectively, and the other column should contain integer between 1 and n/2 specifying the n/2 pairs of observations. In the multiclass case and if `method = chisq.ebam` or `method = trend.ebam`, `cl` should be a vector containing integers between 1 and g, where g is the number of groups. In the two latter cases, `cl` needs not to be specified, if `x` is a list. For details, see `chisq.ebam`. For examples of how `cl` can be specified, see the manual of siggenes.
`method`	a character string or name specifying the method or function that should be used in the computation of the expression score z. If `method = z.ebam`, a modified t- or F-statistic, respectively, will be computed as proposed by Efron et al. (2001). If `method = wilc.ebam`, a (standardized) Wilcoxon sum / signed rank statistic will be used as expression score. For an analysis of categorical data such as SNP data, `method` can be set to `chisq.ebam`. In this case, Pearson's Chi-squared statistic is computed for each row. If the variables are ordinal and a trend test should be applied (e.g., in the two-class case, the Cochran-Armitage trend test), `method = trend.ebam` can be employed. It is also possible to employ an user-written function for computing an user-specified expression score. For details, see the vignette of siggenes.
`delta`	a numeric vector consisting of probabilities for which the number of differentially expressed genes and the FDR should be computed, where a gene is called differentially expressed if its posterior probability is larger than Delta.
`which.a0`	an integer between 1 and the length of `quan.a0` of `find.a0`. If `NULL`, the suggested choice of `find.a0` is used. Ignored if `x` is a matrix, data frame or ExpressionSet object.
`control`	further arguments for controlling the EBAM analysis. For these arguments, see `ebamControl`.
`gene.names`	a vector of length `nrow(x)` specifying the names of the variables. By default, the row names of the matrix / data frame comprised by `x` are used.
`...`	further arguments of the specific EBAM methods. If `method = z.ebam`, see `z.ebam`. If `method = wilc.ebam`, see `wilc.ebam`. If `method = chisq.ebam`, see `chisq.ebam`.

Value

An object of class EBAM.

Author(s)

Holger Schwender, holger.schw@gmx.de

References

Efron, B., Tibshirani, R., Storey, J.D. and Tusher, V. (2001). Empirical Bayes Analysis of a Microarray Experiment. JASA, 96, 1151-1160.

Schwender, H., Krause, A., and Ickstadt, K. (2006). Identifying Interesting Genes with siggenes. RNews, 6(5), 45-50.

Storey, J.D. and Tibshirani, R. (2003). Statistical Significance for Genome-Wide Studies. Proceedings of the National Academy of Sciences, 100, 9440-9445.

Examples

## Not run: 
  # Load the data of Golub et al. (1999) contained in the package multtest.
  data(golub)
  
  # golub.cl contains the class labels.
  golub.cl
  
  # Perform an EBAM analysis for the two class unpaired case assuming
  # unequal variances. Specify the fudge factor a0 by the suggested
  # choice of find.a0
  find.out <- find.a0(golub, golub.cl, rand = 123)
  ebam.out <- ebam(find.out)
  ebam.out
    
  # Since a0 = 0 leads to the largest number of genes (i.e. the suggested
  # choice of a0), the following leads to the same results as the above
  # analysis (but only if the random number generator, i.e. rand, is set
  # to the same number).
  ebam.out2 <- ebam(golub, golub.cl, a0 = 0, fast = TRUE, rand = 123)
  ebam.out2

  # If fast is set to TRUE in ebam, a crude estimate of the number of
  # falsely called genes is used (see the help file for z.ebam). This
  # estimate is always employed in find.a0. 
  # The exact number is used in ebam when performing
  ebam.out3 <- ebam(golub, golub.cl, a0 = 0, rand = 123)
  ebam.out3  

  # Since this is the recommended way, we use ebam.out3 at the end of
  # the Examples section for further analyses.



  # Perform an EBAM analysis for the two class unpaired case assuming
  # equal group variances. Set a0 = 0, and use B = 50 permutations
  # of the class labels.
  ebam.out4 <- ebam(golub, golub.cl, a0 = 0, var.equal = TRUE, B = 50,
     rand = 123)
  ebam.out4
    
  # Perform an EBAM analysis for the two class unpaired cased assuming
  # unequal group variances. Use the median (i.e. the 50% quantile)
  # of the standard deviations of the genes as fudge factor a0. And
  # obtain the number of genes and the FDR if a gene is called 
  # differentially when its posterior probability is larger than
  # 0.95.
  ebam.out5 <- ebam(golub, golub.cl, quan.a0 = 0.5, delta = 0.95,
     rand = 123)
  ebam.out5
    
  # For the third analysis, obtain the number of differentially
  # expressed genes and the FDR if a gene is called differentially
  # expressed if its posterior probability is larger than 0.8, 0.85,
  # 0.9, 0.95.
  print(ebam.out3, c(0.8, 0.85, 0.9, 0.95))
    
  # Generate a plot of the posterior probabilities for delta = 0.9.
  plot(ebam.out3, 0.9)
    
  # Obtain the list of genes called differentially expressed if their
  # posterior probability is larger than 0.99, and gene-specific 
  # statistics for these variables such as their z-value and their
  # local FDR.
  summary(ebam.out3, 0.99)

## End(Not run)