R Graphical Manual

Last data update: 2014.03.03

d.stat

R Documentation

SAM Analysis Using a Modified t-statistic

Description

Computes the required statistics for a Significance Analysis of Microarrays (SAM) using either a (modified) t- or F-statistic.

Should not be called directly, but via the function sam.
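A minimal sketch of the indirect call through `sam`, assuming the Bioconductor package siggenes is installed. The simulated matrix, the class labels, and the values chosen for `B` and `rand` are illustrative assumptions, not recommendations from this page; the extra arguments are expected to be passed on to `d.stat`.

```r
# Hedged sketch: simulated two class unpaired data, analysed via sam().
set.seed(1)
x <- matrix(rnorm(200 * 10), nrow = 200,
            dimnames = list(paste0("gene", 1:200), NULL))  # rows = genes
cl <- rep(c(0, 1), each = 5)                               # columns = samples

# Shift the first ten genes in group 1 so that some signal is present.
x[1:10, cl == 1] <- x[1:10, cl == 1] + 2

# Only run the analysis if siggenes is available (assumption: it may not be).
if (requireNamespace("siggenes", quietly = TRUE)) {
  out <- siggenes::sam(x, cl, B = 50, rand = 123)
  print(out)
}
```

Setting `rand` makes the permutations reproducible across runs.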

Usage

  d.stat(data, cl, var.equal = FALSE, B = 100, med = FALSE, s0 = NA, 
      s.alpha = seq(0, 1, 0.05), include.zero = TRUE, n.subset = 10, 
      mat.samp = NULL, B.more = 0.1, B.max = 30000, gene.names = NULL,
      R.fold = 1, use.dm = TRUE, R.unlog = TRUE, na.replace = TRUE, 
      na.method = "mean", rand = NA)

Arguments

`data`	a matrix, data frame or `ExpressionSet` object. Each row of `data` (or `exprs(data)`, respectively) must correspond to a variable (e.g., a gene), and each column to a sample (i.e. an observation).
`cl`	a numeric vector of length `ncol(data)` containing the class labels of the samples. In the two class paired case, `cl` can also be a matrix with `ncol(data)` rows and 2 columns. If `data` is an `ExpressionSet` object, `cl` can also be a character string. For details on how `cl` should be specified, see `?sam`.
`var.equal`	if `FALSE` (default), Welch's t-statistic will be computed. If `TRUE`, the pooled variance will be used in the computation of the t-statistic.
`B`	numeric value indicating how many permutations should be used in the estimation of the null distribution.
`med`	if `FALSE` (default), the mean number of falsely called genes will be computed. Otherwise, the median number is calculated.
`s0`	a numeric value specifying the fudge factor. If `NA` (default), `s0` will be computed automatically.
`s.alpha`	a numeric vector or value specifying the quantiles of the standard deviations of the genes used in the computation of `s0`. If `s.alpha` is a vector, the fudge factor is computed as proposed by Tusher et al. (2001). Otherwise, the quantile of the standard deviations specified by `s.alpha` is used as fudge factor.
`include.zero`	if `TRUE`, `s0 = 0` will also be a possible choice for the fudge factor. Hence, the usual t-statistic or F-statistic, respectively, can also be a possible choice for the expression score d. If `FALSE`, `s0 = 0` will not be a possible choice for the fudge factor. The latter follows the definition of the fudge factor by Tusher et al. (2001), in which only strictly positive values are considered.
`n.subset`	a numeric value indicating how many permutations are considered simultaneously when computing the p-value and the number of falsely called genes. If `med = TRUE`, `n.subset` will be set to 1.
`mat.samp`	a matrix having `ncol(data)` columns, except for the two class paired case, in which `mat.samp` has `ncol(data)`/2 columns. Each row specifies one permutation of the group labels used in the computation of the expected expression scores d.bar. If not specified (`mat.samp=NULL`), a matrix having `B` rows and `ncol(data)` columns is generated automatically and used in the computation of d.bar. In the two class unpaired case and the multiclass case, each row of `mat.samp` must contain the same group labels as `cl`. In the one class and the two class paired case, each row must contain -1's and 1's. In the one class case, the expression values are multiplied by these -1's and 1's. In the two class paired case, each column corresponds to one observation pair whose difference is multiplied by either -1 or 1. For more details and examples, see the manual of siggenes.
`B.more`	a numeric value. If the number of all possible permutations is smaller than or equal to (1+`B.more`)*`B`, full permutation will be done. Otherwise, `B` permutations are used. This prevents only `B` permutations from being used, instead of all permutations, when the number of all possible permutations is just slightly larger than `B`.
`gene.names`	a character vector of length `nrow(data)` containing the names of the genes.
`B.max`	a numeric value. If the number of all possible permutations is smaller than or equal to `B.max`, `B` randomly selected permutations will be used in the computation of the null distribution. Otherwise, `B` random draws of the group labels are used. In the latter way of permuting it is possible that some of the permutations are used more than once.
`R.fold`	a numeric value. If the fold change of a gene is smaller than or equal to `R.fold`, or larger than or equal to 1/`R.fold`, respectively, then this gene will be excluded from the SAM analysis. The expression score d of excluded genes is set to `NA`. By default, `R.fold` is set to 1 such that all genes are included in the SAM analysis. Setting `R.fold` to 0 or a negative value will avoid the computation of the fold change. The fold change is only computed in the two class unpaired case.
`use.dm`	if `TRUE`, the fold change is computed by 2 to the power of the difference between the mean log2 intensities of the two groups, i.e. 2 to the power of the numerator of the test statistic. If `FALSE`, the fold change is determined by computing 2 to the power of `data` (if `R.unlog = TRUE`) and then calculating the ratio of the mean intensity in the group coded by 1 to the mean intensity in the group coded by 0. The latter is the definition of the fold change used in Tusher et al. (2001).
`R.unlog`	if `TRUE`, the anti-log of `data` will be used in the computation of the fold change. Otherwise, `data` is used. This transformation should be done when `data` is log2-transformed (in a SAM analysis it is highly recommended to use log2-transformed expression data). Ignored if `use.dm = TRUE`.
`na.replace`	if `TRUE`, missing values will be replaced by the genewise/rowwise statistic specified by `na.method`. If a gene has fewer than 2 non-missing values, this gene will be excluded from further analysis. If `na.replace=FALSE`, all genes with one or more missing values will be excluded from further analysis. The expression score d of excluded genes is set to `NA`.
`na.method`	a character string naming the statistic with which missing values will be replaced if `na.replace=TRUE`. Must be either `"mean"` (default) or `"median"`.
`rand`	numeric value. If specified, i.e. not `NA`, the random number generator will be put into a reproducible state.
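To make the roles of `var.equal` and `s0` concrete, the following base R sketch computes a row-wise modified t-statistic for the two class unpaired case. The helper `mod_t` is hypothetical and is not the siggenes implementation; it assumes class labels 0 and 1 and simply adds the fudge factor to the standard error in the denominator.

```r
# Illustrative modified t-statistic d = r / (s + s0); not siggenes code.
mod_t <- function(x, cl, var.equal = FALSE, s0 = 0) {
  x0 <- x[, cl == 0, drop = FALSE]
  x1 <- x[, cl == 1, drop = FALSE]
  n0 <- ncol(x0)
  n1 <- ncol(x1)
  r  <- rowMeans(x1) - rowMeans(x0)   # numerator: difference of group means
  v0 <- apply(x0, 1, var)             # row-wise variances per group
  v1 <- apply(x1, 1, var)
  s <- if (var.equal) {
    # pooled variance (classical two-sample t-statistic)
    sp2 <- ((n0 - 1) * v0 + (n1 - 1) * v1) / (n0 + n1 - 2)
    sqrt(sp2 * (1 / n0 + 1 / n1))
  } else {
    # separate variances (Welch's t-statistic)
    sqrt(v0 / n0 + v1 / n1)
  }
  r / (s + s0)
}

set.seed(42)
x  <- matrix(rnorm(50 * 8), nrow = 50)
cl <- rep(c(0, 1), each = 4)
d_welch <- mod_t(x, cl)               # s0 = 0: ordinary Welch statistic
d_fudge <- mod_t(x, cl, s0 = 0.5)     # positive s0 shrinks |d|
```

With `s0 = 0` the sketch reduces to the ordinary t-statistic, which is the case that `include.zero = TRUE` allows as a candidate.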
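The interplay of `B`, `B.more` and `B.max` can be summarised as a three-way decision. The sketch below is a hypothetical helper, `perm_strategy`, written only from the argument descriptions above; in particular, counting the possible permutations as `choose(n, n1)` for the two class unpaired case is an assumption of this sketch.

```r
# Illustrative decision rule for the permutation scheme; not siggenes code.
perm_strategy <- function(cl, B = 100, B.more = 0.1, B.max = 30000) {
  # Assumed count of distinct label assignments in the two class unpaired case.
  n_all <- choose(length(cl), sum(cl == 1))
  if (n_all <= (1 + B.more) * B) {
    "all possible permutations"
  } else if (n_all <= B.max) {
    "B randomly selected permutations"
  } else {
    "B random draws of the group labels (repeats possible)"
  }
}

s_small  <- perm_strategy(rep(c(0, 1), each = 4))   # choose(8, 4)   = 70
s_medium <- perm_strategy(rep(c(0, 1), each = 5))   # choose(10, 5)  = 252
s_large  <- perm_strategy(rep(c(0, 1), each = 10))  # choose(20, 10) = 184756
```

With the default settings, only the smallest design falls under the (1+`B.more`)*`B` = 110 threshold and is permuted exhaustively.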
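The two fold change definitions selected by `use.dm` generally give different values. The following base R sketch, with a hypothetical helper `fold_change`, follows the wording of the `use.dm` and `R.unlog` descriptions and assumes log2-scale data with class labels 0 and 1; it is not taken from the siggenes source.

```r
# Illustrative fold change under both definitions; not siggenes code.
fold_change <- function(x, cl, use.dm = TRUE, R.unlog = TRUE) {
  g0 <- cl == 0
  g1 <- cl == 1
  if (use.dm) {
    # 2 to the power of the difference of mean log2 intensities
    2^(rowMeans(x[, g1, drop = FALSE]) - rowMeans(x[, g0, drop = FALSE]))
  } else {
    # ratio of mean intensities, on the raw scale if R.unlog = TRUE
    y <- if (R.unlog) 2^x else x
    rowMeans(y[, g1, drop = FALSE]) / rowMeans(y[, g0, drop = FALSE])
  }
}

# One gene, log2 intensities: group 0 = (1, 3), group 1 = (4, 4).
x  <- matrix(c(1, 3, 4, 4), nrow = 1)
cl <- c(0, 0, 1, 1)
fc_dm    <- fold_change(x, cl)                  # 2^(4 - 2) = 4
fc_ratio <- fold_change(x, cl, use.dm = FALSE)  # 16 / 5 = 3.2
```

The difference arises because the mean of anti-logged values is not the anti-log of the mean.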

Value

An object of class SAM.

Author(s)

Holger Schwender, holger.schw@gmx.de

References

Schwender, H., Krause, A. and Ickstadt, K. (2003). Comparison of the Empirical Bayes and the Significance Analysis of Microarrays. Technical Report, SFB 475, University of Dortmund, Germany.

Tusher, V.G., Tibshirani, R., and Chu, G. (2001). Significance analysis of microarrays applied to the ionizing radiation response. PNAS, 98, 5116-5121.