R: Reproducibility-Optimized Test Statistic (ROTS)
ROTS
R Documentation
Description
Calculates the reproducibility-optimized test statistic
(ROTS) for ranking genes in order of evidence for differential
expression in two-group comparisons.
Arguments
data
a numeric data matrix or an ExpressionSet instance, in which rows correspond to genes
and columns correspond to samples.
groups
a vector indicating the sample groups.
B
an integer specifying the number of bootstrap and permutation
resamplings.
K
an integer indicating the largest top list size considered.
paired
a logical indicating whether a paired test is performed.
seed
an integer seed for the random number generator.
a1, a2
non-negative parameters of the test statistic. See the Details
section for further information.
log
a logical indicating whether input data is log2 scaled.
progress
a logical indicating if additional progress bars are shown.
Details
The reproducibility-optimization procedure ROTS enables the
selection of a suitable gene ranking statistic directly from the given
dataset. The statistic is optimized among a family of t-type
statistics d = m/(a1+a2*s), where m is the absolute difference between
the group averages, s is the pooled standard error, and a1 and a2 are
the non-negative parameters to be optimized. Two special cases of this
family are the ordinary t-statistic (a1=0, a2=1) and the signal
log-ratio (a1=1, a2=0). The optimality is defined in terms of maximal
overlap of top-ranked genes in group-preserving bootstrap datasets.
Importantly, besides the group labels, no a priori information about
the properties of the data is required and no fixed cutoff for the
gene rankings needs to be specified. For more details about the
reproducibility-optimization procedure, see Elo et al. (2008).
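The family of statistics above can be illustrated with a minimal base R sketch. The helper d_stat and the toy matrix x are hypothetical names introduced only for this illustration and are not part of the ROTS package. The sketch assumes that s is the equal-variance (pooled) standard error of the difference in group means, as in the two-sample t-test, and that there are no missing values.

```r
# Toy data: 3 genes (rows) x 6 samples (columns), two groups of 3.
x <- rbind(
  geneA = c(1.0, 1.2, 0.9, 2.1, 2.4, 2.0),
  geneB = c(5.0, 5.5, 4.8, 5.2, 5.3, 4.9),
  geneC = c(3.0, 2.0, 4.0, 6.0, 7.5, 6.5)
)
groups <- c(0, 0, 0, 1, 1, 1)

# Hypothetical helper: d = m / (a1 + a2 * s) for every row of x.
# Assumption: s is the pooled standard error of the difference
# between the two group means.
d_stat <- function(x, groups, a1, a2) {
  g0 <- x[, groups == 0, drop = FALSE]
  g1 <- x[, groups == 1, drop = FALSE]
  n0 <- ncol(g0)
  n1 <- ncol(g1)
  m <- abs(rowMeans(g1) - rowMeans(g0))
  pooled.var <- ((n0 - 1) * apply(g0, 1, var) +
                 (n1 - 1) * apply(g1, 1, var)) / (n0 + n1 - 2)
  s <- sqrt(pooled.var * (1 / n0 + 1 / n1))
  m / (a1 + a2 * s)
}

d.t   <- d_stat(x, groups, a1 = 0, a2 = 1)    # absolute ordinary t-statistic
d.slr <- d_stat(x, groups, a1 = 1, a2 = 0)    # absolute difference of group means
d.mix <- d_stat(x, groups, a1 = 1.6, a2 = 1)  # an intermediate member of the family
```

With a1 = 0 and a2 = 1 the result agrees with the absolute value of the statistic from t.test(..., var.equal = TRUE); with a1 = 1 and a2 = 0 it reduces to m, which is the signal log-ratio when the data are log2 scaled.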
The user can adjust the largest top list size considered in the
reproducibility calculations, since lowering this size can markedly
reduce the computation time. For large data matrices with thousands of
rows, we generally recommend a size of several thousand. For smaller
data matrices, and especially if there are many rows with only a few
non-missing entries, K should be decreased accordingly.
ROTS tolerates a moderate number of missing values in the data matrix
by effectively ignoring their contribution during the procedure.
However, each row of the data matrix must contain at least two values
in both groups. Rows containing only a few non-missing values should
be removed; alternatively, the missing entries can be imputed using,
e.g., K-nearest neighbour imputation, which is implemented in the
Bioconductor package impute.
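One way to enforce the requirement of at least two values per group is to filter the rows before calling ROTS. The following base R sketch uses simulated data; the object names (x, keep, x.filtered) are illustrative assumptions and are not part of the package.

```r
# Simulated matrix: 4 genes (rows) x 10 samples (columns), two groups of 5.
set.seed(1)
x <- matrix(rnorm(40), nrow = 4,
            dimnames = list(paste0("gene", 1:4), NULL))
groups <- c(rep(0, 5), rep(1, 5))

x[1, 1:4] <- NA  # gene1: only one observed value left in group 0
x[2, 7]   <- NA  # gene2: four observed values left in group 1

# Count the non-missing values per row within each group.
n0 <- rowSums(!is.na(x[, groups == 0, drop = FALSE]))
n1 <- rowSums(!is.na(x[, groups == 1, drop = FALSE]))

# Keep rows with at least two observed values in both groups.
keep <- n0 >= 2 & n1 >= 2
x.filtered <- x[keep, , drop = FALSE]
```

Here gene1 is dropped, while gene2 is kept because it still has at least two values in each group.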
If the parameter values a1 and a2 are set by the user, then no
optimization is performed but the statistic and FDR-values are
calculated for the given parameters. The false discovery rate (FDR)
for the optimized test statistic is calculated by permuting the sample
labels. The results for all the genes can be obtained by setting the
FDR cutoff to 1.
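As a hedged sketch of the behaviour described above, the call below fixes a1 = 0 and a2 = 1, so the ordinary t-statistic is used without optimization, and then requests the results for all genes with an FDR cutoff of 1. It uses only the arguments documented on this page and the affySpikeIn data from the Examples section. The object names rots.t and rots.t.all are illustrative, and the code is skipped if the ROTS package is not installed.

```r
# Runs only if the ROTS package is available.
if (requireNamespace("ROTS", quietly = TRUE)) {
  library(ROTS)

  # a1 and a2 are set by the user, so no optimization is performed.
  rots.t <- ROTS(data = affySpikeIn, groups = c(rep(0, 5), rep(1, 5)),
                 B = 100, K = 500, seed = 1234, a1 = 0, a2 = 1)

  # An FDR cutoff of 1 returns the results for all the genes
  # (this prints one row per gene).
  rots.t.all <- summary(rots.t, fdr = 1)
}
```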
Value
ROTS returns an object of class ROTS, which is a list
containing the following components:
data
the expression data matrix.
B
the number of bootstrap and permutation resamplings.
d
the value of the optimized ROTS-statistic for each gene.
pvalue
the corresponding p-values.
FDR
the corresponding FDR-values.
a1
the optimized parameter a1.
a2
the optimized parameter a2.
k
the optimized top list size.
R
the optimized reproducibility value.
Z
the optimized reproducibility Z-score.
print prints the optimized parameters a1 and a2, the optimized
top list size and the corresponding reproducibility values.
summary summarizes the results of a ROTS analysis. If neither
fdr nor num.genes is specified, the optimized parameters a1 and a2,
the optimized top list size and the corresponding reproducibility
values are shown. If fdr or num.genes is specified, the gene-specific
information is also shown for the genes at the specified FDR level or
top list size, respectively.
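The components and methods described above might be used as in the following sketch. It relies only on the component names listed in the Value section and on the num.genes argument of summary; the choice of ten genes is arbitrary, and the code is skipped if the ROTS package is not installed.

```r
# Runs only if the ROTS package is available.
if (requireNamespace("ROTS", quietly = TRUE)) {
  library(ROTS)
  rots.out <- ROTS(data = affySpikeIn, groups = c(rep(0, 5), rep(1, 5)),
                   B = 100, K = 500, seed = 1234)

  # Optimized parameters and reproducibility values.
  print(rots.out)

  # Gene-level results are stored as list components.
  head(rots.out$d)       # ROTS-statistic for the first genes
  head(rots.out$pvalue)  # corresponding p-values
  head(rots.out$FDR)     # corresponding FDR-values

  # Gene-specific information for a top list of 10 genes.
  rots.top10 <- summary(rots.out, num.genes = 10)
}
```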
References
L. L. Elo, S. Filen, R. Lahesmaa and T. Aittokallio (2008).
Reproducibility-optimized test statistic for ranking genes in
microarray studies. IEEE/ACM Transactions on Computational Biology and
Bioinformatics 5: 423–431.
See Also
affySpikeIn
Examples
library(ROTS)
## ROTS-statistic for the Affymetrix spike-in data.
rots.out <- ROTS(data = affySpikeIn, groups = c(rep(0, 5), rep(1, 5)),
                 B = 100, K = 500, seed = 1234)
## Summary of the ROTS results.
rots.summary <- summary(rots.out, fdr = 0.05)
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
> library(ROTS)
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/ROTS/ROTS.Rd_%03d_medium.png", width=480, height=480)
> ### Name: ROTS
> ### Title: Reproducibility-Optimized Test Statistic (ROTS)
> ### Aliases: ROTS ROTS-package print.ROTS
> ### Keywords: math
>
> ### ** Examples
>
> ## ROTS-statistic for the Affymetrix spike-in data.
> rots.out <- ROTS(data = affySpikeIn, groups = c(rep(0,5), rep(1,5)),
+ B = 100, K = 500 , seed = 1234)
Bootstrapping samples
Optimizing parameters
Calculating p-values
Calculating FDR
> ## Summary of the ROTS results.
> rots.summary <- summary(rots.out, fdr = 0.05)
ROTS results:
Number of resamplings: 100
a1: 1.6
a2: 1
Top list size: 10
Reproducibility value: 0.908
Z-score: 23.32576
5 rows satisfy the condition.
         Row ROTS-statistic  pvalue FDR
684_at   315     -4.3078654 0.00002   0
36202_at 555      0.4830924 0.00023   0
36085_at 710      0.4443196 0.00025   0
1024_at  833      0.3993879 0.00027   0
36311_at 303      0.3805451 0.00030   0
> dev.off()
null device
1
>