R: Estimate degrees of differential expression (DE) for...
estimateDE
R Documentation
Estimate degrees of differential expression (DE) for individual genes
Description
This function calculates p-values (or the related statistics) for
identifying differentially expressed genes (DEGs) from a
TCC-class object.
estimateDE internally calls a specified method
implemented in other R packages.
Usage
estimateDE(tcc, test.method, FDR, paired,
full, reduced, # for DESeq, DESeq2
design, contrast, # for edgeR, DESeq2, voom
coef, # for edgeR, voom
group, cl, # for baySeq
samplesize, # for baySeq, SAMseq
logged, floor, # for WAD
...
)
Arguments
tcc
TCC-class object.
test.method
character string specifying a method for identifying
DEGs: one of "edger", "deseq", "deseq2",
"bayseq", "samseq", "voom", and "wad".
See the "Details" field for detail.
The default is "edger" when analyzing the count data with
replicates (i.e., min(table(tcc$group[, 1])) > 1), and
"deseq" (2 group) and "deseq2" (more than 2 group)
when analyzing the count data without replicates
(i.e., min(table(tcc$group[, 1])) == 1).
FDR
numeric value (between 0 and 1) specifying the threshold
for determining DEGs.
paired
logical. If TRUE, the input data are regarded as
(two-group) paired samples. If FALSE, the input data are
regarded as unpaired samples. The default is FALSE.
full
a formula for creating full model described in DESeq and
DESeq2. The right hand side can involve any column of tcc$group
is used as the model frame.
See the fitNbinomGLMs function in DESeq
for details, or nbinomLRT function in DESeq2.
reduced
a formula for creating reduced model described in DESeq.
The right hand side can involve any column of tcc$group is
used as the model frame.
See the fitNbinomGLMs function in DESeq
for details, or nbinomLRT function in DESeq2.
design
the argument is used in edgeR, voom (limma) and DESeq2.
For edgeR and voom, it should be the numeric matrix giving the
design matrix for the generalized linear model.
See the glmFit function in edgeR or
the lmFit function in limma for details.
For DESeq2, it should be a formula specifying the design of the
experiment. See the DESeqDataSet function
in DESeq2 for details.
contrast
the argument is used in edgeR and DESeq2.
For edgeR, numeric vector specifying a contrast of the linear model
coefficients to be tested equal to zero.
See the glmLRT function in edgeR for details.
For DESeq2, the argument is same to contrast which used in
DESeq2 package to retrive the results from Wald test. See the
results function in DESeq2 for details.
coef
integer or character vector indicating which coefficients
of the linear model are to be tested equal to zero.
See the glmLRT function in edgeR for details.
group
numeric or character string identifying the columns in
the tcc$group for analysis. See the group argument
of topCounts function in baySeq for details.
cl
snow object when using multi processors if
test.method = "bayseq" is specified.
See the getPriors.NB function in baySeq
for details.
samplesize
integer specifying (i) the sample size for estimating the
prior parameters if test.method = "bayseq" (defaults to 10000),
and (ii) the number of permutation in samr if
test.method = "samseq" (defaults to 100).
logged
logical. If TRUE, the input data are regarded as
log2-transformed. If FALSE, the log2-transformation is
performed after the floor setting. The default is
logged = FALSE.
Ignored if test.method is not "wad".
floor
numeric scalar (> 0) specifying the floor value for
taking logarithm. The default is floor = 1, indicating that
values less than 1 are replaced by 1. Ignored if
logged = TRUE.
Ignored if test.method is not "wad".
...
further paramenters.
Details
estimaetDE function is generally used after performing the
calcNormFactors function that calculates normalization factors.
estimateDE constructs a statistical model for differential expression
(DE) analysis with the calculated normalization factors and returns the
p-values (or the derivatives). The individual functions in other
packages are internally called according to the specified
test.method parameter.
test.method = "edger"
There are two approaches (i.e., exact test and GLM) to identify DEGs
in edgeR. The two approches are implmented in TCC. As a default,
the exact test approach is used for two-group data,
and GLM approach is used for multi-group or multi-factor data.
However, if design and the one of coef or
contrast are given, the GLM approach will be used for
two-group data.
If the exact test approach is used,
estimateCommonDisp,
estimateTagwiseDisp, and
exactTest are internally called.
If the GLM approach is used,
estimateGLMCommonDisp,
estimateGLMTrendedDisp, estimateGLMTagwiseDisp,
glmFit, and
glmLRT
are internally called.
test.method = "deseq"
DESeq supports two approach (i.e. an exact test and
GLM approach) for identifying DEGs. As a default,
the exact test is used for two-group data,
and GLM approach is used for multi-group or multi-factor data.
However, if full and reduced are given, the GLM approach
will be used for two-group data.
If the exact test is used,
estimateDispersions and
nbinomTest are internally called.
If the GLM approach is used,
estimateDispersions,
fitNbinomGLMs, and
nbinomGLMTest
are internally called.
test.method = "deseq2" estimateDispersions, and
nbinomWaldTest are internally called for
identifying DEGs.
However, if full and reduced are given,
the nbinomLRT will be used.
test.method = "bayseq" getPriors.NB and
getLikelihoods in baySeq are internally
called for identifying DEGs.
If paired = TRUE,
getPriors and
getLikelihoods in baySeq are used.
test.method = "samseq" SAMseq with
resp.type = "Two class unpaired" arugment
in samr package is called to identify DEGs for two-group data,
resp.type = "Two class paired" for paired two-group data,
and resp.type = "Multiclass" for multi-group data.
test.method = "voom" voom, lmFit, and
eBayes in limma are internally called
for identifying DEGs.
test.method = "wad"
The WAD implemented in TCC is used for identifying
DEGs. Since WAD outputs test statistics instead of
p-values, the tcc$stat$p.value and
tcc$stat$q.value are NA.
Alternatively, the test statistics are stored in
tcc$stat$testStat field.
Value
A TCC-class object containing following fields:
stat$p.value
numeric vector of p-values.
stat$q.value
numeric vector of q-values calculated
based on the p-values using the p.adjust function
with default parameter settings.
stat$testStat
numeric vector of test statistics if
"wad" is specified.
stat$rank
gene rank in order of the p-values or
test statistics.
estimatedDEG
numeric vector consisting of 0 or 1
depending on whether each gene is classified
as non-DEG or DEG. The threshold for classifying
DEGs or non-DEGs is preliminarily given as the
FDR argument.
Examples
# Analyzing a simulation data for comparing two groups
# (G1 vs. G2) with biological replicates
# The DE analysis is performed by an exact test in edgeR coupled
# with the DEGES/edgeR normalization factors.
# For retrieving the summaries of DE results, we recommend to use
# the getResult function.
data(hypoData)
group <- c(1, 1, 1, 2, 2, 2)
tcc <- new("TCC", hypoData, group)
tcc <- calcNormFactors(tcc, norm.method = "tmm", test.method = "edger",
iteration = 1, FDR = 0.1, floorPDEG = 0.05)
tcc <- estimateDE(tcc, test.method = "edger", FDR = 0.1)
head(tcc$stat$p.value)
head(tcc$stat$q.value)
head(tcc$estimatedDEG)
result <- getResult(tcc)
# Analyzing a simulation data for comparing two groups
# (G1 vs. G2) without replicates
# The DE analysis is performed by an negative binomial (NB) test
# in DESeq coupled with the DEGES/DESeq normalization factors.
data(hypoData)
group <- c(1, 2)
tcc <- new("TCC", hypoData[, c(1, 4)], group)
tcc <- calcNormFactors(tcc, norm.method = "deseq", test.method = "deseq",
iteration = 1, FDR = 0.1, floorPDEG = 0.05)
tcc <- estimateDE(tcc, test.method = "deseq", FDR = 0.1)
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(TCC)
Loading required package: DESeq
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
rbind, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
Loading required package: locfit
locfit 1.5-9.1 2013-03-22
Loading required package: lattice
Welcome to 'DESeq'. For improved performance, usability and
functionality, please consider migrating to 'DESeq2'.
Loading required package: DESeq2
Loading required package: S4Vectors
Loading required package: stats4
Attaching package: 'S4Vectors'
The following objects are masked from 'package:base':
colMeans, colSums, expand.grid, rowMeans, rowSums
Loading required package: IRanges
Loading required package: GenomicRanges
Loading required package: GenomeInfoDb
Loading required package: SummarizedExperiment
Attaching package: 'DESeq2'
The following objects are masked from 'package:DESeq':
estimateSizeFactorsForMatrix, getVarianceStabilizedData,
varianceStabilizingTransformation
Loading required package: edgeR
Loading required package: limma
Attaching package: 'limma'
The following object is masked from 'package:DESeq2':
plotMA
The following object is masked from 'package:DESeq':
plotMA
The following object is masked from 'package:BiocGenerics':
plotMA
Loading required package: baySeq
Loading required package: abind
Loading required package: perm
Loading required package: ROC
Attaching package: 'TCC'
The following object is masked from 'package:edgeR':
calcNormFactors
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/TCC/estimateDE.Rd_%03d_medium.png", width=480, height=480)
> ### Name: estimateDE
> ### Title: Estimate degrees of differential expression (DE) for individual
> ### genes
> ### Aliases: estimateDE
> ### Keywords: methods
>
> ### ** Examples
>
> # Analyzing a simulation data for comparing two groups
> # (G1 vs. G2) with biological replicates
> # The DE analysis is performed by an exact test in edgeR coupled
> # with the DEGES/edgeR normalization factors.
> # For retrieving the summaries of DE results, we recommend to use
> # the getResult function.
> data(hypoData)
> group <- c(1, 1, 1, 2, 2, 2)
> tcc <- new("TCC", hypoData, group)
> tcc <- calcNormFactors(tcc, norm.method = "tmm", test.method = "edger",
+ iteration = 1, FDR = 0.1, floorPDEG = 0.05)
TCC::INFO: Calculating normalization factors using DEGES
TCC::INFO: (iDEGES pipeline : tmm - [ edger - tmm ] X 1 )
TCC::INFO: Done.
> tcc <- estimateDE(tcc, test.method = "edger", FDR = 0.1)
TCC::INFO: Identifying DE genes using edger ...
TCC::INFO: Done.
> head(tcc$stat$p.value)
[1] 5.165962e-03 7.649553e-04 3.971968e-04 5.608797e-01 3.283889e-05
[6] 4.212516e-06
> head(tcc$stat$q.value)
[1] 4.004622e-02 7.649553e-03 4.317357e-03 1.000000e+00 4.901327e-04
[6] 8.962799e-05
> head(tcc$estimatedDEG)
[1] 1 1 1 0 1 1
> result <- getResult(tcc)
>
>
> # Analyzing a simulation data for comparing two groups
> # (G1 vs. G2) without replicates
> # The DE analysis is performed by an negative binomial (NB) test
> # in DESeq coupled with the DEGES/DESeq normalization factors.
> data(hypoData)
> group <- c(1, 2)
> tcc <- new("TCC", hypoData[, c(1, 4)], group)
> tcc <- calcNormFactors(tcc, norm.method = "deseq", test.method = "deseq",
+ iteration = 1, FDR = 0.1, floorPDEG = 0.05)
TCC::INFO: Calculating normalization factors using DEGES
TCC::INFO: (iDEGES pipeline : deseq - [ deseq - deseq ] X 1 )
TCC::INFO: Done.
> tcc <- estimateDE(tcc, test.method = "deseq", FDR = 0.1)
TCC::INFO: Identifying DE genes using deseq ...
TCC::INFO: Done.
>
>
>
>
>
> dev.off()
null device
1
>