Last data update: 2014.03.03

R: Estimate degrees of differential expression (DE) for...
estimateDER Documentation

Estimate degrees of differential expression (DE) for individual genes

Description

This function calculates p-values (or the related statistics) for identifying differentially expressed genes (DEGs) from a TCC-class object. estimateDE internally calls a specified method implemented in other R packages.

Usage

estimateDE(tcc, test.method, FDR, paired,
           full, reduced,                   # for DESeq, DESeq2
           design, contrast,                # for edgeR, DESeq2, voom
           coef,                            # for edgeR, voom
           group, cl,                       # for baySeq
           samplesize,                      # for baySeq, SAMseq
           logged, floor,                   # for WAD
           ...
)

Arguments

tcc

TCC-class object.

test.method

character string specifying a method for identifying DEGs: one of "edger", "deseq", "deseq2", "bayseq", "samseq", "voom", and "wad". See the "Details" field for detail. The default is "edger" when analyzing the count data with replicates (i.e., min(table(tcc$group[, 1])) > 1), and "deseq" (2 group) and "deseq2" (more than 2 group) when analyzing the count data without replicates (i.e., min(table(tcc$group[, 1])) == 1).

FDR

numeric value (between 0 and 1) specifying the threshold for determining DEGs.

paired

logical. If TRUE, the input data are regarded as (two-group) paired samples. If FALSE, the input data are regarded as unpaired samples. The default is FALSE.

full

a formula for creating full model described in DESeq and DESeq2. The right hand side can involve any column of tcc$group is used as the model frame. See the fitNbinomGLMs function in DESeq for details, or nbinomLRT function in DESeq2.

reduced

a formula for creating reduced model described in DESeq. The right hand side can involve any column of tcc$group is used as the model frame. See the fitNbinomGLMs function in DESeq for details, or nbinomLRT function in DESeq2.

design

the argument is used in edgeR, voom (limma) and DESeq2. For edgeR and voom, it should be the numeric matrix giving the design matrix for the generalized linear model. See the glmFit function in edgeR or the lmFit function in limma for details. For DESeq2, it should be a formula specifying the design of the experiment. See the DESeqDataSet function in DESeq2 for details.

contrast

the argument is used in edgeR and DESeq2. For edgeR, numeric vector specifying a contrast of the linear model coefficients to be tested equal to zero. See the glmLRT function in edgeR for details. For DESeq2, the argument is same to contrast which used in DESeq2 package to retrive the results from Wald test. See the results function in DESeq2 for details.

coef

integer or character vector indicating which coefficients of the linear model are to be tested equal to zero. See the glmLRT function in edgeR for details.

group

numeric or character string identifying the columns in the tcc$group for analysis. See the group argument of topCounts function in baySeq for details.

cl

snow object when using multi processors if test.method = "bayseq" is specified. See the getPriors.NB function in baySeq for details.

samplesize

integer specifying (i) the sample size for estimating the prior parameters if test.method = "bayseq" (defaults to 10000), and (ii) the number of permutation in samr if test.method = "samseq" (defaults to 100).

logged

logical. If TRUE, the input data are regarded as log2-transformed. If FALSE, the log2-transformation is performed after the floor setting. The default is logged = FALSE. Ignored if test.method is not "wad".

floor

numeric scalar (> 0) specifying the floor value for taking logarithm. The default is floor = 1, indicating that values less than 1 are replaced by 1. Ignored if logged = TRUE. Ignored if test.method is not "wad".

...

further paramenters.

Details

estimaetDE function is generally used after performing the calcNormFactors function that calculates normalization factors. estimateDE constructs a statistical model for differential expression (DE) analysis with the calculated normalization factors and returns the p-values (or the derivatives). The individual functions in other packages are internally called according to the specified test.method parameter.

  • test.method = "edger"
    There are two approaches (i.e., exact test and GLM) to identify DEGs in edgeR. The two approches are implmented in TCC. As a default, the exact test approach is used for two-group data, and GLM approach is used for multi-group or multi-factor data. However, if design and the one of coef or contrast are given, the GLM approach will be used for two-group data.
    If the exact test approach is used, estimateCommonDisp, estimateTagwiseDisp, and exactTest are internally called.
    If the GLM approach is used, estimateGLMCommonDisp, estimateGLMTrendedDisp,
    estimateGLMTagwiseDisp, glmFit, and glmLRT are internally called.

  • test.method = "deseq"
    DESeq supports two approach (i.e. an exact test and GLM approach) for identifying DEGs. As a default, the exact test is used for two-group data, and GLM approach is used for multi-group or multi-factor data. However, if full and reduced are given, the GLM approach will be used for two-group data.
    If the exact test is used, estimateDispersions and nbinomTest are internally called.
    If the GLM approach is used, estimateDispersions, fitNbinomGLMs, and nbinomGLMTest are internally called.

  • test.method = "deseq2"
    estimateDispersions, and nbinomWaldTest are internally called for identifying DEGs. However, if full and reduced are given, the nbinomLRT will be used.

  • test.method = "bayseq"
    getPriors.NB and getLikelihoods in baySeq are internally called for identifying DEGs. If paired = TRUE, getPriors and getLikelihoods in baySeq are used.

  • test.method = "samseq"
    SAMseq with resp.type = "Two class unpaired" arugment in samr package is called to identify DEGs for two-group data, resp.type = "Two class paired" for paired two-group data, and resp.type = "Multiclass" for multi-group data.

  • test.method = "voom"
    voom, lmFit, and eBayes in limma are internally called for identifying DEGs.

  • test.method = "wad"
    The WAD implemented in TCC is used for identifying DEGs. Since WAD outputs test statistics instead of p-values, the tcc$stat$p.value and tcc$stat$q.value are NA. Alternatively, the test statistics are stored in tcc$stat$testStat field.

Value

A TCC-class object containing following fields:

stat$p.value

numeric vector of p-values.

stat$q.value

numeric vector of q-values calculated based on the p-values using the p.adjust function with default parameter settings.

stat$testStat

numeric vector of test statistics if "wad" is specified.

stat$rank

gene rank in order of the p-values or test statistics.

estimatedDEG

numeric vector consisting of 0 or 1 depending on whether each gene is classified as non-DEG or DEG. The threshold for classifying DEGs or non-DEGs is preliminarily given as the FDR argument.

Examples

# Analyzing a simulation data for comparing two groups
# (G1 vs. G2) with biological replicates
# The DE analysis is performed by an exact test in edgeR coupled
# with the DEGES/edgeR normalization factors.
# For retrieving the summaries of DE results, we recommend to use
# the getResult function.
data(hypoData)
group <- c(1, 1, 1, 2, 2, 2)
tcc <- new("TCC", hypoData, group)
tcc <- calcNormFactors(tcc, norm.method = "tmm", test.method = "edger",
                       iteration = 1, FDR = 0.1, floorPDEG = 0.05)
tcc <- estimateDE(tcc, test.method = "edger", FDR = 0.1)
head(tcc$stat$p.value)
head(tcc$stat$q.value)
head(tcc$estimatedDEG)
result <- getResult(tcc)


# Analyzing a simulation data for comparing two groups
# (G1 vs. G2) without replicates
# The DE analysis is performed by an negative binomial (NB) test
# in DESeq coupled with the DEGES/DESeq normalization factors.
data(hypoData)
group <- c(1, 2)
tcc <- new("TCC", hypoData[, c(1, 4)], group)
tcc <- calcNormFactors(tcc, norm.method = "deseq", test.method = "deseq",
                       iteration = 1, FDR = 0.1, floorPDEG = 0.05)
tcc <- estimateDE(tcc, test.method = "deseq", FDR = 0.1)

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(TCC)
Loading required package: DESeq
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: locfit
locfit 1.5-9.1 	 2013-03-22
Loading required package: lattice
    Welcome to 'DESeq'. For improved performance, usability and
    functionality, please consider migrating to 'DESeq2'.
Loading required package: DESeq2
Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomicRanges
Loading required package: GenomeInfoDb
Loading required package: SummarizedExperiment

Attaching package: 'DESeq2'

The following objects are masked from 'package:DESeq':

    estimateSizeFactorsForMatrix, getVarianceStabilizedData,
    varianceStabilizingTransformation

Loading required package: edgeR
Loading required package: limma

Attaching package: 'limma'

The following object is masked from 'package:DESeq2':

    plotMA

The following object is masked from 'package:DESeq':

    plotMA

The following object is masked from 'package:BiocGenerics':

    plotMA

Loading required package: baySeq
Loading required package: abind
Loading required package: perm
Loading required package: ROC

Attaching package: 'TCC'

The following object is masked from 'package:edgeR':

    calcNormFactors

> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/TCC/estimateDE.Rd_%03d_medium.png", width=480, height=480)
> ### Name: estimateDE
> ### Title: Estimate degrees of differential expression (DE) for individual
> ###   genes
> ### Aliases: estimateDE
> ### Keywords: methods
> 
> ### ** Examples
> 
> # Analyzing a simulation data for comparing two groups
> # (G1 vs. G2) with biological replicates
> # The DE analysis is performed by an exact test in edgeR coupled
> # with the DEGES/edgeR normalization factors.
> # For retrieving the summaries of DE results, we recommend to use
> # the getResult function.
> data(hypoData)
> group <- c(1, 1, 1, 2, 2, 2)
> tcc <- new("TCC", hypoData, group)
> tcc <- calcNormFactors(tcc, norm.method = "tmm", test.method = "edger",
+                        iteration = 1, FDR = 0.1, floorPDEG = 0.05)
TCC::INFO: Calculating normalization factors using DEGES
TCC::INFO: (iDEGES pipeline : tmm - [ edger - tmm ] X 1 )
TCC::INFO: Done.
> tcc <- estimateDE(tcc, test.method = "edger", FDR = 0.1)
TCC::INFO: Identifying DE genes using edger ...
TCC::INFO: Done.
> head(tcc$stat$p.value)
[1] 5.165962e-03 7.649553e-04 3.971968e-04 5.608797e-01 3.283889e-05
[6] 4.212516e-06
> head(tcc$stat$q.value)
[1] 4.004622e-02 7.649553e-03 4.317357e-03 1.000000e+00 4.901327e-04
[6] 8.962799e-05
> head(tcc$estimatedDEG)
[1] 1 1 1 0 1 1
> result <- getResult(tcc)
> 
> 
> # Analyzing a simulation data for comparing two groups
> # (G1 vs. G2) without replicates
> # The DE analysis is performed by an negative binomial (NB) test
> # in DESeq coupled with the DEGES/DESeq normalization factors.
> data(hypoData)
> group <- c(1, 2)
> tcc <- new("TCC", hypoData[, c(1, 4)], group)
> tcc <- calcNormFactors(tcc, norm.method = "deseq", test.method = "deseq",
+                        iteration = 1, FDR = 0.1, floorPDEG = 0.05)
TCC::INFO: Calculating normalization factors using DEGES
TCC::INFO: (iDEGES pipeline : deseq - [ deseq - deseq ] X 1 )
TCC::INFO: Done.
> tcc <- estimateDE(tcc, test.method = "deseq", FDR = 0.1)
TCC::INFO: Identifying DE genes using deseq ...
TCC::INFO: Done.
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>