Last data update: 2014.03.03

DESeq2_tidiers    R Documentation

Tidying methods for DESeq2 DESeqDataSet objects

Description

Reshapes a DESeq2 DESeqDataSet (or DESeqResults) object into a tidy data frame. If the dataset contains hypothesis test results (p-values and estimates), the output has one row per gene per possible contrast; otherwise it has one row per gene per sample.

Usage

## S3 method for class 'DESeqDataSet'
tidy(x, colData = FALSE, intercept = FALSE, ...)

## S3 method for class 'DESeqResults'
tidy(x, ...)

Arguments

x

DESeqDataSet or DESeqResults object

colData

whether to merge the sample covariates in colData(x) into the tidied output. Ignored if the dataset includes hypothesis test results

intercept

whether to include hypothesis test results for the (Intercept) term. Ignored if the dataset does not include hypothesis test results

...

extra arguments (not used)

Details

colData = TRUE adds the covariates from colData(x) to the output data frame.

Value

If the dataset contains results (p-values and log2 fold changes), tidy returns a data frame with the columns

term

The contrast being tested, as given to results

gene

gene ID

baseMean

mean abundance level (mean of normalized counts across all samples)

estimate

estimated log2 fold change

stderror

standard error of the log2 fold change estimate

statistic

test statistic

p.value

p-value

p.adjusted

adjusted p-value
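To make the "one row per gene per contrast" layout concrete, here is a minimal base-R sketch. It is not biobroom's implementation: the helper name tidy_results_sketch is hypothetical, and it assumes each per-term table uses the column names of DESeq2's results() output (baseMean, log2FoldChange, lfcSE, stat, pvalue, padj), which it renames to the tidy columns listed above. The numbers in the toy input are made up.

```r
# Hypothetical helper (not part of biobroom): stack per-term DESeq2-style
# result tables into one long data frame using the tidy column names.
tidy_results_sketch <- function(res_list) {
  pieces <- lapply(names(res_list), function(term) {
    r <- res_list[[term]]
    data.frame(
      term       = term,             # contrast name, recycled for every gene
      gene       = rownames(r),      # gene IDs come from the row names
      baseMean   = r$baseMean,
      estimate   = r$log2FoldChange,
      stderror   = r$lfcSE,
      statistic  = r$stat,
      p.value    = r$pvalue,
      p.adjusted = r$padj,
      stringsAsFactors = FALSE
    )
  })
  out <- do.call(rbind, pieces)      # one row per gene per term
  rownames(out) <- NULL
  out
}

# Toy input: two genes, reused for two illustrative term names
toy <- data.frame(
  baseMean = c(100, 50), log2FoldChange = c(1.2, -0.4),
  lfcSE = c(0.3, 0.2), stat = c(4, -2),
  pvalue = c(6e-5, 0.045), padj = c(1.2e-4, 0.045),
  row.names = c("gene1", "gene2")
)
tidied <- tidy_results_sketch(list(condition_B_vs_A = toy, noise = toy))
print(tidied)
```

With real data, each list element would presumably come from a call such as results(dds, name = ...), one per term; the long layout is what makes faceting by term straightforward.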

If the dataset does not contain results (DESeq has not been run on it), tidy instead tidies the counts in the dataset, returning the columns:

gene

gene ID

sample

sample ID

count

number of reads in this gene in this sample

If colData = TRUE, the columns of colData(x) are also merged into this data frame, matched by sample.
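The counts-only layout, and the effect of colData = TRUE, can be illustrated with a small base-R sketch. This is an assumption-laden illustration, not biobroom's code: the helper tidy_counts_sketch and the toy matrix are hypothetical, and with a real DESeqDataSet the inputs would plausibly be counts(dds) and as.data.frame(colData(dds)).

```r
# Hypothetical helper (not part of biobroom): reshape a genes x samples
# count matrix to one row per gene per sample, optionally merging covariates.
tidy_counts_sketch <- function(counts, col_data = NULL) {
  out <- data.frame(
    # as.vector() reads the matrix column by column, so genes vary fastest
    gene   = rep(rownames(counts), times = ncol(counts)),
    sample = rep(colnames(counts), each = nrow(counts)),
    count  = as.vector(counts),
    stringsAsFactors = FALSE
  )
  if (!is.null(col_data)) {
    # sample IDs are the row names of the covariate table; join on them
    cd <- data.frame(sample = rownames(col_data), col_data,
                     stringsAsFactors = FALSE)
    out <- merge(out, cd, by = "sample")
  }
  out
}

# Toy data: 2 genes x 3 samples, plus one sample covariate
counts <- matrix(c(10L, 0L, 5L, 7L, 3L, 12L), nrow = 2,
                 dimnames = list(c("gene1", "gene2"), c("s1", "s2", "s3")))
col_data <- data.frame(condition = c("A", "A", "B"),
                       row.names = c("s1", "s2", "s3"))
long <- tidy_counts_sketch(counts)
long_cd <- tidy_counts_sketch(counts, col_data)
print(long_cd)
```

Merging on the sample ID repeats each sample's covariates on every gene row for that sample, which is what allows counts to be grouped or plotted by condition directly.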

Examples


# From DESeq2 documentation

if (require("DESeq2")) {
    dds <- makeExampleDESeqDataSet(betaSD = 1)

    # counts only: one row per gene per sample
    # (print() is needed because values inside { } are not auto-printed)
    print(head(tidy(dds)))
    # with the sample covariates from colData merged in
    print(head(tidy(dds, colData = TRUE)))

    # add a random noise covariate (no true effect) to the design
    colData(dds)$noise <- rnorm(nrow(colData(dds)))
    design(dds) <- ~ condition + noise

    # perform differential expression tests
    ddsres <- DESeq(dds, test = "Wald")
    # now results are per-gene, per-term
    tidied <- tidy(ddsres)
    print(head(tidied))

    if (require("ggplot2")) {
        print(ggplot(tidied, aes(p.value)) + geom_histogram(binwidth = .05) +
            facet_wrap(~ term, scales = "free_y"))
    }
}

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(biobroom)
Loading required package: broom
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/biobroom/DESeq2_tidiers.Rd_%03d_medium.png", width=480, height=480)
> ### Name: DESeq2_tidiers
> ### Title: Tidying methods for DESeq2 DESeqDataSet objects
> ### Aliases: DESeq2_tidiers tidy.DESeqDataSet tidy.DESeqResults
> 
> ### ** Examples
> 
> 
> # From DESeq2 documentation
> 
> if (require("DESeq2")) {
+     dds <- makeExampleDESeqDataSet(betaSD = 1)
+ 
+     tidy(dds)
+     # With design included
+     tidy(dds, colData=TRUE)
+ 
+     # add a noise confounding effect
+     colData(dds)$noise <- rnorm(nrow(colData(dds)))
+     design(dds) <- (~ condition + noise)
+ 
+     # perform differential expression tests
+     ddsres <- DESeq(dds, test = "Wald")
+     # now results are per-gene, per-term
+     tidied <- tidy(ddsres)
+     tidied
+ 
+     if (require("ggplot2")) {
+         ggplot(tidied, aes(p.value)) + geom_histogram(binwidth = .05) +
+             facet_wrap(~ term, scale = "free_y")
+     }
+ }
Loading required package: DESeq2
Loading required package: S4Vectors
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit


Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomicRanges
Loading required package: GenomeInfoDb
Loading required package: SummarizedExperiment
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
Loading required package: ggplot2
Warning message:
Removed 4 rows containing non-finite values (stat_bin). 
> 
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>