Last data update: 2014.03.03

collapseReplicates    R Documentation

Collapse technical replicates in a RangedSummarizedExperiment or DESeqDataSet

Description

Collapses the columns of object by summing counts within each level of the grouping factor groupby. The purpose of this function is to sum read counts from technical replicates so that the returned object has a single column of read counts for each sample. Optionally renames the columns of the returned object with the levels of the grouping factor. Note: this function is deliberately simple, and its source code can be examined and adapted to produce other behavior.
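
The summing step described above can be illustrated on a plain count matrix, without DESeq2. The following is a minimal base R sketch under stated assumptions, not the package's source code: the matrix counts_mat, the factor groupby, and the helper collapse_counts are hypothetical names introduced only for illustration.

```r
# Toy count matrix: 3 genes (rows) x 4 runs (columns). Hypothetical data.
counts_mat <- matrix(
  c(1L, 2L, 3L,        # run1
    10L, 20L, 30L,     # run2
    100L, 200L, 300L,  # run3
    5L, 5L, 5L),       # run4
  nrow = 3,
  dimnames = list(paste0("gene", 1:3), paste0("run", 1:4))
)

# run1 and run2 are treated as technical replicates of sampleA
groupby <- factor(c("sampleA", "sampleA", "sampleB", "sampleC"))

# Hypothetical helper: sum the columns within each level of the grouping factor
collapse_counts <- function(mat, groupby) {
  stopifnot(length(groupby) == ncol(mat))
  idx <- split(seq_along(groupby), groupby)  # column indices per level
  vapply(idx, function(i) rowSums(mat[, i, drop = FALSE]), numeric(nrow(mat)))
}

collapsed <- collapse_counts(counts_mat, groupby)

# Collapsing only regroups counts, so the grand total is unchanged
stopifnot(sum(collapsed) == sum(counts_mat))
```

The result has one column per level of groupby, and the sampleA column holds the per-gene sum of run1 and run2.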

Usage

collapseReplicates(object, groupby, run, renameCols = TRUE)

Arguments

object

A RangedSummarizedExperiment or DESeqDataSet

groupby

a grouping factor whose length equals the number of columns of object

run

optional, the unique name of each column (run) in object, one per column. If provided, a new column runsCollapsed is added to the colData, in which the names in run are pasted together for each group

renameCols

whether to rename the columns of the returned object using the levels of the grouping factor
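
How the run argument could produce a runsCollapsed value per group is sketched below in base R. This is an illustration only, not the package's implementation: the vectors run and groupby and the name runs_collapsed are hypothetical, and the comma separator is an assumption chosen to match the runsCollapsed values printed in the Results section (for example "run1,run2").

```r
# Hypothetical run names and grouping factor for four sequencing runs
run <- paste0("run", 1:4)
groupby <- factor(c("sampleA", "sampleB", "sampleA", "sampleC"))

# Both vectors must describe the same set of columns
stopifnot(length(run) == length(groupby))

# Paste together the run names within each group, comma-separated
runs_collapsed <- vapply(split(run, groupby), paste, character(1), collapse = ",")

print(runs_collapsed)
```

Groups with a single run keep that run's name unchanged, while groups with technical replicates list every contributing run.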

Value

The object with as many columns as there are levels in groupby. Its assay (count) data are the sums over the columns that are grouped together, and its colData is taken from the first column of each group in groupby.
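
The "first column for each group" rule for colData can be sketched with an ordinary data.frame standing in for the colData. This is a minimal base R illustration, not the package's source; col_data, first_idx, and collapsed_col_data are hypothetical names.

```r
# Hypothetical per-column metadata for four runs
col_data <- data.frame(
  condition = c("A", "B", "A", "B"),
  sample = factor(c("s1", "s2", "s1", "s3")),
  run = paste0("run", 1:4),
  stringsAsFactors = FALSE
)

# Index of the first column belonging to each level of the grouping factor
first_idx <- vapply(
  split(seq_len(nrow(col_data)), col_data$sample),
  `[`, integer(1), 1
)

# Keep only the metadata row of that first column, one row per group
collapsed_col_data <- col_data[first_idx, ]
rownames(collapsed_col_data) <- levels(col_data$sample)

print(collapsed_col_data)
```

Under this rule the run column of a collapsed group shows only its first run, which is consistent with the Results output, where run and runsCollapsed differ for samples with technical replicates.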

Examples


dds <- makeExampleDESeqDataSet(m=12)

# make data with two technical replicates for three samples
dds$sample <- factor(sample(paste0("sample",rep(1:9, c(2,1,1,2,1,1,2,1,1)))))
dds$run <- paste0("run",1:12)

ddsColl <- collapseReplicates(dds, dds$sample, dds$run)

# examine the colData and column names of the collapsed data
colData(ddsColl)
colnames(ddsColl)

# check that the sum of the counts for "sample1" is the same
# as the counts in the "sample1" column in ddsColl
matchFirstLevel <- dds$sample == levels(dds$sample)[1]
stopifnot(all(rowSums(counts(dds[,matchFirstLevel])) == counts(ddsColl[,1])))

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(DESeq2)
Loading required package: S4Vectors
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit


Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomicRanges
Loading required package: GenomeInfoDb
Loading required package: SummarizedExperiment
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

> ### Name: collapseReplicates
> ### Title: Collapse technical replicates in a RangedSummarizedExperiment or
> ###   DESeqDataSet
> ### Aliases: collapseReplicates
> 
> ### ** Examples
> 
> 
> dds <- makeExampleDESeqDataSet(m=12)
> 
> # make data with two technical replicates for three samples
> dds$sample <- factor(sample(paste0("sample",rep(1:9, c(2,1,1,2,1,1,2,1,1)))))
> dds$run <- paste0("run",1:12)
> 
> ddsColl <- collapseReplicates(dds, dds$sample, dds$run)
> 
> # examine the colData and column names of the collapsed data
> colData(ddsColl)
DataFrame with 9 rows and 4 columns
        condition   sample         run runsCollapsed
         <factor> <factor> <character>   <character>
sample1         A  sample1        run1     run1,run2
sample2         B  sample2       run10         run10
sample3         B  sample3        run7          run7
sample4         B  sample4        run8     run8,run9
sample5         A  sample5        run4          run4
sample6         B  sample6       run11         run11
sample7         A  sample7        run5    run5,run12
sample8         A  sample8        run6          run6
sample9         A  sample9        run3          run3
> colnames(ddsColl)
[1] "sample1" "sample2" "sample3" "sample4" "sample5" "sample6" "sample7"
[8] "sample8" "sample9"
> 
> # check that the sum of the counts for "sample1" is the same
> # as the counts in the "sample1" column in ddsColl
> matchFirstLevel <- dds$sample == levels(dds$sample)[1]
> stopifnot(all(rowSums(counts(dds[,matchFirstLevel])) == counts(ddsColl[,1])))