Last data update: 2014.03.03

R: SummarizedExperiment objects
SummarizedExperiment-class    R Documentation

SummarizedExperiment objects

Description

The SummarizedExperiment class is a matrix-like container where rows represent features of interest (e.g., genes, transcripts, exons) and columns represent samples (with sample data summarized as a DataFrame). A SummarizedExperiment object contains one or more assays, each represented by a matrix-like object of numeric or other mode.

Note that SummarizedExperiment is the parent of the RangedSummarizedExperiment class, which means that all the methods documented below also work on a RangedSummarizedExperiment object.

Usage


## Constructor

# See ?RangedSummarizedExperiment for the constructor function.

## Accessors

assayNames(x, ...)
assayNames(x, ...) <- value
assays(x, ..., withDimnames=TRUE)
assays(x, ..., withDimnames=TRUE) <- value
assay(x, i, ...)
assay(x, i, ...) <- value
rowData(x, ...)
rowData(x, ...) <- value
colData(x, ...)
colData(x, ...) <- value
#dim(x)
#dimnames(x)
#dimnames(x) <- value

## Quick colData access

## S4 method for signature 'SummarizedExperiment'
x$name
## S4 replacement method for signature 'SummarizedExperiment'
x$name <- value
## S4 method for signature 'SummarizedExperiment,ANY,missing'
x[[i, j, ...]]
## S4 replacement method for signature 'SummarizedExperiment,ANY,missing'
x[[i, j, ...]] <- value

## Subsetting

## S4 method for signature 'SummarizedExperiment'
x[i, j, ..., drop=TRUE]
## S4 replacement method for signature 'SummarizedExperiment,ANY,ANY,SummarizedExperiment'
x[i, j] <- value

## Combining 

## S4 method for signature 'SummarizedExperiment'
cbind(..., deparse.level=1)
## S4 method for signature 'SummarizedExperiment'
rbind(..., deparse.level=1)

Arguments

x

A SummarizedExperiment object.

...

For assay, ... may contain withDimnames, which is forwarded to assays.

For rowData, arguments passed through ... are forwarded to mcols.

For cbind, rbind, ... contains SummarizedExperiment objects to be combined.

For other accessors, ignored.

i, j

For assay, assay<-, i is an integer, numeric, or character scalar; see ‘Details’ for additional constraints.

For [,SummarizedExperiment, [,SummarizedExperiment<-, i, j are subscripts that can act to subset the rows and columns of x, that is the matrix elements of assays.

For [[,SummarizedExperiment, [[<-,SummarizedExperiment, i is a scalar index (e.g., character(1) or integer(1)) into a column of colData.

name

A symbol representing the name of a column of colData.

withDimnames

A logical(1), indicating whether dimnames should be applied to extracted assay elements. Setting withDimnames=FALSE increases the speed and memory efficiency with which assays are extracted. Passing withDimnames=FALSE to assays() on the left-hand side of an assignment (i.e., when the replacement method assays<- is used) allows efficient complex assignments (e.g., when updating the names of assays, names(assays(x, withDimnames=FALSE)) = ... is more efficient than names(assays(x)) = ...); it does not influence actual assignment of dimnames to assays.

drop

A logical(1), ignored by these methods.

value

An object of a class specified in the S4 method signature or as outlined in ‘Details’.

deparse.level

See ?base::cbind for a description of this argument.

Details

The SummarizedExperiment class is meant for numeric and other data types derived from a sequencing experiment. The structure is rectangular like a matrix, but with additional annotations on the rows and columns, and with the possibility to manage several assays simultaneously.

The rows of a SummarizedExperiment object represent features of interest. Information about these features is stored in a DataFrame object, accessible using the function rowData. The DataFrame must have as many rows as there are rows in the SummarizedExperiment object, with each row of the DataFrame providing information on the feature in the corresponding row of the SummarizedExperiment object. Columns of the DataFrame represent different attributes of the features of interest, e.g., gene or transcript IDs, etc.

Each column of a SummarizedExperiment object represents a sample. Information about the samples is stored in a DataFrame, accessible using the function colData, described below. The DataFrame must have as many rows as there are columns in the SummarizedExperiment object, with each row of the DataFrame providing information on the sample in the corresponding column of the SummarizedExperiment object. Columns of the DataFrame represent different sample attributes, e.g., tissue of origin, etc. Columns of the DataFrame can themselves be annotated (via the mcols function). Column names typically provide a short identifier unique to each sample.

A SummarizedExperiment object can also contain information about the overall experiment, for instance the lab in which it was conducted, the publications with which it is associated, etc. This information is stored as a list object, accessible using the metadata function. The form of the data associated with the experiment is left to the discretion of the user.

The SummarizedExperiment container is appropriate for matrix-like data. The data are accessed using the assays function, described below. This returns a SimpleList object. Each element of the list must itself be a matrix (of any mode) and must have dimensions that are the same as the dimensions of the SummarizedExperiment in which they are stored. Row and column names of each matrix must either be NULL or match those of the SummarizedExperiment during construction. It is convenient for the elements of the SimpleList of assays to be named.
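The layout described above can be sketched as follows. This is a minimal illustration, assuming the SummarizedExperiment package is attached; the toy counts, the sample names s1 to s3, the symbol column, and the "hypothetical lab" metadata entry are all made up for the example.

```r
library(SummarizedExperiment)

set.seed(1)  # reproducible toy data
counts <- matrix(rpois(12, lambda = 10), nrow = 4)  # 4 features x 3 samples

# One row of colData per sample (i.e., per column of the assay).
se <- SummarizedExperiment(
    assays = SimpleList(counts = counts),
    colData = DataFrame(tissue = c("liver", "liver", "brain"),
                        row.names = c("s1", "s2", "s3")),
    metadata = list(lab = "hypothetical lab"))

# One row of rowData per feature (i.e., per row of the assay).
rowData(se) <- DataFrame(symbol = c("A", "B", "C", "D"))

nrow(rowData(se)) == nrow(se)   # feature annotations line up with rows
nrow(colData(se)) == ncol(se)   # sample annotations line up with columns
metadata(se)$lab                # free-form experiment-level information
```

Setting rowData after construction, rather than in the constructor call, is only a choice made here to rely on the accessors documented on this page.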

Constructor

SummarizedExperiment instances are constructed using the SummarizedExperiment function documented in ?RangedSummarizedExperiment.

Accessors

In the following code snippets, x is a SummarizedExperiment object.

assays(x), assays(x) <- value:

Get or set the assays. value is a list or SimpleList, each element of which is a matrix with the same dimensions as x.

assay(x, i), assay(x, i) <- value:

A convenient alternative (to assays(x)[[i]], assays(x)[[i]] <- value) to get or set the ith (default first) assay element. value must be a matrix of the same dimension as x, and with dimension names NULL or consistent with those of x.

assayNames(x), assayNames(x) <- value:

Get or set the names of the assays() elements.

rowData(x), rowData(x) <- value:

Get or set the row data. value is a DataFrame object. Row names of value must be NULL or consistent with the existing row names of x.

colData(x), colData(x) <- value:

Get or set the column data. value is a DataFrame object. Row names of value must be NULL or consistent with the existing column names of x.

metadata(x), metadata(x) <- value:

Get or set the experiment data. value is a list with arbitrary content.

dim(x):

Get the dimensions (features of interest x samples) of the SummarizedExperiment.

dimnames(x), dimnames(x) <- value:

Get or set the dimension names. value is usually a list of length 2, containing elements that are either NULL or vectors of appropriate length for the corresponding dimension. value can be NULL, which removes dimension names. This method implies that rownames, rownames<-, colnames, and colnames<- are all available.
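The accessors listed above might be combined as in the sketch below. The object, the assay names raw, logged and log2, and the bp column are hypothetical and chosen only for illustration.

```r
library(SummarizedExperiment)

m <- matrix(1:6, nrow = 2)  # 2 features x 3 samples, no dimnames
se <- SummarizedExperiment(
    assays = SimpleList(raw = m),
    colData = DataFrame(group = c("x", "y", "x"),
                        row.names = c("a", "b", "c")))

# Add a second assay; it must have the same dimensions as `se`.
assays(se)$logged <- log2(m + 1)
assayNames(se)[2] <- "log2"      # rename it via the assayNames setter

# Dimension names set on the container are applied to extracted assays
# (withDimnames=TRUE is the default).
rownames(se) <- c("geneA", "geneB")
rownames(assay(se, "log2"))

# Row and column annotations are DataFrame objects.
rowData(se)$bp <- c(100L, 200L)
colData(se)$group
dim(se)
```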

Subsetting

In the code snippets below, x is a SummarizedExperiment object.

x[i,j], x[i,j] <- value:

Create or replace a subset of x. i, j can be numeric, logical, character, or missing. value must be a SummarizedExperiment object with dimensions, dimension names, and assay elements consistent with the subset x[i,j] being replaced.

Additional subsetting accessors provide convenient access to colData columns:

x$name, x$name <- value:

Access or replace the column called name in colData(x).

x[[i, ...]], x[[i, ...]] <- value:

Access or replace column i in colData(x).
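A short sketch of the subsetting operations described above, using a made-up object; the Treatment and Batch columns and the sample names are assumptions for the example only.

```r
library(SummarizedExperiment)

m <- matrix(seq_len(20), nrow = 5)  # 5 features x 4 samples
se <- SummarizedExperiment(
    assays = SimpleList(counts = m),
    colData = DataFrame(Treatment = c("ChIP", "Input", "ChIP", "Input"),
                        row.names = c("s1", "s2", "s3", "s4")))

# Numeric row subscript, logical column subscript built from colData.
sub <- se[c(1, 3), se$Treatment == "ChIP"]
dim(sub)
colnames(sub)

# `$<-` adds or replaces a colData column; `[[` reads it back.
se$Batch <- c(1L, 1L, 2L, 2L)
se[["Batch"]]
```

Rows, columns, colData, and every assay are subset together, so the pieces of the result stay aligned.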

Combining

In the code snippets below, ... are SummarizedExperiment objects to be combined.

cbind(...):

cbind combines objects with the same features of interest but different samples (columns in assays). The colnames in colData(SummarizedExperiment) must match or an error is thrown. Duplicate columns of rowData(SummarizedExperiment) must contain the same data.

Data in assays are combined by name matching; if all assay names are NULL matching is by position. A mixture of names and NULL throws an error.

metadata from all objects are combined into a list with no name checking.

rbind(...):

rbind combines objects with the same samples but different features of interest (rows in assays). The colnames in rowData(SummarizedExperiment) must match or an error is thrown. Duplicate columns of colData(SummarizedExperiment) must contain the same data.

Data in assays are combined by name matching; if all assay names are NULL matching is by position. A mixture of names and NULL throws an error.

metadata from all objects are combined into a list with no name checking.
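As a hedged illustration of the name matching and metadata handling described above, the sketch below combines two made-up objects whose assays are stored in a different order. The assay names counts and scaled and the "run 1"/"run 2" metadata entries are hypothetical.

```r
library(SummarizedExperiment)

a <- matrix(1:4, nrow = 2)
b <- matrix(5:8, nrow = 2)

se1 <- SummarizedExperiment(
    assays = SimpleList(counts = a, scaled = a / 10),
    colData = DataFrame(group = c("x", "y"), row.names = c("s1", "s2")),
    metadata = list(source = "run 1"))

# Same assay names as se1, but listed in the opposite order.
se2 <- SummarizedExperiment(
    assays = SimpleList(scaled = b / 10, counts = b),
    colData = DataFrame(group = c("x", "y"), row.names = c("s3", "s4")),
    metadata = list(source = "run 2"))

cmb <- cbind(se1, se2)
dim(cmb)
assay(cmb, "counts")   # columns from `a` followed by columns from `b`
length(metadata(cmb))  # metadata lists are concatenated
```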

Implementation and Extension

This section contains advanced material meant for package developers.

SummarizedExperiment is implemented as an S4 class, and can be extended in the usual way, using contains="SummarizedExperiment" in the new class definition.
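A minimal sketch of such an extension, assuming a hypothetical subclass NormalizedSE with one extra slot named method; neither name exists in the package.

```r
library(SummarizedExperiment)

# Hypothetical subclass that records how the assays were normalized.
setClass("NormalizedSE",
         contains = "SummarizedExperiment",
         slots = c(method = "character"))

se <- SummarizedExperiment(
    assays = SimpleList(counts = matrix(1:6, nrow = 2)),
    colData = DataFrame(group = c("x", "y", "x"),
                        row.names = c("s1", "s2", "s3")))

# Standard S4 initialization: the unnamed superclass instance fills the
# inherited slots, and the named argument fills the new slot.
nse <- new("NormalizedSE", se, method = "none")

is(nse, "SummarizedExperiment")
dim(nse)           # inherited methods apply to the subclass
assayNames(nse)
nse@method
```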

In addition, the representation of the assays slot of SummarizedExperiment is as a virtual class Assays. This allows derived classes (contains="Assays") to easily implement alternative requirements for the assays, e.g., backed by file-based storage like NetCDF or the ff package, while re-using the existing SummarizedExperiment class without modification. See Assays for more information.

The current assays slot is implemented as a reference class that has copy-on-change semantics. This means that modifying non-assay slots does not copy the (large) assay data, and at the same time the user is not surprised by reference-based semantics. Updates to non-assay slots are very fast; updating the assays slot itself can be 5x or more faster than with an S4 instance in the slot. One useful technique when working with the assay or assays function is to pass withDimnames=FALSE, which improves speed and memory use by not copying dimnames from the row- and colData elements to each assay.
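The withDimnames technique might look like the sketch below. The object and the assay names a and b are made up; no timings are shown because they depend on data size.

```r
library(SummarizedExperiment)

se <- SummarizedExperiment(
    assays = SimpleList(a = matrix(runif(6), nrow = 2)),
    colData = DataFrame(g = 1:3, row.names = c("s1", "s2", "s3")))
rownames(se) <- c("f1", "f2")

# Default: dimnames of `se` are copied onto the extracted assay.
dimnames(assay(se))

# withDimnames=FALSE returns the assay as stored, skipping that copy.
dimnames(assay(se, withDimnames = FALSE))

# Renaming assays without applying dimnames during the round trip.
names(assays(se, withDimnames = FALSE)) <- "b"
assayNames(se)
```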

Author(s)

Martin Morgan, mtmorgan@fhcrc.org

See Also

  • RangedSummarizedExperiment objects.

  • DataFrame, SimpleList, and Annotated objects in the S4Vectors package.

  • The metadata and mcols accessors in the S4Vectors package.

Examples

nrows <- 200; ncols <- 6
counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
colData <- DataFrame(Treatment=rep(c("ChIP", "Input"), 3),
                     row.names=LETTERS[1:6])
se0 <- SummarizedExperiment(assays=SimpleList(counts=counts),
                            colData=colData)
se0
dim(se0)
dimnames(se0)
assayNames(se0)
head(assay(se0))
assays(se0) <- endoapply(assays(se0), asinh)
head(assay(se0))

rowData(se0)
colData(se0)

se0[, se0$Treatment == "ChIP"]

## cbind() combines objects with the same features of interest
## but different samples:
se1 <- se0
se2 <- se1[,1:3]
colnames(se2) <- letters[1:ncol(se2)] 
cmb1 <- cbind(se1, se2)
dim(cmb1)
dimnames(cmb1)

## rbind() combines objects with the same samples but different
## features of interest:
se1 <- se0
se2 <- se1[1:50,]
rownames(se2) <- letters[1:nrow(se2)] 
cmb2 <- rbind(se1, se2)
dim(cmb2)
dimnames(cmb2)

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> library(SummarizedExperiment)
Loading required package: GenomicRanges
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

> ### Name: SummarizedExperiment-class
> ### Title: SummarizedExperiment objects
> ### Aliases: class:SummarizedExperiment SummarizedExperiment-class
> ###   class:SummarizedExperiment0 SummarizedExperiment0-class
> ###   SummarizedExperiment0 length,SummarizedExperiment-method
> ###   names,SummarizedExperiment-method names<-,SummarizedExperiment-method
> ###   exptData exptData,SummarizedExperiment-method exptData<-
> ###   exptData<-,SummarizedExperiment-method rowData
> ###   rowData,SummarizedExperiment-method rowData<-
> ###   rowData<-,SummarizedExperiment-method colData
> ###   colData,SummarizedExperiment-method colData<-
> ###   colData<-,SummarizedExperiment,DataFrame-method assays
> ###   assays,SummarizedExperiment-method assays<-
> ###   assays<-,SummarizedExperiment,SimpleList-method
> ###   assays<-,SummarizedExperiment,list-method assay
> ###   assay,SummarizedExperiment,missing-method
> ###   assay,SummarizedExperiment,numeric-method
> ###   assay,SummarizedExperiment,character-method assay<-
> ###   assay<-,SummarizedExperiment,missing-method
> ###   assay<-,SummarizedExperiment,numeric-method
> ###   assay<-,SummarizedExperiment,character-method assayNames
> ###   assayNames,SummarizedExperiment-method assayNames<-
> ###   assayNames<-,SummarizedExperiment,character-method
> ###   dim,SummarizedExperiment-method dimnames,SummarizedExperiment-method
> ###   dimnames<-,SummarizedExperiment,list-method
> ###   dimnames<-,SummarizedExperiment,NULL-method
> ###   [,SummarizedExperiment-method [,SummarizedExperiment,ANY-method
> ###   [<-,SummarizedExperiment,ANY,ANY,SummarizedExperiment-method
> ###   extractROWS,SummarizedExperiment,ANY-method
> ###   replaceROWS,SummarizedExperiment-method
> ###   [[,SummarizedExperiment,ANY,missing-method
> ###   [[<-,SummarizedExperiment,ANY,missing-method
> ###   $,SummarizedExperiment-method $<-,SummarizedExperiment-method
> ###   show,SummarizedExperiment-method rbind,SummarizedExperiment-method
> ###   cbind,SummarizedExperiment-method
> 
> ### ** Examples
> 
> nrows <- 200; ncols <- 6
> counts <- matrix(runif(nrows * ncols, 1, 1e4), nrows)
> colData <- DataFrame(Treatment=rep(c("ChIP", "Input"), 3),
+                      row.names=LETTERS[1:6])
> se0 <- SummarizedExperiment(assays=SimpleList(counts=counts),
+                             colData=colData)
> se0
class: SummarizedExperiment 
dim: 200 6 
metadata(0):
assays(1): counts
rownames: NULL
rowData names(0):
colnames(6): A B ... E F
colData names(1): Treatment
> dim(se0)
[1] 200   6
> dimnames(se0)
[[1]]
NULL

[[2]]
[1] "A" "B" "C" "D" "E" "F"

> assayNames(se0)
[1] "counts"
> head(assay(se0))
            A        B        C         D        E        F
[1,] 1374.414 5119.840 1290.580 4672.2896 5063.631 8033.509
[2,] 2481.446 3753.460 1283.185 4531.7479 1441.172 3347.901
[3,] 7308.335 3621.769 9288.832  127.9674  164.534 9444.514
[4,] 3221.948 3614.171 1670.516 4790.7024 5210.478 1658.968
[5,] 3983.192 2270.253 4471.825 6327.0155 3116.102  508.283
[6,] 9453.568 7094.192 5767.848 4973.8048  610.231 2907.822
> assays(se0) <- endoapply(assays(se0), asinh)
> head(assay(se0))
            A        B        C        D        E        F
[1,] 7.918930 9.234026 7.855995 9.142552 9.222986 9.684524
[2,] 8.509744 8.923580 7.850248 9.112010 7.966359 8.809236
[3,] 9.589918 8.887865 9.829715 5.544938 5.796274 9.846337
[4,] 8.770889 8.885765 8.114035 9.167580 9.251574 8.107098
[5,] 8.982986 8.420794 9.098699 9.445731 8.737485 6.924186
[6,] 9.847295 9.560179 9.353201 9.205088 7.106985 8.668307
> 
> rowData(se0)
DataFrame with 200 rows and 0 columns
> colData(se0)
DataFrame with 6 rows and 1 column
    Treatment
  <character>
A        ChIP
B       Input
C        ChIP
D       Input
E        ChIP
F       Input
> 
> se0[, se0$Treatment == "ChIP"]
class: SummarizedExperiment 
dim: 200 3 
metadata(0):
assays(1): counts
rownames: NULL
rowData names(0):
colnames(3): A C E
colData names(1): Treatment
> 
> ## cbind() combines objects with the same features of interest
> ## but different samples:
> se1 <- se0
> se2 <- se1[,1:3]
> colnames(se2) <- letters[1:ncol(se2)] 
> cmb1 <- cbind(se1, se2)
> dim(cmb1)
[1] 200   9
> dimnames(cmb1)
[[1]]
NULL

[[2]]
[1] "A" "B" "C" "D" "E" "F" "a" "b" "c"

> 
> ## rbind() combines objects with the same samples but different
> ## features of interest:
> se1 <- se0
> se2 <- se1[1:50,]
> rownames(se2) <- letters[1:nrow(se2)] 
> cmb2 <- rbind(se1, se2)
> dim(cmb2)
[1] 250   6
> dimnames(cmb2)
[[1]]
  [1] ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  "" 
 [19] ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  "" 
 [37] ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  "" 
 [55] ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  "" 
 [73] ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  "" 
 [91] ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  "" 
[109] ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  "" 
[127] ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  "" 
[145] ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  "" 
[163] ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  "" 
[181] ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  ""  "" 
[199] ""  ""  "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p"
[217] "q" "r" "s" "t" "u" "v" "w" "x" "y" "z" NA  NA  NA  NA  NA  NA  NA  NA 
[235] NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA  NA 

[[2]]
[1] "A" "B" "C" "D" "E" "F"
