R Graphical Manual


Last data update: 2014.03.03


MSnSet-class

R Documentation

The "MSnSet" Class for MS Proteomics Expression Data and Meta-Data

Description

The MSnSet class holds quantified expression data for MS proteomics experiments, together with the experimental meta-data. The MSnSet class is derived from the "eSet" class and mimics the "ExpressionSet" class traditionally used for microarray data.

Objects from the Class

The constructor MSnSet(exprs, fData, pData) can be used to create MSnSet instances. Argument exprs is a matrix, and fData and pData must be of class data.frame or "AnnotatedDataFrame"; all three must meet the dimension and name validity constraints.
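The dimension and name constraints can be illustrated in base R. The sketch below uses checkMSnSetInputs(), a hypothetical helper that is not part of MSnbase; it only restates the kind of checks described above (one fData row per feature, one pData row per sample, and matching names when the matrix carries them). The actual MSnSet validity method may check more than this.

```r
## Hypothetical helper, not part of MSnbase: checks that an expression
## matrix and its feature (fData) and sample (pData) meta-data have
## compatible dimensions and, when present, matching names.
checkMSnSetInputs <- function(exprs, fData, pData) {
  stopifnot(is.matrix(exprs), is.data.frame(fData), is.data.frame(pData))
  ## one fData row per feature (matrix row), one pData row per sample (column)
  ok <- nrow(exprs) == nrow(fData) && ncol(exprs) == nrow(pData)
  ## compare names only when the matrix actually carries them
  if (ok && !is.null(rownames(exprs)))
    ok <- identical(rownames(exprs), rownames(fData))
  if (ok && !is.null(colnames(exprs)))
    ok <- identical(colnames(exprs), rownames(pData))
  ok
}

M <- matrix(rnorm(12), nrow = 4,
            dimnames = list(paste0("id", LETTERS[1:4]), LETTERS[1:3]))
fd <- data.frame(otherfdata = letters[1:4], row.names = rownames(M))
pd <- data.frame(otherpdata = letters[1:3], row.names = colnames(M))
checkMSnSetInputs(M, fd, pd)                       # TRUE
checkMSnSetInputs(M, fd, pd[1:2, , drop = FALSE])  # FALSE: 3 samples, 2 rows
```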

Objects can also be created by calls of the form new("MSnSet", exprs, ...). See also "ExpressionSet" for helpful information. Expression data produced by other software can thus make use of this standardized data container and benefit from R and Bioconductor packages. Importer functions will be developed to streamline the generation of "MSnSet" instances from third-party software.

A coercion method is also available to transform an IBSpectra object (named x) from the isobar package into an MSnSet: as(x, "MSnSet").

Within the MSnbase package, MSnSet instances can be generated from "MSnExp" experiments using the quantify method.

Slots

qual:: Object of class "data.frame" that records peak data for each of the reporter ions, to be used as quality metrics.
processingData:: Object of class "MSnProcess" that records all processing.
assayData:: Object of class "assayData" containing a matrix whose number of columns equals nrow(phenoData). assayData must contain a matrix exprs with rows representing features (e.g., reporter ions) and columns representing samples. See the "AssayData" class and the exprs and assayData accessors for more details. This slot is indirectly inherited from "eSet".
phenoData:: Object of class "AnnotatedDataFrame" containing experimenter-supplied variables describing the samples (i.e. the individual tags for a labelled MS experiment) (indirectly inherited from "eSet"). See phenoData and the "eSet" class for more details.
featureData:: Object of class "AnnotatedDataFrame" containing variables describing features (spectra in our case), e.g. identification data, peptide sequence, identification score, ... (inherited indirectly from "eSet"). See featureData and the "eSet" class for more details.
experimentData:: Object of class "MIAPE", containing details of experimental methods (inherited from "eSet"). See experimentData and the "eSet" class for more details.
annotation:: not used here.
protocolData:: Object of class "AnnotatedDataFrame" containing equipment-generated variables (inherited indirectly from "eSet"). See protocolData and the "eSet" class for more details.
.__classVersion__:: Object of class "Versions" describing the versions of R, the Biobase package, "eSet", "pSet" and MSnSet of the current instance. Intended for developer use and debugging (inherited indirectly from "eSet").

Extends

Class "eSet", directly. Class "VersionedBiobase", by class "eSet", distance 2. Class "Versioned", by class "eSet", distance 3.

Methods

Methods specific to MSnSet, or overriding those of its super-class, are described below. See also "eSet" for inherited methods.

dim: signature(x = "MSnSet"): Returns the dimensions of the object's assay data, i.e. the number of features and the number of samples.
fileNames: signature(object = "MSnSet"): Access file names in the processingData slot.
msInfo: signature(object = "MSnSet"): Prints the MIAPE-MS meta-data stored in the experimentData slot.
processingData: signature(object = "MSnSet"): Access the processingData slot.
show: signature(object = "MSnSet"): Displays object content as text.
qual: signature(object = "MSnSet"): Access the reporter ion peaks description.
purityCorrect: signature(object = "MSnSet", impurities = "matrix"): Performs reporter ion purity correction. See the purityCorrect documentation for more details.
normalise: signature(object = "MSnSet"): Performs MSnSet normalisation. See normalise for more details.
t: signature(x = "MSnSet"): Returns a transposed MSnSet object, where features are now aligned along columns and samples along rows, and the phenoData and featureData slots have been swapped. The protocolData slot is always dropped.
as(,"ExpressionSet"): signature(x = "MSnSet"): Coerces an object from MSnSet to ExpressionSet-class. The experimentData slot is converted to a MIAME instance. It is also possible to coerce an ExpressionSet to an MSnSet, in which case the experimentData slot is newly initialised.
as(,"data.frame"): signature(x = "MSnSet"): Coerces an object from MSnSet to data.frame. The MSnSet is transposed and the phenoData slot is appended. See also ms2df below.
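The layout produced by the data.frame coercion can be illustrated with a plain matrix. msnsetAsDf() below is a hypothetical base R helper, not the MSnbase implementation; it only reproduces the shape described above: samples become rows, features become columns, and the sample (phenoData) variables are appended on the right.

```r
## Hypothetical helper, not the MSnbase implementation: mimics the layout
## of as(x, "data.frame") with a plain matrix and a sample data.frame.
msnsetAsDf <- function(exprs, pData) {
  stopifnot(ncol(exprs) == nrow(pData))
  ## transpose so samples are rows, then append the sample variables
  data.frame(t(exprs), pData, check.names = FALSE)
}

M <- matrix(1:12, nrow = 4,
            dimnames = list(paste0("id", LETTERS[1:4]), LETTERS[1:3]))
pd <- data.frame(otherpdata = letters[1:3], row.names = colnames(M))
df <- msnsetAsDf(M, pd)
dim(df)  # 3 5: 3 samples, 4 features + 1 phenoData variable
```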

write.exprs: signature(x = "MSnSet")

Writes expression values to a tab-separated file (default is tmp.txt). The fDataCols parameter can be used to specify which featureData columns (as column names, column numbers or a logical vector) to append on the right of the expression matrix. The remaining arguments are the same as for write.table.
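The file layout can be sketched in base R. writeExprs() is a hypothetical helper, not the MSnbase code, and it assumes the expression matrix and the feature meta-data are passed separately; it appends the selected feature columns to the right of the values and writes a tab-separated table with write.table.

```r
## Hypothetical helper, not the MSnbase code: writes an expression matrix
## to a tab-separated file, appending selected feature meta-data columns
## on the right, in the spirit of write.exprs(x, fDataCols = ...).
writeExprs <- function(exprs, fData, fDataCols = NULL, file = "tmp.txt", ...) {
  out <- as.data.frame(exprs)
  ## fDataCols may be names, column numbers or a logical vector
  if (!is.null(fDataCols))
    out <- cbind(out, fData[, fDataCols, drop = FALSE])
  write.table(out, file = file, sep = "\t", quote = FALSE, ...)
  invisible(out)
}

M <- matrix(1:6, nrow = 3,
            dimnames = list(c("f1", "f2", "f3"), c("s1", "s2")))
fd <- data.frame(protein = c("P1", "P1", "P2"), score = c(0.9, 0.5, 0.7),
                 row.names = rownames(M))
tmp <- tempfile(fileext = ".txt")
writeExprs(M, fd, fDataCols = "protein", file = tmp)
## the header has one field fewer than the data rows, so read.delim
## uses the first column as row names
res <- read.delim(tmp)
```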

combine: signature(x = "MSnSet", y = "MSnSet", ...)

Combines 2 or more MSnSet instances according to their feature names. Note that the qual slot and the processing information are silently dropped.

topN: signature(object = "MSnSet", groupBy, n = 3, fun, ...)

Selects the n most intense features (typically peptides or spectra) out of all those available for each set defined by groupBy (typically proteins), and creates a new instance of class MSnSet. If fewer than n features are available, all are selected. The ncol(object) values of each feature are summarised using fun (default is sum) prior to being ordered in decreasing order. Additional parameters can be passed to fun through ..., for instance to control the behaviour of topN in the presence of NA values. Note that the qual slot and the processing information are silently dropped. (Also works with matrix instances.)

See also the nQuants function to retrieve the actual number of retained peptides out of n.

A complete use case using topN and nQuants is detailed in the synapter package vignette.
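The selection rule can be sketched on a plain matrix. topNSketch() is a hypothetical base R helper, not the MSnbase implementation: it ranks the features of each group by fun() applied across samples (sum here, with na.rm = TRUE passed through ...) and keeps at most n of them per group. The final line counts the retained features per group, which is the kind of information nQuants reports.

```r
## Hypothetical re-implementation, not the MSnbase source, of the topN
## selection rule on a plain matrix.
topNSketch <- function(m, groupBy, n = 3, fun = sum, ...) {
  stopifnot(nrow(m) == length(groupBy))
  score <- apply(m, 1, fun, ...)            # one summary value per feature
  keep <- unlist(lapply(split(seq_len(nrow(m)), groupBy), function(i) {
    i <- i[order(score[i], decreasing = TRUE)]
    head(i, n)                              # all features if fewer than n
  }), use.names = FALSE)
  m[keep, , drop = FALSE]
}

m <- rbind(pep1 = c(1, 2), pep2 = c(10, 20), pep3 = c(5, NA),
           pep4 = c(3, 3), pep5 = c(7, 1))
prots <- c("A", "A", "A", "B", "B")
top2 <- topNSketch(m, prots, n = 2, na.rm = TRUE)
rownames(top2)  # "pep2" "pep3" "pep5" "pep4"
table(prots[match(rownames(top2), rownames(m))])  # features kept per protein
```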

filterNA: signature(object = "MSnSet", pNA = "numeric", pattern = "character", droplevels = "logical")

This method subsets object by removing features that have (strictly) more than a proportion pNA of NA values. The default pNA is 0, which removes any feature that exhibits missing data. The method can also be used with a character pattern composed only of 0 and 1 characters, one per column/sample. A 0 marks a column/sample that is allowed missing values, while columns/samples marked with a 1 must not have NAs.

This method also accepts matrix instances. droplevels defines whether unused levels in the feature meta-data should be dropped. Default is TRUE. See the droplevels method below.

See also the is.na.MSnSet and plotNA methods for missing data exploration.
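Both filtering rules can be sketched in base R. filterNASketch() is a hypothetical helper, not the MSnbase source; the matrix below only reproduces the missing-value layout of the msnset example (its values are arbitrary), and the two calls return the same feature names as the example output.

```r
## Hypothetical re-implementation, not the MSnbase source, of the two
## filterNA rules on a plain matrix:
##  - pNA: keep features whose proportion of NAs is not above pNA
##  - pattern: one "0"/"1" character per sample; samples flagged "1"
##    must have no NA, samples flagged "0" may have NAs
filterNASketch <- function(m, pNA = 0, pattern) {
  if (!missing(pattern)) {
    required <- strsplit(pattern, "")[[1]] == "1"
    stopifnot(length(required) == ncol(m))
    keep <- rowSums(is.na(m[, required, drop = FALSE])) == 0
  } else {
    keep <- rowMeans(is.na(m)) <= pNA
  }
  m[keep, , drop = FALSE]
}

## Same missing-value layout as the msnset example (values are arbitrary)
m <- matrix(1, nrow = 6, ncol = 4,
            dimnames = list(c("X18", "X19", "X2", "X20", "X21", "X22"),
                            paste0("iTRAQ4.", 114:117)))
m["X18", c(1, 4)] <- NA
m["X19", c(1, 2)] <- NA
rownames(filterNASketch(m, pNA = 1/4))         # "X2" "X20" "X21" "X22"
rownames(filterNASketch(m, pattern = "0110"))  # "X18" "X2" "X20" "X21" "X22"
```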

filterZero: signature(object = "MSnSet", pNA = "numeric", pattern = "character", droplevels = "logical")

As filterNA, but for zeros.

log: signature(object = "MSnSet", base = "numeric")

Log-transforms exprs(object) using base::log. base (default is e = exp(1)) must be a positive or complex number: the base with respect to which logarithms are computed.

droplevels: signature(x = "MSnSet", ...)

Drops the unused factor levels in the featureData slot. See droplevels for details.

exprsToRatios: signature(object = "MSnSet", log = "logical")

Calculates all possible ratios between the object's columns/samples. See exprsToRatios for more details.
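"All possible ratios" means one ratio per unordered pair of columns, i.e. choose(ncol, 2) of them. pairwiseRatios() below is a hypothetical base R helper, not the MSnbase source; its column naming is an illustration only, and it assumes that with log = TRUE the data are already log-transformed, so ratios become differences.

```r
## Hypothetical sketch, not the MSnbase source: all pairwise ratios
## between the columns of a matrix (assumption: with log = TRUE the
## data are already logged, so ratios are computed as differences).
pairwiseRatios <- function(m, log = FALSE) {
  pairs <- combn(ncol(m), 2)                 # each column is one (i, j) pair
  out <- apply(pairs, 2, function(p)
    if (log) m[, p[1]] - m[, p[2]] else m[, p[1]] / m[, p[2]])
  out <- matrix(out, nrow = nrow(m))         # keep matrix shape for 1 feature
  colnames(out) <- paste(colnames(m)[pairs[1, ]], colnames(m)[pairs[2, ]],
                         sep = "/")
  rownames(out) <- rownames(m)
  out
}

m <- matrix(c(2, 4, 8, 1, 2, 4), nrow = 2, byrow = TRUE,
            dimnames = list(c("f1", "f2"), c("s1", "s2", "s3")))
pairwiseRatios(m)                   # columns s1/s2, s1/s3, s2/s3
pairwiseRatios(log2(m), log = TRUE) # same comparisons as log2 differences
```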

impute: signature(object = "MSnSet", ...)

Performs data imputation on the MSnSet object. See impute for more details.

trimws: signature(object = "MSnSet", ...)

Trims leading and/or trailing white space in the feature data slot. Also available for data.frame objects. See ?base::trimws for details.

Additional accessors for the experimental metadata (experimentData slot) are defined. See "MIAPE" for details.

Plotting

meanSdPlot: signature(object = "MSnSet")

Plots row standard deviations versus row means. See meanSdPlot (vsn package) for more details.

image: signature(x = "MSnSet", facetBy = "character", sOrderBy = "character", legend = "character", low = "character", high = "character", fnames = "logical", nmax = "numeric")

Produces a heatmap of expression values in the x object. Simple horizontal facetting is enabled by passing a single character string as facetBy. Arbitrary facetting can be performed manually by saving the return value of the method (see examples below). Samples can be re-ordered by providing the name of a phenotypic variable to sOrderBy. The title of the legend can be set with legend, and the colours with the low and high arguments. If any negative value is detected in the data, the values are considered as log fold-changes and a divergent colour scale is used. Otherwise, a gradient from low to high is used. To scale the quantitative data in x prior to plotting, see the scale method.

When there are more than nmax (default is 50) features/rows, the feature names are not printed. This behaviour can be overridden by setting fnames to TRUE (always print) or FALSE (never print). See examples below.

The code is based on Vlad Petyuk's vp.misc::image_msnset. The previous version of this method is still available through the image2 function.
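The two display rules described above (colour scale and feature-name printing) can be restated as a small decision function. imageSettings() is a hypothetical base R helper, not part of MSnbase or vp.misc, and it does not draw anything; it assumes names are printed when the number of rows is at most nmax.

```r
## Hypothetical helper, not the MSnbase source: restates which colour
## scale image() uses and whether feature names are printed.
imageSettings <- function(m, fnames, nmax = 50) {
  ## negative values are taken as log fold-changes -> divergent scale
  scale <- if (any(m < 0, na.rm = TRUE)) "divergent" else "gradient"
  ## unless forced, names are printed only up to nmax features/rows
  if (missing(fnames))
    fnames <- nrow(m) <= nmax
  list(scale = scale, fnames = fnames)
}

m <- matrix(seq(0, 1, length.out = 400), ncol = 4)  # 100 rows, no negatives
imageSettings(m)                        # gradient, names not printed
imageSettings(m[1:25, ] - 0.25)         # divergent, names printed
imageSettings(m, fnames = TRUE)$fnames  # TRUE: forced
```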

plotNA: signature(object = "MSnSet", pNA = "numeric")

Plots missing data for an MSnSet instance. pNA is a numeric of length 1 that specifies the proportion of accepted missing values per feature. This value is highlighted with a point on the figure, illustrating the overall percentage of NA values in the full data set and the number of proteins retained. Default is 1/2. See also plotNA.

MAplot: signature(object = "MSnSet", log.it = "logical", base = "numeric", ...)

Produces MA plots (ratio as a function of average intensity) for the samples in object. If ncol(object) == 2, a single MA plot is produced using the ma.plot function from the affy package. If object has more than 2 columns, mva.pairs is used instead. log.it specifies whether the data should be log-transformed (default is TRUE) using base. Further ... arguments are passed to the respective functions.
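The two quantities in an MA plot can be computed directly for a pair of samples: M is the log-ratio and A the average log-intensity. maValues() is a hypothetical base R helper, not the affy or MSnbase code, and base 2 is used here only as an example, not necessarily the method's default.

```r
## Hypothetical sketch, not the affy/MSnbase code: M and A values for
## two samples x and y, optionally log-transforming first.
maValues <- function(x, y, log.it = TRUE, base = 2) {
  if (log.it) {
    x <- log(x, base)
    y <- log(y, base)
  }
  data.frame(A = (x + y) / 2,  # average log-intensity
             M = x - y)        # log-ratio
}

x <- c(100, 200, 400)
y <- c(100, 100, 800)
ma <- maValues(x, y)
ma$M  # 0 1 -1
## plot(ma$A, ma$M, xlab = "A", ylab = "M"); abline(h = 0)
```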

addIdentificationData: signature(object = "MSnSet", ...): Adds identification data to an MSnSet instance. See addIdentificationData documentation for more details and examples.
removeNoId: signature(object = "MSnSet", fcol = "pepseq", keep = NULL): Removes non-identified features. See removeNoId documentation for more details and examples.
removeMultipleAssignment: signature(object = "MSnSet", fcol = "nprot"): Removes protein groups with more than one member. The latter is defined by extracting a feature variable (default is "nprot").
idSummary: signature(object = "MSnSet", ...): Prints a summary that lists the percentage of identified features per file (called coverage).

Functions

updateFvarLabels: signature(object, label, sep)

This function updates object's featureData variable labels by appending label. By default, label is the name of the variable passed as object, and the separator sep is ".".

updateSampleNames: signature(object, label, sep)

This function updates object's sample names by appending label. By default, label is the name of the variable passed as object, and the separator sep is ".".

updateFeatureNames: signature(object, label, sep)

This function updates object's feature names by appending label. By default, label is the name of the variable passed as object, and the separator sep is ".".
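The three update functions share the same renaming rule, which amounts to pasting the label after each existing name. appendLabel() is a hypothetical base R helper that only illustrates this rule on a character vector; it does not touch an MSnSet.

```r
## Hypothetical sketch, not the MSnbase source: the renaming rule shared
## by updateFvarLabels(), updateSampleNames() and updateFeatureNames().
appendLabel <- function(nms, label, sep = ".") paste(nms, label, sep = sep)

appendLabel(c("iTRAQ4.114", "iTRAQ4.115"), "run1")
## "iTRAQ4.114.run1" "iTRAQ4.115.run1"
appendLabel(c("ProteinAccession", "PeptideSequence"), "x", sep = "_")
## "ProteinAccession_x" "PeptideSequence_x"
```

Appending a distinct label to each object's names before combining several MSnSet instances is one way to keep their variables apart.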

ms2df: signature(x, fcols)

Coerces the MSnSet instance to a data.frame. The direction of the data is retained (features as rows) and the feature variables that match fcols are appended to the expression values. See also as(x, "data.frame") above.
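In contrast to the data.frame coercion, ms2df keeps features as rows. ms2dfSketch() is a hypothetical base R helper, not the MSnbase source; it assumes fcols holds exact feature-variable names and appends those columns to the right of the expression values.

```r
## Hypothetical sketch, not the MSnbase source, of the ms2df() layout:
## features stay as rows and the selected feature variables are appended.
ms2dfSketch <- function(exprs, fData, fcols) {
  stopifnot(nrow(exprs) == nrow(fData), all(fcols %in% names(fData)))
  data.frame(exprs, fData[, fcols, drop = FALSE], check.names = FALSE)
}

M <- matrix(1:6, nrow = 3,
            dimnames = list(c("f1", "f2", "f3"), c("s1", "s2")))
fd <- data.frame(protein = c("P1", "P1", "P2"), score = c(0.9, 0.5, 0.7),
                 row.names = rownames(M))
res <- ms2dfSketch(M, fd, fcols = "protein")
dim(res)  # 3 3: 3 features, 2 samples + 1 feature variable
```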

Author(s)

Laurent Gatto <lg390@cam.ac.uk>

Examples

library("MSnbase")
data(msnset)
msnset <- msnset[10:15]

exprs(msnset)[1, c(1, 4)] <- NA
exprs(msnset)[2, c(1, 2)] <- NA
is.na(msnset)
featureNames(filterNA(msnset, pNA = 1/4))
featureNames(filterNA(msnset, pattern = "0110"))

M <- matrix(rnorm(12), 4)
pd <- data.frame(otherpdata = letters[1:3])
fd <- data.frame(otherfdata = letters[1:4])
x0 <- MSnSet(M, fd, pd)
sampleNames(x0)

M <- matrix(rnorm(12), 4)
colnames(M) <- LETTERS[1:3]
rownames(M) <- paste0("id", LETTERS[1:4])
pd <- data.frame(otherpdata = letters[1:3])
rownames(pd) <- colnames(M)
fd <- data.frame(otherfdata = letters[1:4])
rownames(fd) <- rownames(M)
x <- MSnSet(M, fd, pd)
sampleNames(x)


## Visualisation

library(pRolocdata)
data(dunkley2006)
image(dunkley2006)
## Changing colours
image(dunkley2006, high = "darkgreen")
image(dunkley2006, high = "darkgreen", low = "yellow")
## Forcing feature names
image(dunkley2006, fnames = TRUE)
## Facetting
image(dunkley2006, facetBy = "replicate")
p <- image(dunkley2006)
library("ggplot2") ## for facet_grid
p + facet_grid(replicate ~ membrane.prep, scales = 'free', space = 'free')
p + facet_grid(markers ~ replicate)
## Fold-changes
dd <- dunkley2006
exprs(dd) <- exprs(dd) - 0.25
image(dd)
image(dd, low = "green", high = "red")
## Feature names are displayed by default for smaller data
dunkley2006 <- dunkley2006[1:25, ]
image(dunkley2006)
image(dunkley2006, legend = "hello")

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> library(MSnbase)
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: mzR
Loading required package: Rcpp
Loading required package: BiocParallel
Loading required package: ProtGenerics

This is MSnbase version 1.20.7 
  Read '?MSnbase' and references therein for information
  about the package and how to get started.


Attaching package: 'MSnbase'

The following object is masked from 'package:stats':

    smooth

The following object is masked from 'package:base':

    trimws

> ### Name: MSnSet-class
> ### Title: The "MSnSet" Class for MS Proteomics Expression Data and
> ###   Meta-Data
> ### Aliases: MSnSet-class class:MSnSet MSnSet exprs,MSnSet-method
> ###   dim,MSnSet-method fileNames,MSnSet-method msInfo,MSnSet-method
> ###   processingData,MSnSet-method qual,MSnSet-method qual
> ###   show,MSnSet-method purityCorrect,MSnSet-method
> ###   purityCorrect,MSnSet,matrix-method meanSdPlot,MSnSet-method t.MSnSet
> ###   [,MSnSet-method [,MSnSet,ANY,ANY-method [,MSnSet,ANY,ANY,ANY-method
> ###   as.ExpressionSet.MSnSet as.MSnSet.ExpressionSet as.data.frame.MSnSet
> ###   ms2df coerce,IBSpectra,MSnSet-method
> ###   coerce,MSnSet,ExpressionSet-method coerce,ExpressionSet,MSnSet-method
> ###   coerce,MSnSet,data.frame-method write.exprs write.exprs,MSnSet-method
> ###   experimentData<-,MSnSet,MIAPE-method combine,MSnSet,MSnSet-method
> ###   topN,MSnSet,MSnSet-method topN,MSnSet-method topN,matrix-method topN
> ###   filterNA,MSnSet-method filterNA,matrix-method filterNA
> ###   filterZero,MSnSet-method filterZero,matrix-method filterZero
> ###   log,MSnSet-method image,MSnSet-method image2 MAplot,MSnSet-method
> ###   addIdentificationData,MSnSet,character-method
> ###   addIdentificationData,MSnSet,mzIDClasses-method
> ###   addIdentificationData,MSnSet,mzID-method
> ###   addIdentificationData,MSnSet,mzIDCollection-method
> ###   addIdentificationData,MSnSet,data.frame-method
> ###   removeNoId,MSnSet-method removeMultipleAssignment-method
> ###   removeMultipleAssignment,MSnSet-method removeMultipleAssignment
> ###   idSummary,MSnSet-method idSummary trimws trimws,MSnSet-method
> ###   trimws,data.frame-method exptitle,MSnSet-method
> ###   expemail,MSnSet-method ionSource,MSnSet-method analyser,MSnSet-method
> ###   analyzer,MSnSet-method detectorType,MSnSet-method
> ###   description,MSnSet-method updateFvarLabels updateSampleNames
> ###   updateFeatureNames droplevels.MSnSet
> ### Keywords: classes
> 
> ### ** Examples
> 
> data(msnset)
> msnset <- msnset[10:15]
> 
> exprs(msnset)[1, c(1, 4)] <- NA
> exprs(msnset)[2, c(1, 2)] <- NA
> is.na(msnset)
    iTRAQ4.114 iTRAQ4.115 iTRAQ4.116 iTRAQ4.117
X18       TRUE      FALSE      FALSE       TRUE
X19       TRUE       TRUE      FALSE      FALSE
X2       FALSE      FALSE      FALSE      FALSE
X20      FALSE      FALSE      FALSE      FALSE
X21      FALSE      FALSE      FALSE      FALSE
X22      FALSE      FALSE      FALSE      FALSE
> featureNames(filterNA(msnset, pNA = 1/4))
[1] "X2"  "X20" "X21" "X22"
> featureNames(filterNA(msnset, pattern = "0110"))
[1] "X18" "X2"  "X20" "X21" "X22"
> 
> M <- matrix(rnorm(12), 4)
> pd <- data.frame(otherpdata = letters[1:3])
> fd <- data.frame(otherfdata = letters[1:4])
> x0 <- MSnSet(M, fd, pd)
> sampleNames(x0)
[1] "1" "2" "3"
> 
> M <- matrix(rnorm(12), 4)
> colnames(M) <- LETTERS[1:3]
> rownames(M) <- paste0("id", LETTERS[1:4])
> pd <- data.frame(otherpdata = letters[1:3])
> rownames(pd) <- colnames(M)
> fd <- data.frame(otherfdata = letters[1:4])
> rownames(fd) <- rownames(M)
> x <- MSnSet(M, fd, pd)
> sampleNames(x)
[1] "A" "B" "C"
> 
> 
> ## Visualisation
> 
> library(pRolocdata)

This is pRolocdata version 1.10.0.
Use 'pRolocdata()' to list available data sets.
> data(dunkley2006)
> image(dunkley2006)
> ## Changing colours
> image(dunkley2006, high = "darkgreen")
> image(dunkley2006, high = "darkgreen", low = "yellow")
> ## Forcing feature names
> image(dunkley2006, fnames = TRUE)
> ## Facetting
> image(dunkley2006, facetBy = "replicate")
> p <- image(dunkley2006)
> library("ggplot2") ## for facet_grid
> p + facet_grid(replicate ~ membrane.prep, scales = 'free', space = 'free')
> p + facet_grid(markers ~ replicate)
> ## Fold-changes
> dd <- dunkley2006
> exprs(dd) <- exprs(dd) - 0.25
> image(dd)
> image(dd, low = "green", high = "red")
> ## Feature names are displayed by default for smaller data
> dunkley2006 <- dunkley2006[1:25, ]
> image(dunkley2006)
> image(dunkley2006, legend = "hello")