MSnSet-class
R Documentation
The "MSnSet" Class for MS Proteomics Expression Data and Meta-Data
Description
The MSnSet class holds quantified expression data from MS proteomics
experiments, together with the experimental meta-data.
The MSnSet class is derived from the
"eSet" class and mimics the
"ExpressionSet" class classically used for
microarray data.
Objects from the Class
The constructor MSnSet(exprs, fData, pData) can be used to
create MSnSet instances. Argument exprs is a
matrix, and fData and pData must be of class
data.frame or "AnnotatedDataFrame"; all
must meet the dimension and name validity constraints.
Objects can also be created by calls of the form new("MSnSet",
exprs, ...). See also "ExpressionSet" for
helpful information. Expression data produced by other software
can thus make use of this standardized data container and benefit
from R and Bioconductor packages. Importer functions will be
developed to streamline the generation of "MSnSet" instances
from third-party software.
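As a rough illustration of the dimension and name constraints mentioned above, the sketch below builds a small MSnSet from a toy matrix. The feature, sample and variable names (feat1, S1, accession, group) are invented for the example and do not come from any real data set.

```r
library(MSnbase)

## Toy expression matrix: 4 features (rows) x 3 samples (columns).
## All names below are made up for illustration.
e <- matrix(as.numeric(1:12), nrow = 4,
            dimnames = list(paste0("feat", 1:4), paste0("S", 1:3)))

## Feature meta-data: one row per row of 'e', with matching row names
fd <- data.frame(accession = c("P1", "P1", "P2", "P2"),
                 row.names = rownames(e))

## Sample meta-data: one row per column of 'e', with matching row names
pd <- data.frame(group = c("ctrl", "ctrl", "treated"),
                 row.names = colnames(e))

x <- MSnSet(exprs = e, fData = fd, pData = pd)

dim(x)            # number of features, number of samples
featureNames(x)   # taken from rownames(e)
sampleNames(x)    # taken from colnames(e)
```

Setting the row names of fd and pd explicitly, rather than relying on defaults, keeps the three components aligned by name as well as by dimension.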
A coercion method is also available to transform an IBSpectra
object (named x) from the isobar package into an
MSnSet: as(x, "MSnSet").
In the frame of the MSnbase package, MSnSet instances
can be generated from "MSnExp" experiments using
the quantify method.
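The following sketch shows one possible way of obtaining an MSnSet from raw data. It assumes the itraqdata example experiment and the iTRAQ4 reporter ions distributed with MSnbase; the choice of the first five spectra and of the "trapezoidation" method is arbitrary.

```r
library(MSnbase)

## Example MSnExp data set distributed with MSnbase
data(itraqdata)

## Quantify the iTRAQ 4-plex reporter ions of the first 5 spectra.
## The subset size and the quantitation method are arbitrary choices.
qnt <- quantify(itraqdata[1:5], method = "trapezoidation",
                reporters = iTRAQ4)

class(qnt)         # an MSnSet
dim(qnt)           # one row per spectrum, one column per reporter ion
head(exprs(qnt))
```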
Slots
qual:
Object of class "data.frame" that records
peaks data for each of the reporter ions to be used as quality
metrics.
processingData:
Object of class
"MSnProcess" that records all processing.
assayData:
Object of class "assayData"
containing a matrix with a number of columns equal to
nrow(phenoData). assayData must contain a matrix
exprs with rows representing features (e.g., reporter ions)
and columns representing samples. See the "AssayData"
class, and the exprs and assayData accessors,
for more details. This slot is indirectly inherited from
"eSet".
phenoData:
Object of class "AnnotatedDataFrame"
containing experimenter-supplied variables describing samples (i.e.
the individual tags for a labelled MS experiment) (indirectly
inherited from "eSet"). See
phenoData and the "eSet"
class for more details.
featureData:
Object of class
"AnnotatedDataFrame" containing variables describing
features (spectra in our case), e.g. identification data, peptide
sequence, identification score, ... (inherited indirectly from
"eSet"). See
featureData and the "eSet"
class for more details.
experimentData:
Object of class
"MIAPE", containing details of experimental
methods (inherited from "eSet"). See
experimentData and the "eSet"
class for more details.
annotation:
not used here.
protocolData:
Object of class
"AnnotatedDataFrame" containing
equipment-generated variables (inherited indirectly from
"eSet"). See
protocolData and the "eSet"
class for more details.
.__classVersion__:
Object of class
"Versions" describing the versions of R,
the Biobase package, "eSet",
"pSet" and MSnSet of the
current instance. Intended for developer use and debugging (inherited
indirectly from "eSet").
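The slots listed above are normally reached through accessor functions rather than directly. A minimal sketch, assuming the msnset example data distributed with MSnbase:

```r
library(MSnbase)

## Example MSnSet distributed with MSnbase
data(msnset)

## assayData: quantitative values, features in rows, samples in columns
head(exprs(msnset), n = 3)

## phenoData and featureData, returned as data.frames
pData(msnset)
head(fData(msnset), n = 3)

## Processing log, reporter ion quality metrics, experimental meta-data
processingData(msnset)
head(qual(msnset))
experimentData(msnset)

## Dimension constraints linking the slots
ncol(exprs(msnset)) == nrow(pData(msnset))
nrow(exprs(msnset)) == nrow(fData(msnset))
```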
Extends
Class "eSet", directly.
Class "VersionedBiobase", by class "eSet", distance 2.
Class "Versioned", by class "eSet", distance 3.
Methods
Methods specific to MSnSet, or overriding those of its super-class, are described
below. See also "eSet" for
inherited methods.
dim
signature(x = "MSnSet"): Returns the dimensions of the
object's assay data, i.e. the number of features and the number of
samples.
fileNames
signature(object = "MSnSet"): Access file
names in the processingData slot.
msInfo
signature(object = "MSnSet"): Prints the MIAPE-MS
meta-data stored in the experimentData slot.
processingData
signature(object = "MSnSet"): Access the
processingData slot.
show
signature(object = "MSnSet"): Displays object
content as text.
qual
signature(object = "MSnSet"): Access the reporter
ion peaks description.
purityCorrect
signature(object = "MSnSet", impurities =
"matrix"): performs reporter ions purity correction. See
purityCorrect documentation for more details.
normalise
signature(object = "MSnSet"): Performs
MSnSet normalisation. See normalise for more
details.
t
signature(x = "MSnSet"): Returns a transposed
MSnSet object where features are now aligned along columns
and samples along rows and the phenoData and
featureData slots have been swapped. The
protocolData slot is always dropped.
as(,"ExpressionSet")
signature(x = "MSnSet"): Coerce
object from MSnSet to
ExpressionSet-class. The experimentData slot is
converted to a MIAME instance. It is also possible to
coerce an ExpressionSet to an MSnSet, in which case
the experimentData slot is newly initialised.
as(,"data.frame")
signature(x = "MSnSet"): Coerce
object from MSnSet to data.frame. The MSnSet
is transposed and the phenoData slot is appended.
See also ms2df below.
write.exprs
signature(x = "MSnSet")
Writes expression values
to a tab-separated file (default is tmp.txt). The
fDataCols parameter can be used to specify which
featureData columns (as column names, column numbers or
logical) to append on the right of the expression matrix.
The remaining arguments are the same as for write.table.
combine
signature(x = "MSnSet", y = "MSnSet", ...)
Combines
2 or more MSnSet instances according to their feature names.
Note that the qual slot and the processing information are
silently dropped.
topN
signature(object = "MSnSet", groupBy, n = 3, fun, ...)
Selects the n most intense features (typically peptides or
spectra) out of all available for each set defined by
groupBy (typically proteins) and creates a new instance of
class MSnSet. If fewer than n features are available,
all are selected. For each feature, the ncol(object) values are summarised
using fun (default is sum) before the features are sorted in
decreasing order. Additional parameters can be passed to
fun through ..., for instance to control the
behaviour of topN in case of NA values.
Note that the qual slot and the processing information are
silently dropped.
(Works also with matrix instances.)
See also the nQuants function to retrieve the
actual number of retained peptides out of n.
A complete use case using topN and nQuants is
detailed in the synapter package vignette.
filterNA
signature(object = "MSnSet", pNA = "numeric", pattern = "character", droplevels = "logical")
This method
subsets object by removing features that have (strictly)
more than a proportion pNA of NA values (for example, pNA = 1/4
tolerates up to 25% of missing values per feature). Default pNA is
0, which removes any feature that exhibits missing data.
The method can also be used with a character pattern composed of
0 or 1 characters only. A 0 represents a
column/sample that is allowed to have missing values, while
columns/samples with a 1 must not have NAs.
This method also accepts matrix
instances. droplevels defines whether unused levels in the
feature meta-data ought to be dropped. Default is TRUE.
See the droplevels method below.
See also the is.na.MSnSet and plotNA
methods for missing data exploration.
log
Log-transforms exprs(object) using
base::log. base (default is e = exp(1)) must
be a positive or complex number, the base with respect to which
logarithms are computed.
droplevels
signature(x = "MSnSet", ...)
Drops the unused
factor levels in the featureData slot. See
droplevels for details.
exprsToRatios
signature(object = "MSnSet", log =
"logical")
calculates all possible ratios between
object's columns/samples.
See exprsToRatios for more details.
impute
signature(object = "MSnSet", ...)
Performs data imputation on the MSnSet object.
See impute for more details.
trimws
signature(object = "MSnSet", ...)
Trim leading and/or
trailing white spaces in the feature data slot. Also available for
data.frame objects. See ?base::trimws
for details.
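To make the behaviour of some of the methods above more concrete, here is a small sketch on invented data (six peptides from two hypothetical proteins, four samples). It only aims to show how filterNA, topN, log and t might be combined; the names pep1, S1, protein and group are assumptions made for the example.

```r
library(MSnbase)

## Toy data: 6 peptides (rows) from 2 proteins, 4 samples (columns).
## Names and values are invented for illustration.
e <- matrix(c(10, 20, 30, 40, 50, 60,    # S1
              11, 21, 31, 41, 51, 61,    # S2
              12, 22, NA, 42, 52, 62,    # S3
              13, NA, NA, 43, 53, 63),   # S4
            nrow = 6,
            dimnames = list(paste0("pep", 1:6), paste0("S", 1:4)))
fd <- data.frame(protein = rep(c("A", "B"), c(4, 2)),
                 row.names = rownames(e))
pd <- data.frame(group = rep(c("ctrl", "treated"), each = 2),
                 row.names = colnames(e))
x <- MSnSet(e, fd, pd)

## Default pNA = 0: any feature with a missing value is removed
xf <- filterNA(x)
featureNames(xf)                             # pep2 and pep3 are gone

## Tolerate up to 30% of missing values per feature:
## pep2 (1 NA out of 4) is kept, pep3 (2 NAs out of 4) is removed
featureNames(filterNA(x, pNA = 0.3))

## Pattern: S1 to S3 must be complete, an NA is allowed in S4
featureNames(filterNA(x, pattern = "1110"))  # pep3 is removed, pep2 is kept

## Keep the single most intense peptide per protein (summed over samples)
top1 <- topN(xf, groupBy = fData(xf)$protein, n = 1)
featureNames(top1)

## Log2-transform the expression values
head(exprs(log(xf, 2)))

## Transpose: samples in rows, features in columns
dim(t(x))
```

Running topN on the NA-filtered object sidesteps the question of how fun should treat missing values; otherwise an argument such as na.rm = TRUE could be passed through "...".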
Additional accessors for the experimental metadata
(experimentData slot) are defined. See
"MIAPE" for details.
Plotting
meanSdPlot
signature(object = "MSnSet")
Plots row
standard deviations versus row means. See
meanSdPlot (vsn package) for more details.
image
signature(x = "MSnSet")
Produces a heatmap of expression values in the
x object. Simple horizontal facetting is enabled by
passing a single character as facetBy. Arbitrary
facetting can be performed manually by saving the return value
of the method (see example below). Re-ordering of the samples is
possible by providing the name of a phenotypic variable to
sOrderBy. The title of the legend can be set with
legend and the colours with the low and
high arguments. If any negative value is detected in the
data, the values are considered as log fold-changes and a
divergent colour scale is used. Otherwise, a gradient from low
to high is used. To scale the quantitative data in x
prior to plotting, please see the scale method.
When there are more than nmax (default is 50)
features/rows, feature names are not printed. This behaviour can be
overridden by setting fnames to TRUE (always
print) or FALSE (never print). See examples below.
The code is based on Vlad Petyuk's
vp.misc::image_msnset. The previous version of this
method is still available through the image2 function.
plotNA
signature(object = "MSnSet", pNA =
"numeric")
Plots missing data for an MSnSet instance. pNA is a
numeric of length 1 that specifies the percentage
of accepted missing data values per feature. This value will be
highlighted with a point on the figure, illustrating the overall
percentage of NA values in the full data set and the number of
proteins retained. Default is 1/2. See also
plotNA.
MAplot
signature(object = "MSnSet", log.it = "logical",
base = "numeric", ...)
Produces MA plots (ratio as a function
of average intensity) for the samples in object. If
ncol(object) == 2, then one MA plot is produced using the
ma.plot function from the affy package. If
object has more than 2 columns, then
mva.pairs is used. log.it specifies whether the data
should be log-transformed (default is TRUE) using
base. Further ... arguments will be passed to the
respective functions.
addIdentificationData
signature(object = "MSnSet", ...):
Adds identification data to an MSnSet instance.
See addIdentificationData documentation for
more details and examples.
removeNoId
signature(object = "MSnSet", fcol =
"pepseq", keep = NULL): Removes non-identified features. See
removeNoId documentation for more details and
examples.
removeMultipleAssignment
signature(object = "MSnSet",
fcol = "nprot"): Removes protein groups with more than one
member. The latter is defined by extracting a feature variable
(default is "nprot").
idSummary
signature(object = "MSnSet", ...): Prints a
summary that lists the percentage of identified features per file
(called coverage).
Functions
updateFvarLabels
signature(object, label, sep)
This function updates object's featureData variable labels
by appending label. By default, label is the
variable name and the separator sep is ".".
updateSampleNames
signature(object, label, sep)
This function updates object's sample names by appending
label. By default, label is the variable name and
the separator sep is ".".
updateFeatureNames
signature(object, label, sep)
This function updates object's feature names by appending
label. By default, label is the variable name and
the separator sep is ".".
ms2df
signature(x, fcols)
Coerces the MSnSet instance
to a data.frame. The direction of the data is retained and
the feature variable labels that match fcols are appended to
the expression values. See also as(x, "data.frame") above.
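The helper functions above, together with the data.frame coercion and write.exprs, could be exercised as follows. This is only a sketch on invented data: the label "run1" and all feature, sample and variable names are arbitrary, and the file is written to a temporary location.

```r
library(MSnbase)

## Toy MSnSet; all names are invented for illustration
e <- matrix(as.numeric(1:6), nrow = 3,
            dimnames = list(paste0("pep", 1:3), c("S1", "S2")))
fd <- data.frame(protein = c("A", "A", "B"), row.names = rownames(e))
pd <- data.frame(group = c("ctrl", "treated"), row.names = colnames(e))
x <- MSnSet(e, fd, pd)

## Append an explicit label ("run1" is arbitrary) using the default separator
fvarLabels(updateFvarLabels(x, label = "run1"))
sampleNames(updateSampleNames(x, label = "run1"))
featureNames(updateFeatureNames(x, label = "run1"))

## Features stay in rows; the 'protein' feature variable is appended
df1 <- ms2df(x, fcols = "protein")
dim(df1)

## Coercion transposes the data: samples in rows, pData appended
df2 <- as(x, "data.frame")
dim(df2)

## Export expression values plus one feature variable to a temporary file
f <- tempfile(fileext = ".txt")
write.exprs(x, fDataCols = "protein", file = f)
readLines(f)
```

Passing label explicitly avoids relying on the default, which is derived from the name of the variable holding the object.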
Author(s)
Laurent Gatto <lg390@cam.ac.uk>
See Also
"eSet", "ExpressionSet" and
quantify. MSnSet quantitation values and
annotation can be exported to a file with
write.exprs. See readMSnSet to
create an MSnSet using data available in a spreadsheet or
data.frame.
Examples
data(msnset)
msnset <- msnset[10:15]
exprs(msnset)[1, c(1, 4)] <- NA
exprs(msnset)[2, c(1, 2)] <- NA
is.na(msnset)
featureNames(filterNA(msnset, pNA = 1/4))
featureNames(filterNA(msnset, pattern = "0110"))
M <- matrix(rnorm(12), 4)
pd <- data.frame(otherpdata = letters[1:3])
fd <- data.frame(otherfdata = letters[1:4])
x0 <- MSnSet(M, fd, pd)
sampleNames(x0)
M <- matrix(rnorm(12), 4)
colnames(M) <- LETTERS[1:3]
rownames(M) <- paste0("id", LETTERS[1:4])
pd <- data.frame(otherpdata = letters[1:3])
rownames(pd) <- colnames(M)
fd <- data.frame(otherfdata = letters[1:4])
rownames(fd) <- rownames(M)
x <- MSnSet(M, fd, pd)
sampleNames(x)
## Visualisation
library(pRolocdata)
data(dunkley2006)
image(dunkley2006)
## Changing colours
image(dunkley2006, high = "darkgreen")
image(dunkley2006, high = "darkgreen", low = "yellow")
## Forcing feature names
image(dunkley2006, fnames = TRUE)
## Facetting
image(dunkley2006, facetBy = "replicate")
p <- image(dunkley2006)
library("ggplot2") ## for facet_grid
p + facet_grid(replicate ~ membrane.prep, scales = 'free', space = 'free')
p + facet_grid(markers ~ replicate)
## Fold-changes
dd <- dunkley2006
exprs(dd) <- exprs(dd) - 0.25
image(dd)
image(dd, low = "green", high = "red")
## Feature names are displayed by default for smaller data
dunkley2006 <- dunkley2006[1:25, ]
image(dunkley2006)
image(dunkley2006, legend = "hello")
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
> library(MSnbase)
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
rbind, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
Loading required package: mzR
Loading required package: Rcpp
Loading required package: BiocParallel
Loading required package: ProtGenerics
This is MSnbase version 1.20.7
Read '?MSnbase' and references therein for information
about the package and how to get started.
Attaching package: 'MSnbase'
The following object is masked from 'package:stats':
smooth
The following object is masked from 'package:base':
trimws
> ### Name: MSnSet-class
> ### Title: The "MSnSet" Class for MS Proteomics Expression Data and
> ### Meta-Data
> ### Aliases: MSnSet-class class:MSnSet MSnSet exprs,MSnSet-method
> ### dim,MSnSet-method fileNames,MSnSet-method msInfo,MSnSet-method
> ### processingData,MSnSet-method qual,MSnSet-method qual
> ### show,MSnSet-method purityCorrect,MSnSet-method
> ### purityCorrect,MSnSet,matrix-method meanSdPlot,MSnSet-method t.MSnSet
> ### [,MSnSet-method [,MSnSet,ANY,ANY-method [,MSnSet,ANY,ANY,ANY-method
> ### as.ExpressionSet.MSnSet as.MSnSet.ExpressionSet as.data.frame.MSnSet
> ### ms2df coerce,IBSpectra,MSnSet-method
> ### coerce,MSnSet,ExpressionSet-method coerce,ExpressionSet,MSnSet-method
> ### coerce,MSnSet,data.frame-method write.exprs write.exprs,MSnSet-method
> ### experimentData<-,MSnSet,MIAPE-method combine,MSnSet,MSnSet-method
> ### topN,MSnSet,MSnSet-method topN,MSnSet-method topN,matrix-method topN
> ### filterNA,MSnSet-method filterNA,matrix-method filterNA
> ### filterZero,MSnSet-method filterZero,matrix-method filterZero
> ### log,MSnSet-method image,MSnSet-method image2 MAplot,MSnSet-method
> ### addIdentificationData,MSnSet,character-method
> ### addIdentificationData,MSnSet,mzIDClasses-method
> ### addIdentificationData,MSnSet,mzID-method
> ### addIdentificationData,MSnSet,mzIDCollection-method
> ### addIdentificationData,MSnSet,data.frame-method
> ### removeNoId,MSnSet-method removeMultipleAssignment-method
> ### removeMultipleAssignment,MSnSet-method removeMultipleAssignment
> ### idSummary,MSnSet-method idSummary trimws trimws,MSnSet-method
> ### trimws,data.frame-method exptitle,MSnSet-method
> ### expemail,MSnSet-method ionSource,MSnSet-method analyser,MSnSet-method
> ### analyzer,MSnSet-method detectorType,MSnSet-method
> ### description,MSnSet-method updateFvarLabels updateSampleNames
> ### updateFeatureNames droplevels.MSnSet
> ### Keywords: classes
>
> ### ** Examples
>
> data(msnset)
> msnset <- msnset[10:15]
>
> exprs(msnset)[1, c(1, 4)] <- NA
> exprs(msnset)[2, c(1, 2)] <- NA
> is.na(msnset)
    iTRAQ4.114 iTRAQ4.115 iTRAQ4.116 iTRAQ4.117
X18       TRUE      FALSE      FALSE       TRUE
X19       TRUE       TRUE      FALSE      FALSE
X2       FALSE      FALSE      FALSE      FALSE
X20      FALSE      FALSE      FALSE      FALSE
X21      FALSE      FALSE      FALSE      FALSE
X22      FALSE      FALSE      FALSE      FALSE
> featureNames(filterNA(msnset, pNA = 1/4))
[1] "X2" "X20" "X21" "X22"
> featureNames(filterNA(msnset, pattern = "0110"))
[1] "X18" "X2" "X20" "X21" "X22"
>
> M <- matrix(rnorm(12), 4)
> pd <- data.frame(otherpdata = letters[1:3])
> fd <- data.frame(otherfdata = letters[1:4])
> x0 <- MSnSet(M, fd, pd)
> sampleNames(x0)
[1] "1" "2" "3"
>
> M <- matrix(rnorm(12), 4)
> colnames(M) <- LETTERS[1:3]
> rownames(M) <- paste0("id", LETTERS[1:4])
> pd <- data.frame(otherpdata = letters[1:3])
> rownames(pd) <- colnames(M)
> fd <- data.frame(otherfdata = letters[1:4])
> rownames(fd) <- rownames(M)
> x <- MSnSet(M, fd, pd)
> sampleNames(x)
[1] "A" "B" "C"
>
>
> ## Visualisation
>
> library(pRolocdata)
This is pRolocdata version 1.10.0.
Use 'pRolocdata()' to list available data sets.
> data(dunkley2006)
> image(dunkley2006)
> ## Changing colours
> image(dunkley2006, high = "darkgreen")
> image(dunkley2006, high = "darkgreen", low = "yellow")
> ## Forcing feature names
> image(dunkley2006, fnames = TRUE)
> ## Facetting
> image(dunkley2006, facetBy = "replicate")
> p <- image(dunkley2006)
> library("ggplot2") ## for facet_grid
> p + facet_grid(replicate ~ membrane.prep, scales = 'free', space = 'free')
> p + facet_grid(markers ~ replicate)
> ## Fold-changes
> dd <- dunkley2006
> exprs(dd) <- exprs(dd) - 0.25
> image(dd)
> image(dd, low = "green", high = "red")
> ## Feature names are displayed by default for smaller data
> dunkley2006 <- dunkley2006[1:25, ]
> image(dunkley2006)
> image(dunkley2006, legend = "hello")