A reference class to store, manage and process Synapt G2 data
to combine identification and quantitation results.
The data, intermediate and final results are stored together
in such a ad-how container called a class. In the frame of the
analysis of a set of 3 data files, namely as identification peptide,
a quantitation peptide and a quantitation Pep3D, such a container is created
and populated, updated according to the user's instructions
and used to display and export results.
The functionality of the synapter package implemented in the
Synapter class in described in the Details section
below. Documentation for the individual methods is provided in the
Methods section. Finally, a complete example of an analysis is
provided in the Examples section, at the end of this document.
See also papers by Shliaha et al. for details about ion mobility
separation and the manuscript describing the synapter
methodology.
Usage
Synapter(filenames, master) ## creates an instance of class 'Synapter'
Arguments
filenames
A named list of file names to be load. The
names must be 'identpeptide', 'quantpeptide', 'quantpep3d' and 'fasta'.
If missing, dialog boxes pop up to select the four files
manually.
identpeptide can be a csv final peptide file (from
PLGS) or a saved "MasterPeptides" data object as
created by makeMaster if working with master
peptide data. To serialise the "MasterPeptides"
instance, use the saveRDS function, and file extenstion
rds.
master
A logical that defines if the identification file
is a master file. See makeMaster for details
about this strategy.
Details
A Synapter object logs every operation that is applied to
it. When displayed with show or when the name of the instance
is typed at the R console, the original input file names, all
operations and resulting the size of the respective data are
displayed. This allows the user to trace the effect of respective
operations.
Loading the data
The construction of the data and analysis container, technically
defined as an instance or object of class Synapter, is created
with the Synapter constructor. This function opens four dialog
boxes for the user to point to the input files,
namely (and in that order), the identification final peptide csv file, the
quantitation final peptide csv file and the quantitation Pep3D csv file (as exported
from the PLGS software) and the fasta file use for peptide
identification.
The files are read and the data is stored in the newly
created Synapter instance. The file names can also be
specified as a named list with names 'identpeptide', 'quantpeptide'
and 'quantpep3d' respectively.
The final peptide files are filtered
to retain peptides with matchType corresponding to
PepFrag1 and PepFrag2, corresponding to unmodified
round 1 and 2 peptide identification. Other types, like
NeutralLoss_NH3, NeutralLoss_H20, InSource,
MissedCleavage or VarMod are not considered in the rest
of the analysis. The quantitation Pep3D data is filtered to retain
Function equal to 1 and unique quantitation spectrum ids,
i.e. unique entries for multiple charge states or isotopes of an EMRT
(exact mass-retention time features).
Then, p-values for Regular peptides are computed based on
the Regular and Random database types score
distributions, as described in Käll et al.,
2008a. Only unique peptide sequences are taken into account:
in case of duplicated peptides, only one entry is kept.
Empirical p-values are adjusted using Bonferroni
and Benjamini and Hochberg, 1995 (multtest package)
and q-values are computed using the qvalue package
(Storey JD and Tibshirani R., 2003 and Käll et
al., 2008b). Only Regular entries are stored in the
resulting data for subsequent analysis.
The data tables can be exported as csv spreadsheets with the
writeIdentPeptides and writeQuantPeptides methods.
Filtering identification and quantitation peptide
The first step of the analysis aims to match reliable peptide.
The final peptide datasets are
filtered based on the FDR (BH is default) using the
filterQuantPepScore and filterIdentPepScore
methods. Several plots are provided to illustrate peptide score
densities (from which p-values are estimated, plotPepScores;
use getPepNumbers to see how many peptides were available) and
q-values (plotFdr).
Peptides matching to multiple proteins in the fasta file (non-unique
tryptic identification and quantitation peptides) can be
discarded with the filterUniqueDbPeptides method. One can
also filter on the peptide length using filterPeptideLength.
Another filtering criterion is mass accuracy. Error tolerance
quantiles (in ppm, parts per million) can be visualised with the
plotPpmError method. The values can be retrieved with
getPpmErrorQs. Filtering is then done separately for
identification and quantitation peptide data using
filterIdentPpmError and filterQuantPpmError
respectively. The previous plotting functions can be used again to
visualise the resulting distribution.
Filtering can also be performed at the level of protein false
positive rate, as computed by the PLGS application
(protein.falsePositiveRate column), which counts the
percentage of decoy proteins that have been identified prior to the
regular protein of interest. This can be done with the
filterIdentProtFpr and filterQuantProtFpr methods.
Note that this field is erroneously called a false positive rate in
the PLGS software and the associated manuscript; it is a false
discovery rate.
Merging identification and quantitation peptides
Common and reliable identification and quantitation peptides are
then matched based on their sequences and merged using the
mergePeptides method.
Retention time modelling
Systematic differences between identification features and
quantitation features retention times are modelled by
fitting a local regression (see the loess function for
details), using the modelRt method. The smoothing parameter,
or number of neighbour data points used the for local fit, is
controlled by the span parameter that can be set in the above
method.
The effect of this parameter can be observed with the plotRt
method, specifying what = "data" as parameters. The resulting
model can then be visualised with the above method specifying
what = "model", specifying up to 3 number of standard
deviations to plot. A histogram of retention time differences can
be produced with the plotRtDiffs method.
Mention plotFeatures here.
Grid search to optimise matching tolerances
Matching of identification peptides and quantitation EMRTs is done
within a mass tolerance in parts per million (ppm) and the modelled
retention time +/- a certain number of standard deviations.
To help in the choice of these two parameters, a grid search over a
set of possible values is performed and performance metrics are
recorded, to guide in the selection of a 'best' pair of parameters.
The following metrics are computed:
(1) the percentage of identification
peptides that matched a single quantitation EMRT (called prcntTotal),
(2) the percentage of identification peptides used in the retention time
model that matched the quantitation EMRT corresponding to the
correct quantitation peptide in ident/quant pair of the model
(called prcntModel)
and
(3) the detailed about the matching of the features used for
modelling (accessible with getGridDetails) and the
corresponding details grid that reports the percentage of
correct unique assignments.
The detailed grid results specify the number of non
matched identification peptides (0), the number of correctly (1) or
wrongly (-1) uniquely matched identification peptides, the number of
identification peptides that matched 2 or more peptides including
(2+) or excluding (2-) the correct quantitation equivalent are also
available.
See the next section for additional details about how matching.
The search is performed with the searchGrid method, possibly
on a subset of the data (see Methods and Examples sections for
further details).
The parameters used for matching can be set manually with
setPpmError and setRtNsd respectively, or using
setBestGridParams to apply best parameters as defined using
the grid search. See example and method documentation for details.
Identification transfer: matching identification peptides and quantitation EMRTs
The identification peptide - quantitation EMRT matching, termed
identification transfer, is performed using the best parameters, as
defined above with a grid search, or using user-defined parameters.
Matching is considered successful when one and only one EMRT is
found in the mass tolerance/retention time window defined by the
error ppm and number of retention time standard deviations
parameters. The values of uniquely matched EMRTs are reported in the
final matched dataframe that can be exported (see below). If
however, none or more than one EMRTs are matched, 0 or the number of
matches are reported.
As identification peptides are serially individually matched to 'close'
EMRTs, it is possible for peptides to be matched the same EMRT
independently. Such cases are reported as -1 in the results
dataframes.
The results can be assess using the plotEMRTtable (or
getEMRTtable to retrieve the values) and performace
methods. The former shows the number of identification peptides assigned to
none (0), exactly 1 (1) or more (> 2) EMRTs.
The latter method reports matched identification peptides, the number of
(q-value and protein FPR filtered) identification and quantitation peptides.
Matched EMRT and quantitation peptide numbers are then compared
calculating the synapter enrichment (100 * ( synapter - quant ) / quant)
and Venn counts.
Exporting and saving data
The merged identification and quantitation peptides can be exported
to csv using the writeMergedPeptides method. Similarly, the
matched identification peptides and quantitation EMRTs are exported
with writeMatchedEMRTs.
Complete Synapter instances can be serialised with
save, as any R object, and reloaded with load for
further analysis.
Methods
Analysis methods
mergePeptides
signature(object = "Synapter"): Merges
quantitation and identification final peptide data, used to
perform retention time modelling (see modelRt below).
modelRt
signature(object = "Synapter",
span = "numeric"): Performs local polynomial regression
fitting (see loess) retention time alignment using
span parameter value to control the degree of smoothing.
findEMRTs
signature(object = "Synapter", ppm =
"numeric", nsd = "numeric", mergedEMRTs = c("rescue", "copy",
"transfer")):
Finds EMRTs matching identification peptides using ppm
mass tolerance and nsd number of retention time standard
deviations. The last two parameters are optional if previously
set with setPpmError and setRtNsd or, better,
setBestGridParams (see below). The mergedEMRTs
parameter defined the behaviour for those high confidence
features that where identified in both identification and
quantitation acquisitions and used for the retention time model
(see mergePeptides). Prior to version 1.1.1, these
features were transferred from the quantitation pep3d file if
unique matches were found, as any feature ("transfer").
As a result, those matching 0 or > 1 EMRTs were
quantified as NA. The default is now to "rescue"
the quantitation values of these by directly retrieving the data
from the quantification peptide data. Alternatively, the
quantitation values for these features can be directly taken
from the quantitation peptide data using "copy", thus
effectively bypassing identification transfer.
searchGrid
signature(object="Synapter",
ppms="character", nsds="character", subset="numeric", n =
"numeric", verbose="logical"): Performs a grid search. The
grid is defined by the ppm and nsd numerical
vectors, representing the sequence of values to be
tested. Default are seq(5, 20, 2) and seq(0.5, 5,
0.5) respectively. subset and n allow to use a
randomly chosen subset of the data for the grid search to
reduce search time. subset is a numeric, between 0 and 1,
describing the percentage of data to be used; n specifies
the absolute number of feature to use. The default is to use all
data. verbose controls whether textual output should be
printed to the console. (Note, the mergedEMRTs value used
in internal calls to findEMRTs is "transfer" - see
findEMRTs for details).
Methods to display, access and set data
show
signature(object = "Synapter"): Display
object by printing a summary to the console.
dim
signature(x="Synapter"): Returns a list
of dimensions for the identification peptide, quantitation
peptide, merged peptides and matched features data sets.
inputFiles
signature(object="Synapter"): Returns a
character of length 4 with the names of the input files
used as identpeptide, quantpeptide,
quantpep3d and fasta.
getLog
signature(object="Synapter"): Returns a
character of variable length with a summary of processing
undergone by object.
getGrid
signature(object="Synapter", digits =
"numeric"): Returns a named list of length 3 with the
precent of total (prcntTotal), percent of model
(prcntModel) and detailed (details) grid search
results. The details grid search reports the proportion
of correctly assigned features (+1) to all unique assignments
(+1 and -1). Values are rounded to 3 digits by default.
getGridDetails
signature(object="Synapter"): Returns
a list of number of ..., -2, -1, 0, +1, +2, ... results
found for each of the ppm/nsd pairs tested during
the grid search.
getBestGridValue
signature(object="Synapter"):
Returns a named numeric of length 3 with best grid values
for the 3 searches. Names are prcntTotal,
prcntModel and details.
getBestGridParams
signature(object="Synapter"):
Returns a named list of matrices (prcntTotal,
prcntModel and details). Each matrix gives the
ppm and nsd pairs that yielded the best grid
values (see getBestGridValue above).
setBestGridParams
signature(object="Synapter",
what="character"): This methods set the best parameter pair,
as determined by what. Possible values are auto
(default), model, total and details. The 3
last ones use the (first) best parameter values as reported by
getBestGridParams. auto uses the best model
parameters and, if several best pairs exists, the one that
maximises details is selected.
setPepScoreFdr
signature(object="Synapter", fdr =
"numeric"): Sets the peptide score false discovery rate
(default is 0.01) threshold used by filterQuantPepScore
and filterIdentPepScore.
getPepScoreFdr
signature(object="Synapter"): Returns
the peptide false discrovery rate threshold.
setIdentPpmError
signature(object="Synapter", ppm =
"numeric"): Set the identification mass tolerance to
ppm (default 10).
getIdentPpmError
signature(object="Synapter"):
Returns the identification mass tolerance.
setQuantPpmError
signature(object="Synapter", ppm =
"numeric"): Set the quantitation mass tolerance to ppm
(default 10).
getQuantPpmError
signature(object="Synapter"):
Returns the quantitation mass tolerance.
setPpmError
signature(object="Synapter", ppm =
"numeric"): Sets the identification and quantitation mass
tolerance ppm (default is 10).
setLowessSpan
signature(object="Synapter", span =
"numeric"): Sets the loessspan parameter;
default is 0.05.
getLowessSpan
signature(object="Synapter"): Returns
the span parameter value.
setRtNsd
signature(object="Synapter", nsd =
"numeric"): Sets the retention time tolerance nsd,
default is 2.
getRtNsd
signature(object="Synapter"): Returns the
value of the retention time tolerance nsd.
getPpmErrorQs
signature(object="Synapter", qs =
"numeric", digits = "numeric"): Returns the mass tolerance
qs quantiles (default is c(0.25, 0.5, 0.75, seq(0.9,
1, 0.01)) for the identification and quantitation
peptides. Default is 3 digits.
getRtQs
signature(object="Synapter", qs =
"numeric", digits = "numeric"): Returns the retention time
tolerance qs quantiles (default is c(0.25, 0.5,
0.75, seq(0.9, 1, 0.01)) for the identification and
quantitation peptides. Default is 3 digits.
getPepNumbers
signature(object="Synapter"): Returns
the number of regular and random quantitation and identification
peptide considered for p-value calculation and used to plot the
score densities (see plotPepScores). Especially the
difference between random and regular entries are informative in
respect with the confidence of the random scores distribution.
showFdrStats
signature(object="Synapter", k =
"numeric"): Returns a named list of length
2 with the proportion of identification and quantitation
peptides that are considered significant with a threshold of
k (default is c(0.001, 0.01, 0.5, 0.1)) using raw
and adjusted p-values/q-values.
getEMRTtable
signature(object="Synapter"): Returns a
table with the number of 0, 1, 2, ... assigned EMRTs.
performance
signatute(object="Synapter", verbose =
TRUE): Returns (and displays, if verbose) the
performance of the synapter analysis.
performance2
signatute(object="Synapter", verbose =
TRUE): Returns (and displays, if verbose) information
about number of missing values and identification source of
transfered EMRTs.
Filters
filterUniqueDbPeptides
signature(object="Synapter"):
This method first digests the fasta database file and keeps
unique tryptic peptides. (NOTE: since version 1.5.3, the tryptic
digestion uses the cleaver package, replacing the more
simplistic inbuild function. The effect of this change is
documented in https://github.com/lgatto/synapter/pull/47). The
number of missed cleavages can be set as missedCleavages
(default is 0). Those peptide sequences are then used as a
filter against the identification and quantitation peptides,
where only unique proteotyptic instances (no miscleavage allowed
by default) are eventually kept in the object instance.
This method also removes any additional duplicated peptides,
that would not match any peptides identified in the fasta
database.
filterUniqueQuantDbPeptides
signature(object="Synapter",
missedCleavages = 0, verbose = TRUE): As
filterUniqueDbPeptides for quantitation peptides only.
filterUniqueIdentDbPeptides
signature(object="Synapter",
missedCleavages = 0, verbose = TRUE): As
filterUniqueDbPeptides for identification peptides
only.
filterQuantPepScore
signature(object="Synapter", fdr
= "numeric", method = "character"): Filters the quantitation
peptides using fdr false discovery rate. fdr is
missing by default and is retrieved with getPepScoreFdr
automatically. If not set, default value of 0.01 is
used. method defines how to performe p-value adjustment;
one of BH, Bonferrone or qval. See details
section for more information.
filterIdentPepScore
signature(object="Synapter", fdr
= "numeric", method = "charactet"): As
filterQuantPepScore, but for identification peptides.
filterQuantProtFpr
signature(object="Synapter", fpr
= "numeric"): Filters quantitation peptides using the protein
false positive rate (erroneously defined as a FPR, should be
FDR), as reported by PLGS, using threshold set by fpr
(missing by default) or retrieved by getProtFpr.
filterIdentProtFpr
signature(object="Synapter", fpr =
"numeric"): as filterQuantProtFpr, but for
identification peptides.
filterQuantPpmError
signature(object="Synapter", ppm
= "numeric"): Filters the quantitation peptides based on the
mass tolerance ppm (default missing) provided or
retrieved automatically using getPpmError.
filterIdentPpmError
signature(object="Synapter"): as
filterQuantPpmError, but for identification peptides.
Plotting
plotPpmError
signature(object="Synapter", what =
"character"): Plots the proportion of data against the mass
error tolerance in ppms. Depending on what, the data
for identification (what = "Ident"), quantitation
(what = "Quant") or "both" is plotted.
plotRtDiffs
signature(object="Synapter", ...): Plots
a histogram of retention time differences after
alignments. ... is passed to hist.
plotRt
signature(object="Synapter", what =
"character", f = "numeric", nsd = "numeric"): Plots the
Identification - Quantitation retention time difference as a
function of the Identification retention time. If what is
"data", two plots are generated: one ranging the full range
of retention time differences and one focusing on the highest data
point density and showing models with various span
parameter values, as defined by f (default is 2/3, 1/2,
1/4, 1/10, 1/16, 1/25, 1/50, passed as a numed numeric). If
what is "model", a focused plot with the applied
span parameter is plotted and areas of nsd (default is
x(1, 3, 5) number of standard deviations are shaded around
the model.
plotPepScores
signature(object="Synapter"): Plots the
distribution of random and regular peptide scores for
identification and quantitation features. This reflects how
peptide p-values are computed. See also getPepNumbers.
plotFdr
signature(object="Synapter", method =
"character"): Displays 2 plots per identification and
quantitation peptides, showing the number of significant
peptides as a function of the FDR cut-off and the expected false
number of false positive as a number of significant
tests. PepFrag 1 and 2 peptides are illustrated on the same
figures. These figures are adapted from plot.qvalue.
method, one of "BH", "Bonferroni" or
"qval", defines what identification statistics to use.
plotEMRTtable
signature(object="Synapter"): Plots
the barchart of number or 0, 1, 2, ... assigned EMRTs (see
getEMRTtable) .
plotGrid
signature(object="Synapter", what =
"character"): Plots a heatmap of the respective grid search
results. This grid to be plotted is controlled by what:
"total", "model" or "details" are
available.
plotFeatures
signature(object="Synapter", what =
"character", xlim = "numeric", ylim = "numeric"): Plots the
retention time against precursor mass space. If what is
"all", three such plots are created side by side: for the
identification peptides, the quantitation peptides and the
quantitation Pep3D data. If what is "some", a
subset of the rt/mass space can be defined with xlim
(default is c(40, 60)) and ylim (default is
c(1160, 1165)) and identification peptide, quantitation
peptides and EMRTs are presented on the same graph as grey
dots, blue dots and red crosses respectively. In addition,
rectangles based on the ppm and nsd defined tolerances (see
setPpmError and setNsdError) are drawn and
centered at the expected modelled retention time. This last
figure allows to visualise the EMRT matching.
Exporters
writeMergedPeptides
signature(object="Synapter", file
= "character", what = "character", ...): Exports the merged
peptide data to a comma-separated file (default name is
"Res-MergedPeptides.csv"). what can be
"light" (default) or "full" and specifies if the
full data or only selected columns are exported. ... are
passed to write.csv.
writeMatchedEMRTs
signature(object="Synapter", file =
"character", what = "character", ...): As above, saving the
matched EMRT table.
writeIdentPeptides
signature(object="Synapter", file
= "character", ...): As above, exporting the identification
peptide data.
writeQuantPeptides
signature(object="Synapter", file
= "character", ...): A above, exporting the quantitation
peptide data.
Other
as(, "MSnSet")
signature(x = "Synapter"): Coerce
object from Synapter to MSnSet class.
Käll L, Storey JD, MacCoss MJ, Noble WS
Posterior error probabilities and false discovery rates: two sides of
the same coin.
J Proteome Res. 2008a Jan; 7:(1)40-4
Bonferroni single-step adjusted p-values for strong control of the
FWER.
Benjamini Y. and Hochberg Y.
Controlling the false discovery rate: a practical and powerful
approach to multiple testing.
J. R. Statist. Soc. B., 1995, Vol. 57: 289-300.
Storey JD and Tibshirani R.
Statistical significance for genome-wide experiments.
Proceedings of the National Academy of Sciences, 2003, 100: 9440-9445.
Käll, Storey JD, MacCoss MJ, Noble WS
Assigning significance to peptides identified by tandem mass
spectrometry using decoy databases.
J Proteome Res. 2008b Jan; 7:(1)29-34
Improving qualitative and quantitative performance for MSE-based label
free proteomics, N.J. Bond, P.V. Shliaha, K.S. Lilley and L. Gatto,
Journal of Proteome Research, 2013, in press.
The Effects of Travelling Wave Ion Mobility Separation on Data
Independent Acquisition in Proteomics Studies, P.V. Shliaha, N.J. Bond,
L. Gatto and K.S. Lilley, Journal of Proteome Research, 2013, in press.
Examples
library(synapter) ## always needed
## Not run:
## (1) Construction - to create your own data objects
synapterTiny <- Synapter()
## End(Not run)
## let's use synapterTiny, shipped with the package
synapterTinyData() ## loads/prepares the data
synapterTiny ## show object
## (2) Filtering
## (2.1) Peptide scores and FDR
## visualise/explore peptide id scores
plotPepScores(synapterTiny)
getPepNumbers(synapterTiny)
## filter data
filterUniqueDbPeptides(synapterTiny) ## keeps unique proteotypic peptides
filterPeptideLength(synapterTiny, l = 7) ## default length is 7
## visualise before FDR filtering
plotFdr(synapterTiny)
setPepScoreFdr(synapterTiny, fdr = 0.01) ## optional
filterQuantPepScore(synapterTiny, fdr = 0.01) ## specifying FDR
filterIdentPepScore(synapterTiny) ## FDR not specified, using previously set value
## (2.2) Mass tolerance
getPpmErrorQs(synapterTiny)
plotPpmError(synapterTiny, what="Ident")
plotPpmError(synapterTiny, what="Quant")
setIdentPpmError(synapterTiny, ppm = 20) ## optional
filterQuantPpmError(synapterTiny, ppm = 20)
## setQuantPpmError(synapterTiny, ppm = 20) ## set quant ppm threshold below
filterIdentPpmError(synapterTiny, ppm=20)
filterIdentProtFpr(synapterTiny, fpr = 0.01)
filterQuantProtFpr(synapterTiny, fpr = 0.01)
getPpmErrorQs(synapterTiny) ## to be compared with previous output
## (3) Merge peptide sequences
mergePeptides(synapterTiny)
## (4) Retention time modelling
plotRt(synapterTiny, what="data")
setLowessSpan(synapterTiny, 0.05)
modelRt(synapterTiny) ## the actual modelling
getRtQs(synapterTiny)
plotRtDiffs(synapterTiny)
## plotRtDiffs(synapterTiny, xlim=c(-1, 1), breaks=500) ## pass parameters to hist()
plotRt(synapterTiny, what="model") ## using default nsd 1, 3, 5
plotRt(synapterTiny, what="model", nsd=0.5) ## better focus on model
plotFeatures(synapterTiny, what="all")
setRtNsd(synapterTiny, 3) ## RtNsd and PpmError are used for detailed plot
setPpmError(synapterTiny, 10) ## if not set manually, default values are set automatically
plotFeatures(synapterTiny, what="some", xlim=c(36,44), ylim=c(1161.4, 1161.7))
## best plotting to svg for zooming
set.seed(1) ## only for reproducibility of this example
## (5) Grid search to optimise EMRT matching parameters
searchGrid(synapterTiny,
ppms = 7:10, ## default values are 5, 7, ..., 20
nsds = 1:3, ## default values are 0.5, 1, ..., 5
subset = 0.2) ## default is 1
## alternatively, use 'n = 1000' to use exactly
## 1000 randomly selected features for the grid search
getGrid(synapterTiny) ## print the grid
getGridDetails(synapterTiny) ## grid details
plotGrid(synapterTiny, what = "total") ## plot the grid for total matching
plotGrid(synapterTiny, what = "model") ## plot the grid for matched modelled feature
plotGrid(synapterTiny, what = "details") ## plot the detail grid
getBestGridValue(synapterTiny) ## return best grid values
getBestGridParams(synapterTiny) ## return parameters corresponding to best values
setBestGridParams(synapterTiny, what = "auto") ## sets RtNsd and PpmError according the grid results
## 'what' could also be "model", "total" or "details"
## setPpmError(synapterTiny, 12) ## to manually set values
## setRtNsd(synapterTiny, 2.5)
## (6) Matching ident peptides and quant EMRTs
findEMRTs(synapterTiny)
plotEMRTtable(synapterTiny)
getEMRTtable(synapterTiny)
performance(synapterTiny)
performance2(synapterTiny)
## (7) Exporting data to csv spreadsheets
writeMergedPeptides(synapterTiny, what = "light") ## or what="full"
writeMergedPeptides(synapterTiny, file = "myresults.csv", what="light")
writeMatchedEMRTs(synapterTiny, what = "light") ## or what="full"
writeMatchedEMRTs(synapterTiny, file = "myresults2.csv", what="light")
## These will export the filter peptide data
writeIdentPeptides(synapterTiny, file = "myIdentPeptides.csv")
writeQuantPeptides(synapterTiny, file = "myQuantPeptides.csv")
## If used right after loading, the non-filted data will be exported
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(synapter)
Loading required package: MSnbase
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
rbind, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
Loading required package: mzR
Loading required package: Rcpp
Loading required package: BiocParallel
Loading required package: ProtGenerics
This is MSnbase version 1.20.7
Read '?MSnbase' and references therein for information
about the package and how to get started.
Attaching package: 'MSnbase'
The following object is masked from 'package:stats':
smooth
The following object is masked from 'package:base':
trimws
This is the 'synapter' version 1.14.2.
Read '?synapter' to get started.
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/synapter/Synapter.Rd_%03d_medium.png", width=480, height=480)
> ### Name: Synapter
> ### Title: Class "Synapter"
> ### Aliases: Synapter class:Synapter show,Synapter-method
> ### dim,Synapter-method inputFiles,Synapter-method inputFiles
> ### getLog,Synapter-method getLog mergePeptides,Synapter-method
> ### mergePeptides modelRt,Synapter-method modelRt
> ### findEMRTs,Synapter-method findEMRTs searchGrid,Synapter-method
> ### searchGrid getGrid,Synapter-method getGrid
> ### getGridDetails,Synapter-method getGridDetails
> ### getBestGridValue,Synapter-method getBestGridValue
> ### getBestGridParams,Synapter-method getBestGridParams
> ### setBestGridParams,Synapter-method setBestGridParams
> ### setPepScoreFdr,Synapter-method setPepScoreFdr
> ### getPepScoreFdr,Synapter-method getPepScoreFdr
> ### setIdentPpmError,Synapter-method setIdentPpmError
> ### getIdentPpmError,Synapter-method getIdentPpmError
> ### setQuantPpmError,Synapter-method setQuantPpmError
> ### getQuantPpmError,Synapter-method getQuantPpmError
> ### setPpmError,Synapter-method setPpmError setLowessSpan,Synapter-method
> ### setLowessSpan getLowessSpan,Synapter-method getLowessSpan
> ### setRtNsd,Synapter-method setRtNsd getRtNsd,Synapter-method getRtNsd
> ### getPpmErrorQs,Synapter-method getPpmErrorQs getRtQs,Synapter-method
> ### getRtQs getPepNumbers,Synapter-method getPepNumbers
> ### showFdrStats,Synapter-method showFdrStats setProtFpr,Synapter-method
> ### setProtFpr getProtFpr,Synapter-method getProtFpr
> ### getEMRTtable,Synapter-method getEMRTtable performance,Synapter-method
> ### performance2,Synapter-method performance performance2
> ### filterPeptideLength,Synapter-method filterPeptideLength
> ### filterUniqueDbPeptides,Synapter-method filterUniqueDbPeptides
> ### filterUniqueQuantDbPeptides,Synapter-method
> ### filterUniqueQuantDbPeptides
> ### filterUniqueIdentDbPeptides,Synapter-method
> ### filterUniqueIdentDbPeptides filterQuantPepScore,Synapter-method
> ### filterQuantPepScore filterIdentPepScore,Synapter-method
> ### filterIdentPepScore filterIdentPpmError,Synapter-method
> ### filterIdentPpmError filterQuantPpmError,Synapter-method
> ### filterQuantPpmError filterIdentProtFpr,Synapter-method
> ### filterIdentProtFpr filterQuantProtFpr,Synapter-method
> ### filterQuantProtFpr plotEMRTtable,Synapter-method plotEMRTtable
> ### plotFeatures,Synapter-method plotFeatures plotGrid,Synapter-method
> ### plotGrid plotPepScores,Synapter-method plotPepScores
> ### plotPpmError,Synapter-method plotPpmError plotFdr,Synapter-method
> ### plotFdr plotRt,Synapter-method plotRt plotRtDiffs,Synapter-method
> ### plotRtDiffs writeMatchedEMRTs,Synapter-method writeMatchedEMRTs
> ### writeMergedPeptides,Synapter-method writeMergedPeptides
> ### writeIdentPeptides,Synapter-method writeIdentPeptides
> ### writeQuantPeptides,Synapter-method writeQuantPeptides
> ### as.MSnSet.Synapter
> ### Keywords: classes
>
> ### ** Examples
>
> library(synapter) ## always needed
>
> ## Not run:
> ##D ## (1) Construction - to create your own data objects
> ##D synapterTiny <- Synapter()
> ## End(Not run)
>
> ## let's use synapterTiny, shipped with the package
> synapterTinyData() ## loads/prepares the data
> synapterTiny ## show object
Object of class "Synapter"
Package version 0.9.0
Data files:
+ Quantitation pep file: 02_MSe_tiny.csv
+ Identification pep file: 01_HDMSe_tiny.csv
+ Quantitation Pep3DAMRT file: 03_Pep3D_tiny.csv
+ Fasta file: 04_test_database.fasta
Log:
[1] "Instance created on Wed Jun 6 23:37:36 2012"
[2] "Read identification peptide data [5915,18]"
[ 8 lines ]
[11] "Added identification statistics to identification data"
[12] "Filtered identification Random entries [4979,22]"
>
> ## (2) Filtering
> ## (2.1) Peptide scores and FDR
>
> ## visualise/explore peptide id scores
> plotPepScores(synapterTiny)
> getPepNumbers(synapterTiny)
PepFrag1.Random PepFrag1.Regular PepFrag2.Random PepFrag2.Regular
ident 256 3522 448 1393
quant 496 2664 500 745
>
> ## filter data
> filterUniqueDbPeptides(synapterTiny) ## keeps unique proteotypic peptides
Fasta file: 897 proteins
21914 out of 22960 tryptic peptides are proteotypic.
> filterPeptideLength(synapterTiny, l = 7) ## default length is 7
>
> ## visualise before FDR filtering
> plotFdr(synapterTiny)
>
> setPepScoreFdr(synapterTiny, fdr = 0.01) ## optional
> filterQuantPepScore(synapterTiny, fdr = 0.01) ## specifying FDR
> filterIdentPepScore(synapterTiny) ## FDR not specified, using previously set value
>
> ## (2.2) Mass tolerance
> getPpmErrorQs(synapterTiny)
25% 50% 75% 90% 91% 92% 93% 94% 95% 96% 97%
Ident 0.982 2.183 4.070 6.750 7.663 8.299 9.136 9.989 12.060 13.997 16.082
Quant 0.432 0.972 1.682 3.083 3.341 3.606 3.992 5.021 5.733 6.455 7.736
98% 99% 100%
Ident 19.288 22.496 28.664
Quant 9.124 14.897 24.599
> plotPpmError(synapterTiny, what="Ident")
> plotPpmError(synapterTiny, what="Quant")
>
> setIdentPpmError(synapterTiny, ppm = 20) ## optional
> filterQuantPpmError(synapterTiny, ppm = 20)
> ## setQuantPpmError(synapterTiny, ppm = 20) ## set quant ppm threshold below
> filterIdentPpmError(synapterTiny, ppm=20)
>
> filterIdentProtFpr(synapterTiny, fpr = 0.01)
> filterQuantProtFpr(synapterTiny, fpr = 0.01)
>
> getPpmErrorQs(synapterTiny) ## to be compared with previous output
25% 50% 75% 90% 91% 92% 93% 94% 95% 96% 97%
Ident 0.968 2.144 3.939 6.233 6.589 7.035 7.942 8.775 9.523 10.688 12.643
Quant 0.431 0.964 1.661 2.973 3.137 3.386 3.702 4.214 5.211 6.095 7.117
98% 99% 100%
Ident 14.563 17.007 19.744
Quant 8.289 9.558 19.817
>
> ## (3) Merge peptide sequences
> mergePeptides(synapterTiny)
>
> ## (4) Retention time modelling
> plotRt(synapterTiny, what="data")
> setLowessSpan(synapterTiny, 0.05)
> modelRt(synapterTiny) ## the actual modelling
> getRtQs(synapterTiny)
25% 50% 75% 90% 91% 92% 93% 94% 95% 96% 97%
0.003 0.008 0.016 0.029 0.031 0.034 0.040 0.047 0.057 0.072 0.463
98% 99% 100%
4.161 10.703 32.140
> plotRtDiffs(synapterTiny)
> ## plotRtDiffs(synapterTiny, xlim=c(-1, 1), breaks=500) ## pass parameters to hist()
> plotRt(synapterTiny, what="model") ## using default nsd 1, 3, 5
> plotRt(synapterTiny, what="model", nsd=0.5) ## better focus on model
>
> plotFeatures(synapterTiny, what="all")
> setRtNsd(synapterTiny, 3) ## RtNsd and PpmError are used for detailed plot
> setPpmError(synapterTiny, 10) ## if not set manually, default values are set automatically
> plotFeatures(synapterTiny, what="some", xlim=c(36,44), ylim=c(1161.4, 1161.7))
> ## best plotting to svg for zooming
>
> set.seed(1) ## only for reproducibility of this example
>
> ## (5) Grid search to optimise EMRT matching parameters
> searchGrid(synapterTiny,
+ ppms = 7:10, ## default values are 5, 7, ..., 20
+ nsds = 1:3, ## default values are 0.5, 1, ..., 5
+ subset = 0.2) ## default is 1
| | | 0% | |====== | 8% | |============ | 17% | |================== | 25% | |======================= | 33% | |============================= | 42% | |=================================== | 50% | |========================================= | 58% | |=============================================== | 67% | |==================================================== | 75% | |========================================================== | 83% | |================================================================ | 92% | |======================================================================| 100%
> ## alternatively, use 'n = 1000' to use exactly
> ## 1000 randomly selected features for the grid search
> getGrid(synapterTiny) ## print the grid
$prcntTotal
7 8 9 10
1 0.747 0.755 0.760 0.760
2 0.824 0.833 0.839 0.839
3 0.824 0.833 0.839 0.839
$prcntModel
7 8 9 10
1 0.872 0.872 0.872 0.872
2 0.972 0.972 0.972 0.972
3 0.972 0.972 0.972 0.972
$details
7 8 9 10
1 0.984 0.984 0.984 0.984
2 0.986 0.986 0.986 0.986
3 0.986 0.986 0.986 0.986
> getGridDetails(synapterTiny) ## grid details
$`1:7`
-1 0 1
4 33 251
$`1:8`
-1 0 1
4 33 251
$`1:9`
-1 0 1
4 33 251
$`1:10`
-1 0 1
4 33 251
$`2:7`
-1 0 1
4 4 280
$`2:8`
-1 0 1
4 4 280
$`2:9`
-1 0 1
4 4 280
$`2:10`
-1 0 1
4 4 280
$`3:7`
-1 0 1
4 4 280
$`3:8`
-1 0 1
4 4 280
$`3:9`
-1 0 1
4 4 280
$`3:10`
-1 0 1
4 4 280
> plotGrid(synapterTiny, what = "total") ## plot the grid for total matching
> plotGrid(synapterTiny, what = "model") ## plot the grid for matched modelled feature
> plotGrid(synapterTiny, what = "details") ## plot the detail grid
> getBestGridValue(synapterTiny) ## return best grid values
prcntTotal prcntModel details
0.8390805 0.9722222 0.9859155
> getBestGridParams(synapterTiny) ## return parameters corresponding to best values
$prcntTotal
ppm nsd
[1,] 9 2
[2,] 9 3
[3,] 10 2
[4,] 10 3
$prcntModel
ppm nsd
[1,] 7 2
[2,] 7 3
[3,] 8 2
[4,] 8 3
[5,] 9 2
[6,] 9 3
[7,] 10 2
[8,] 10 3
$details
ppm nsd
[1,] 7 2
[2,] 7 3
[3,] 8 2
[4,] 8 3
[5,] 9 2
[6,] 9 3
[7,] 10 2
[8,] 10 3
> setBestGridParams(synapterTiny, what = "auto") ## sets RtNsd and PpmError according the grid results
> ## 'what' could also be "model", "total" or "details"
> ## setPpmError(synapterTiny, 12) ## to manually set values
> ## setRtNsd(synapterTiny, 2.5)
>
> ## (6) Matching ident peptides and quant EMRTs
> findEMRTs(synapterTiny)
> plotEMRTtable(synapterTiny)
> getEMRTtable(synapterTiny)
0 1
559 2486
> performance(synapterTiny)
(S) Synapter: 2486 EMRTs uniquely matched.
(I) Ident: 3045 peptides.
(Q) Quant: 1840 peptides.
Enrichment (S/Q): 35.11%
Overlap:
Q S QS
433 1077 1407
> performance2(synapterTiny)
na.counts
id.source FALSE TRUE
rescue 35 0
transfer 2486 524
>
> ## (7) Exporting data to csv spreadsheets
> writeMergedPeptides(synapterTiny, what = "light") ## or what="full"
> writeMergedPeptides(synapterTiny, file = "myresults.csv", what="light")
> writeMatchedEMRTs(synapterTiny, what = "light") ## or what="full"
> writeMatchedEMRTs(synapterTiny, file = "myresults2.csv", what="light")
> ## These will export the filter peptide data
> writeIdentPeptides(synapterTiny, file = "myIdentPeptides.csv")
> writeQuantPeptides(synapterTiny, file = "myQuantPeptides.csv")
> ## If used right after loading, the non-filted data will be exported
>
>
>
>
>
> dev.off()
null device
1
>