R Graphical Manual

Browse All

Last data update: 2014.03.03

R: Pairwise Distance Computation for Protein Information Sets

clustDist

R Documentation

Pairwise Distance Computation for Protein Information Sets

Description

This function computes the mean (normalised) pairwise distances for pre-defined sets of proteins.

Usage

clustDist(object, k = 1:5, fcol = "GOAnnotations", n = 5,
  verbose = TRUE, seed)

Arguments

`object`	An instance of class `"MSnSet"`.
`k`	The number of clusters to try fitting to the protein set. Default is `k = 1:5`.
`fcol`	The feature meta-data containing matrix of protein sets/ marker definitions. Default is `GOAnnotations`.
`n`	The minimum number of proteins per set. If protein sets contain less than `n` instances they will be ignored. Defualt is 5.
`verbose`	A logical defining whether a progress bar is displayed.
`seed`	An optional seed for the random number generator.

Details

The input to the function is a MSnSet dataset containing a matrix appended to the feature data slot identifying the membership of protein instances to a pre-defined set(s) e.g. a specific Gene Ontology term etc.

For each protein set, the clustDist function (i) extracts all instances belonging to the set, (ii) using the kmeans algorithm fits and tests k = c(1:5) (default) cluster components to each set, (iii) calculates the mean pairwise distance for each k tested.

Note: currently distances are calcualted in Euclidean space, but other distance metrics will be supported in the future).

The output is a list of ClustDist objects, one per information cluster. The ClustDist class summarises the algorithm information such as the number of k's tested for the kmeans, and mean and normalised pairwise Euclidean distances per numer of component clusters tested. See ?ClustDist for more details.

Value

An instance of "ClustDistList" containing a "ClustDist" instance for every protein set, which summarises the algorithm information such as the number of k's tested for the kmeans, and mean and normalised pairwise Euclidean distances per numer of component clusters tested.

Author(s)

Lisa Breckels

Examples

library(pRolocdata)
data(dunkley2006)
par <- setAnnotationParams(inputs =
                   c("Arabidopsis thaliana genes",
                   "TAIR locus ID"))
## add protein sets/annotation information
xx <- addGoAnnotations(dunkley2006, par)
## filter
xx <- filterMinMarkers(xx, n = 50)
xx <- filterMaxMarkers(xx, p = .25)
## get distances for protein sets 
dd <- clustDist(xx)
## plot clusters for first 'ClustDist' object 
## in the 'ClustDistList'
plot(dd[[1]], xx)
## plot distances for all protein sets 
plot(dd)
## Extract normalised distances
## Normalise by n^1/3
minDist <- getNormDist(dd, p = 1/3)
## Get new order according to lowest distance
o <- order(minDist)
## Re-order GOAnnotations 
fData(xx)$GOAnnotations <- fData(xx)$GOAnnotations[, o]
if (interactive()) {
pRolocVis(xx, fcol = "GOAnnotations")
}

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(pRoloc)
Loading required package: MSnbase
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: mzR
Loading required package: Rcpp
Loading required package: BiocParallel
Loading required package: ProtGenerics

This is MSnbase version 1.20.7 
  Read '?MSnbase' and references therein for information
  about the package and how to get started.


Attaching package: 'MSnbase'

The following object is masked from 'package:stats':

    smooth

The following object is masked from 'package:base':

    trimws

Loading required package: MLInterfaces
Loading required package: annotate
Loading required package: AnnotationDbi
Loading required package: stats4
Loading required package: IRanges
Loading required package: S4Vectors

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: XML
Loading required package: cluster

This is pRoloc version 1.12.4 
  Read '?pRoloc' and references therein for information
  about the package and how to get started.

> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/pRoloc/clustDist.Rd_%03d_medium.png", width=480, height=480)
> ### Name: clustDist
> ### Title: Pairwise Distance Computation for Protein Information Sets
> ### Aliases: clustDist
> 
> ### ** Examples
> 
> library(pRolocdata)

This is pRolocdata version 1.10.0.
Use 'pRolocdata()' to list available data sets.
> data(dunkley2006)
> par <- setAnnotationParams(inputs =
+                    c("Arabidopsis thaliana genes",
+                    "TAIR locus ID"))
Using species Arabidopsis thaliana genes (TAIR10 (2010-09-TAIR10))
Using feature type TAIR locus ID(s)
Connecting to Biomart...
> ## add protein sets/annotation information
> xx <- addGoAnnotations(dunkley2006, par)
Loading required namespace: GO.db

Loading required package: GO.db
> ## filter
> xx <- filterMinMarkers(xx, n = 50)
Retaining 16 out of 153 in GOAnnotations
> xx <- filterMaxMarkers(xx, p = .25)
Retaining 11 out of 16 in GOAnnotations
> ## get distances for protein sets 
> dd <- clustDist(xx)
   |                                                                               |                                                                      |   0%   |                                                                               |======                                                                |   9%   |                                                                               |=============                                                         |  18%   |                                                                               |===================                                                   |  27%   |                                                                               |=========================                                             |  36%   |                                                                               |================================                                      |  45%   |                                                                               |======================================                                |  55%   |                                                                               |=============================================                         |  64%   |                                                                               |===================================================                   |  73%   |                                                                               |=========================================================             |  82%   |                                                                               |================================================================      |  91%   |                                                                               |======================================================================| 100%
> ## plot clusters for first 'ClustDist' object 
> ## in the 'ClustDistList'
> plot(dd[[1]], xx)
> ## plot distances for all protein sets 
> plot(dd)
> ## Extract normalised distances
> ## Normalise by n^1/3
> minDist <- getNormDist(dd, p = 1/3)
> ## Get new order according to lowest distance
> o <- order(minDist)
> ## Re-order GOAnnotations 
> fData(xx)$GOAnnotations <- fData(xx)$GOAnnotations[, o]
> #if (interactive()) {
> pRolocVis(xx, fcol = "GOAnnotations")
Error: could not find function "pRolocVis"
Execution halted