The shortest path analysis was proposed by Zhou et al. The basic
computation is to find the shortest path in a supplied graph between
two Entrez Gene IDs. Zhou et al. claim that other genes annotated along
that path are likely to have the same GO annotation as the two
endpoints.
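The transitive annotation idea reduces to collecting the interior nodes of each path. The sketch below is a minimal illustration in base R, assuming the paths have already been computed and are stored as ordered vectors of node IDs. The node labels, the `paths` list and the `interiorNodes` helper are hypothetical and not part of GOstats.

```r
## Hypothetical shortest paths between genes that share a GO term; each
## element is the ordered vector of node IDs along one path.
paths <- list(
  "A:B" = c("A", "X", "Y", "B"),
  "A:C" = c("A", "X", "C"),
  "B:C" = c("B", "C")
)

## Hypothetical helper: interior nodes are all nodes except the two endpoints.
interiorNodes <- function(path) {
  if (length(path) <= 2) character(0) else path[-c(1, length(path))]
}

## Candidate genes for transitive annotation: interior nodes that are not
## already annotated at the GO term.
annotated <- c("A", "B", "C")
candidates <- setdiff(unique(unlist(lapply(paths, interiorNodes))), annotated)
print(candidates)
```

Here "X" and "Y" would be the candidates, since they lie between annotated genes but are not themselves annotated.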
Usage
shortestPath(g, GOnode, mapfun=NULL, chip=NULL)
Arguments
g
An instance of the graph class.
GOnode
A length-one character vector specifying the GO node of
interest.
mapfun
A function taking a character vector of GO IDs as its
only argument and returning a list of character vectors of Entrez
Gene IDs annotated at each corresponding GO ID. The function should
behave similarly to mget(x, go2egmap, ifnotfound=NA), that
is, NA should be returned if a specified GO ID has no Entrez
ID mappings. See details for the interaction of mapfun and
chip.
chip
The name of a DB-based annotation data package (the name
will end in ".db"). This package is used to obtain the GO ID to
Entrez ID mapping when mapfun is not supplied.
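A function satisfying the mapfun contract can be built from any environment keyed by GO ID, following the mget(x, go2egmap, ifnotfound=NA) pattern named above. This is a minimal sketch in base R; the environment contents are placeholder IDs invented for illustration, not real GO annotations, and `myMapfun` is a hypothetical name.

```r
## Hypothetical GO -> Entrez Gene mapping held in an environment.
## The IDs below are placeholders, not real annotations.
go2egmap <- new.env()
assign("GO:0000001", c("101", "102"), envir = go2egmap)
assign("GO:0000002", "103", envir = go2egmap)

## One argument (GO IDs); returns a named list of Entrez ID vectors,
## with NA for any GO ID that has no mapping.
myMapfun <- function(goids) mget(goids, envir = go2egmap, ifnotfound = NA)

mapped <- myMapfun(c("GO:0000001", "GO:9999999"))
str(mapped)
```

Such a function could then be passed as shortestPath(g, GOnode, mapfun = myMapfun).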
Details
The algorithm implemented here is quite simple. All Entrez Gene
identifiers that are annotated at the GO node of interest are
obtained. Those that are found as nodes in the graph are retained and
used for the computation. For every pair of nodes at the GO term, the
shortest path between them is computed using sp.between from
the RBGL package.
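The steps above can be sketched in base R as follows. This is not the GOstats implementation: the graph is assumed to be an undirected, unweighted adjacency list (a named list mapping each node to its neighbours, with every neighbour also present as a name), and a plain breadth-first search stands in for RBGL's sp.between, so edge weights are ignored. The names `bfsPath` and `shortestPathSketch` are hypothetical; the returned component names mirror those documented under Value.

```r
## Breadth-first shortest path between two nodes of an unweighted graph.
## Returns the node IDs along the path, or NULL if 'to' is unreachable.
bfsPath <- function(adj, from, to) {
  nodes <- names(adj)
  visited <- setNames(rep(FALSE, length(nodes)), nodes)
  prev <- setNames(rep(NA_character_, length(nodes)), nodes)
  visited[[from]] <- TRUE
  queue <- from
  while (length(queue) > 0) {
    node <- queue[1]
    queue <- queue[-1]
    if (node == to) break
    for (nb in adj[[node]]) {
      if (!visited[[nb]]) {
        visited[[nb]] <- TRUE
        prev[[nb]] <- node
        queue <- c(queue, nb)
      }
    }
  }
  if (!visited[[to]]) return(NULL)
  ## Walk the predecessor links back from 'to' to 'from'.
  path <- to
  while (path[1] != from) path <- c(prev[[path[1]]], path)
  path
}

## Sketch of the documented steps: map the GO node to Entrez IDs, keep
## those present in the graph, then compute a path for every pair.
shortestPathSketch <- function(adj, goid, mapfun) {
  egIds <- unique(unlist(mapfun(goid)))
  egIds <- egIds[!is.na(egIds)]
  used <- intersect(egIds, names(adj))
  notUsed <- setdiff(egIds, names(adj))
  pairs <- if (length(used) >= 2) combn(used, 2, simplify = FALSE) else list()
  sps <- lapply(pairs, function(p) bfsPath(adj, p[1], p[2]))
  names(sps) <- vapply(pairs, paste, character(1), collapse = ":")
  list(shortestpaths = sps, nodesUsed = used, nodesNotUsed = notUsed)
}

## Toy example: node "e" is isolated and "z" is not in the graph.
adj <- list(a = "b", b = c("a", "c"), c = c("b", "d"), d = "c", e = character(0))
toyMapfun <- function(goids) list(c("a", "c", "e", "z"))
toyResult <- shortestPathSketch(adj, "GO:placeholder", toyMapfun)
str(toyResult)
```

The isolated node "e" yields a NULL path, which is why the Examples section restricts the graph to its largest connected component.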
There is a presumption that the graph is undirected. This
restriction could probably be lifted if there were a reason to do so;
a patch would be gratefully accepted.
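If the available network is directed, one way to meet the undirected presumption is to treat every directed edge as an undirected one. The sketch below does this on a plain adjacency matrix in base R; the matrix is hypothetical. For graph objects, the graph package's ugraph() may offer a more direct route.

```r
## Hypothetical directed adjacency matrix: A[i, j] == 1 means an edge i -> j.
ids <- c("x", "y", "z")
A <- matrix(0, 3, 3, dimnames = list(ids, ids))
A["x", "y"] <- 1
A["y", "z"] <- 1

## Symmetrise: keep an undirected edge wherever either direction exists.
U <- (A + t(A)) > 0
storage.mode(U) <- "integer"
print(U)
```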
The mapping of GO node to Entrez ID is achieved in one of three ways:
If mapfun is provided, it will be used to perform the
needed lookups. In this case, chip will be ignored.
If chip is provided and mapfun=NULL, then the
needed lookups will be done based on the GO to Entrez mappings
encapsulated in the specified annotation data package. This is
the recommended usage.
If both mapfun and chip are NULL or missing,
then the function will attempt to load the GO package (the
environment-based package, distinct from GO.db). This package
contains a legacy environment mapping GO IDs to Entrez IDs. If
the GO package is not available, an error will be raised.
Omitting both mapfun and chip is not recommended, as this fallback
is not compatible with the DB-based annotation data packages.
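The precedence among the three mapping sources can be summarised as a small decision function. This sketch only reports which source would be used; it does not perform any lookup. The name `mappingSource`, its return strings and the ".db" suffix check are hypothetical illustrations, not part of GOstats.

```r
## Hypothetical helper: report which mapping source applies, following the
## documented precedence mapfun > chip > legacy GO package.
mappingSource <- function(mapfun = NULL, chip = NULL) {
  if (!is.null(mapfun)) {
    if (!is.null(chip)) message("chip is ignored because mapfun was supplied")
    return("mapfun")
  }
  if (!is.null(chip)) {
    if (!grepl("\\.db$", chip))
      stop("chip should name a DB-based package ending in '.db'")
    return(paste0("chip:", chip))
  }
  ## Last resort: the legacy environment-based GO package.
  if (!requireNamespace("GO", quietly = TRUE))
    stop("neither mapfun nor chip was given, and the GO package is not available")
  "GO"
}

mappingSource(chip = "hgu95av2.db")
```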
Value
The return value is a list with the following components:
shortestpaths
A list of the output from sp.between. The
names are the names of the nodes used as the two endpoints.
nodesUsed
A vector of the Entrez Gene IDs that were both found
at the GO term of interest and were nodes in the supplied graph,
g. These were used to compute the shortest paths.
nodesNotUsed
A vector of Entrez Gene IDs that were annotated at
the GO term, but were not found in the graph g.
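A quick way to sanity-check a result is to compare its components. The sketch below works on a mock list with the documented component names; all values are made up. It assumes shortestpaths holds one entry per unordered pair of nodesUsed, which follows from the Details description but may not match the exact structure returned by sp.between. The helper name `summarizeShortestPath` is hypothetical.

```r
## Hypothetical summary of a shortestPath() result.
summarizeShortestPath <- function(res) {
  nUsed <- length(res$nodesUsed)
  nAnnotated <- nUsed + length(res$nodesNotUsed)
  list(
    fractionInGraph = if (nAnnotated > 0) nUsed / nAnnotated else NA_real_,
    expectedPairs = choose(nUsed, 2),  # one path per unordered pair (assumed)
    returnedPairs = length(res$shortestpaths)
  )
}

## Mock result with made-up IDs; path contents are left empty.
mockResult <- list(shortestpaths = vector("list", 3),
                   nodesUsed = c("101", "102", "103"),
                   nodesNotUsed = "104")
summary1 <- summarizeShortestPath(mockResult)
str(summary1)
```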
Author(s)
R. Gentleman
References
Zhou, X., Kao, M-C. J., and Wong, W. H. (2002). Transitive functional
annotation by shortest-path analysis of gene expression data. PNAS.
See Also
sp.between
Examples
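library("GOstats")
library("hgu95av2.db")
library("RBGL")
set.seed(321)
## Helper: flatten a list and drop duplicate entries
uniqun <- function(x) unique(unlist(x))
goid <- "GO:0005778"
## Map the GO term to hgu95av2 probes, then the probes to Entrez Gene IDs
egIds <- uniqun(mget(uniqun(hgu95av2GO2PROBE[[goid]]),
                     hgu95av2ENTREZID))
## Build a random, unweighted graph on those Entrez IDs (see ?randomGraph)
v1 <- randomGraph(egIds, 1:10, .3, weights=FALSE)
## Since v1 is random, it might be disconnected and we need a
## connected graph to guarantee the existence of a path.
c1 <- connComp(v1)
largestComp <- c1[[which.max(sapply(c1, length))]]
v2 <- subGraph(largestComp, v1)
a1 <- shortestPath(v2, goid, chip="hgu95av2.db")
## Inspect the documented components of the result
length(a1$nodesUsed)
length(a1$nodesNotUsed)
head(names(a1$shortestpaths))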