PSICQUIC is an effort from the HUPO Proteomics Standard Initiative (HUPO-PSI) to
standardise programmatic access to molecular interaction databases. The
Bioconductor PSICQUIC package provides a traditional R interface layered on top
of the PSICQUIC REST interface. Gene symbols are most commonly used in
queries; interactions are returned in a data.frame, characterized by
interaction type, detection method, and publication references.
Confidence scores are sometimes available. Queries may be constrained
by many of these same attributes, i.e., interaction type, detection
method, species, publication identifier, and source database.
Details
There are two operational differences between the native PSICQUIC REST
interface and that offered here via the interactions method:
species exclusivity:
The REST interface requires only that one
participant in an interaction be from the named species. By
default, we require that both participants are from the named
species. This can be controlled by the speciesExclusive
logical argument to the interactions method.
number of molecules (or identifiers):
The REST interface permits only zero, one or two
gene or protein identifiers per query. We allow any number, zero or
more. When two or more are supplied, the interactions returned are
only those in which any two of those identifiers
participate. If you want all interactions which include any of a
list of identifiers, and you don't care to control for their
partners, you can accomplish this by issuing successive
single-identifier interaction queries.
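The successive single-identifier approach can be sketched as follows. This is a minimal illustration, not part of the PSICQUIC package: queryEach is a hypothetical helper, and toyQuery with its placeholder table stands in for a live query so that the code runs without network access. The identifiers g1 to g5 are invented placeholders, not real interactions.

```r
# hypothetical helper (not part of PSICQUIC): run one query per
# identifier, stack the results, and drop rows returned more than once
queryEach <- function(ids, queryFun) {
  tbls <- lapply(ids, queryFun)
  unique(do.call(rbind, tbls))
}

# placeholder data standing in for a provider's interaction table
toy <- data.frame(A.name=c("g1", "g1", "g2", "g4"),
                  B.name=c("g2", "g3", "g3", "g5"),
                  stringsAsFactors=FALSE)

# stand-in for a live single-identifier query: return every row in
# which the identifier appears as either participant
toyQuery <- function(id) toy[toy$A.name == id | toy$B.name == id, ]

# all interactions that include g1 or g2, whatever the partner
tbl.any <- queryEach(c("g1", "g2"), toyQuery)
print(tbl.any)
```

With a live PSICQUIC object, queryFun could be something like function(id) interactions(psicquic, id, species="9606"), on the assumption that each call returns a data.frame with the same columns so that the results can be stacked.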
Constructor
PSICQUIC(): contacts the central PSICQUIC web server,
discovers currently functioning servers, and returns an object
used by the methods below.
Methods
providers(x): lists the short names of the data providers
interactions(x, id, species, speciesExclusive, type, provider, detectionMethod, publicationID, quiet): retrieves all interactions matching the specified pattern.
rawQuery(x, provider, rawArgs): submits query terms written in native PSICQUIC REST style
show(x): displays current providers and related data
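The rawArgs argument of rawQuery is a single string of query terms. The example on this page uses the form "identifier:ALK AND species:9606". A minimal sketch of composing such a string from field/value pairs is given here. miqlQuery is a hypothetical helper, not part of the PSICQUIC package, and it performs no escaping or validation of field names.

```r
# hypothetical helper (not part of PSICQUIC): join named field/value
# pairs into one query string, combining the terms with AND
miqlQuery <- function(...) {
  terms <- list(...)
  # every term must be named, e.g. identifier="ALK"
  stopifnot(length(terms) > 0,
            !is.null(names(terms)),
            all(nzchar(names(terms))))
  values <- vapply(terms, as.character, character(1))
  paste(sprintf("%s:%s", names(terms), values), collapse=" AND ")
}

# the two fields used in the example on this page
query <- miqlQuery(identifier="ALK", species="9606")
print(query)
```

In a live session the resulting string could be passed as the third argument, as in rawQuery(psicquic, "BioGrid", query).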
Functions
detectionMethods(): your web browser will display the PSI-MI ontology for detection methods
interactionTypes(): your web browser will display the PSI-MI ontology for molecular interaction types
Examples
psicquic <- PSICQUIC()
# obtain the list of two dozen (or so) currently live
# PSICQUIC-compliant data providers
providers(psicquic)
# a minimal call: get all interactions with MAP3K3, of all types,
# from all providers. a data.frame is returned
tbl.0 <- interactions(psicquic, "MAP3K3", species="9606")
# build a contingency table, sort it, and see
# what kinds of interactions were returned, obtained
# by what detection methods.
# "-" is used when the provider does not specify a value.
# you will see a wide range of specificity, in detection method,
# interaction type, and number of interactions found.
xtab <- with(tbl.0, as.data.frame(
    table(type, detectionMethod, provider)))
xtab <- subset(xtab, Freq > 0)   # keep only combinations that occur
xtab <- xtab[order(xtab$Freq, decreasing=TRUE),]
# what interactors were returned? the IDMapper class in this
# package converts many PSICQUIC providers' protein identifiers to
# entrez geneIDs and HUGO gene symbols, via remote calls to
# biomaRt:
idMapper <- IDMapper("9606")
tbl.0g <- addGeneInfo(idMapper, tbl.0)
with(tbl.0g, head(unique(c(A.name, B.name))))
# apart from the query gene MAP3K3 itself and unmapped identifiers ("-"),
# MAP2K5 is the most frequently mentioned interactor:
xtab.sym <- with(tbl.0g, table(c(A.name, B.name)))
head(sort(xtab.sym, decreasing=TRUE))
# PSICQUIC uses well-developed ontologies -- controlled vocabularies --
# which are currently best viewed in a web browser.
# we provide two convenience functions which will display these
# hierarchically defined vocabularies:
# interactionTypes()
# detectionMethods()
# NCBI curates taxonomy codes, such as "9606" for Homo sapiens.
# you can find these codes by using this method, which will
# drive your browser to the appropriate NCBI web page.
# speciesIds()
# use terms from these vocabularies to retrieve interaction
# information for these two proteins. note that both of
# these terms are mid-level in their respective hierarchies
# and will likely retrieve more specific nested terms
tbl.2 <- interactions(psicquic, id=c("MAP3K3", "MAP2K5"),
                      species="9606",
                      type="physical association",
                      detectionMethod="affinity chromatography technology")
# add gene IDs and symbols
tbl.2g <- addGeneInfo(idMapper, tbl.2)
# how many publications lie behind these interactions?
tbl.2g[, c("A.name", "B.name", "detectionMethod", "firstAuthor")]
# the package also provides a convenience method for submitting
# queries in native MIQL (Molecular Interaction Query Language).
# the language is defined here:
# http://code.google.com/p/psicquic/wiki/MiqlReference27
if("BioGrid" %in% providers(psicquic)){
    tbl.3 <- rawQuery(psicquic, "BioGrid", "identifier:ALK AND species:9606")
    # what publications?
    table(tbl.3$V8)
}
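The question of how many publications lie behind the tbl.2g interactions can also be answered by counting. A minimal sketch follows, using the firstAuthor values copied from the tbl.2g listing in the Results section, so that it runs without a live query. Providers format the same citation differently (for example "Sun W (2001)" and "Sun et al. (2001)"), so the hypothetical helper citationKey reduces each string to surname plus year. This is a heuristic assumption: two different papers by authors with the same surname in the same year would be merged.

```r
# firstAuthor values copied from the tbl.2g listing in the Results section
first.author <- c("Sun W (2001)", "Bouwmeester T (2004)", "Nakamura K (2010)",
                  "Lamark T (2003)", "Nakamura et al.(2010)",
                  "Lamark et al. (2003)", "Sun et al. (2001)",
                  "Bouwmeester et al. (2004)", "Bouwmeester et al. (2004)",
                  "Chen et al. (2014)", "-", "-", "-", "-")

# hypothetical helper (not part of PSICQUIC): reduce a citation string
# to "Surname year"; NA when no parenthesized four-digit year is present
citationKey <- function(x) {
  has.year <- grepl("\\([0-9]{4}\\)", x)
  surname <- sub("^([^ ]+).*$", "\\1", x)
  year <- sub("^.*\\(([0-9]{4})\\).*$", "\\1", x)
  ifelse(has.year, paste(surname, year), NA_character_)
}

# table() drops NA by default, so the "-" entries are not counted
pub.counts <- table(citationKey(first.author))
print(pub.counts)
print(length(pub.counts))   # number of distinct publications
```

On a live result the same helper could be applied to tbl.2g$firstAuthor. Where providers fill it in, the publicationID column may be a more reliable key than the author string.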
Results
> library(PSICQUIC)
Loading required package: IRanges
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
rbind, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit
Loading required package: S4Vectors
Loading required package: stats4
Attaching package: 'S4Vectors'
The following objects are masked from 'package:base':
colMeans, colSums, expand.grid, rowMeans, rowSums
Loading required package: biomaRt
Loading required package: httr
Loading required package: plyr
Attaching package: 'plyr'
The following object is masked from 'package:IRanges':
desc
The following object is masked from 'package:S4Vectors':
rename
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/PSICQUIC/PSICQUIC-class.Rd_%03d_medium.png", width=480, height=480)
> ### Name: PSICQUIC-class
> ### Title: PSICQUIC
> ### Aliases: class:PSICQUIC PSICQUIC-class PSICQUIC show,PSICQUIC-method
> ### Keywords: methods classes
>
> ### ** Examples
>
> psicquic <- PSICQUIC()
> # obtain the list of two dozen (or so) currently live
> # PSICQUIC-compliant data providers
> providers(psicquic)
 [1] "APID"              "BioGrid"           "bhf-ucl"
 [4] "ChEMBL"            "DIP"               "HPIDb"
 [7] "InnateDB"          "InnateDB-All"      "IntAct"
[10] "mentha"            "MPIDB"             "MatrixDB"
[13] "MINT"              "Reactome"          "Reactome-FIs"
[16] "I2D"               "I2D-IMEx"          "InnateDB-IMEx"
[19] "MolCon"            "UniProt"           "MBInfo"
[22] "BindingDB"         "VirHostNet"        "Spike"
[25] "BAR"               "EBI-GOA-nonIntAct" "ZINC"
> # a minimal call: get all interactions with MAP3K3, of all types,
> # from all providers. a data.frame is returned
> tbl.0 <- interactions(psicquic, "MAP3K3", species="9606")
>
> # build a contingency table, sort it, and see
> # what kinds of interactions were returned, obtained
> # by what detection methods.
> # "-" is used when the provider does not specify a value.
> # you will see a wide range of specificity, in detection method,
> # interaction type, and number of interactions found.
>
> xtab <- with(tbl.0, as.data.frame(
+ table(type, detectionMethod, provider)))
> xtab <- subset(xtab, Freq > 0) # [order(xtab$Freq, decreasing=TRUE),]
> xtab <- xtab[order(xtab$Freq, decreasing=TRUE),]
>
> # what interactors were returned? the IDMapper class in this
> # package converts many PSICQUIC providers' protein identifiers to
> # entrez geneIDs and HUGO gene symbols, via remote calls to
> # biomaRt:
>
> idMapper <- IDMapper("9606")
checking for biomart access...
does 'http://www.ensembl.org' respond?
creating ensembl mart
hsapiens_gene_ensembl dataset provided?
connecting to biomart...
> tbl.0g <- addGeneInfo(idMapper, tbl.0)
> with(tbl.0g, head(unique(c(A.name, B.name))))
[1] "MAP3K3" "-" "BRCA1" "MAP2K5" "WNK1" "CHUK"
>
> # we see that MAP2K5 is the most frequently mentioned interactor:
>
> xtab.sym <- with(tbl.0g, table(c(A.name, B.name)))
> head(sort(xtab.sym, decreasing=TRUE))
MAP3K3      - MAP2K5  TRAF6  TRAF7  YWHAE
   797    300     30     16     16     16
>
> # PSICQUIC uses well-developed ontologies -- controlled vocabularies --
> # which are currently best viewed in a web browser.
> # we provide two convenience functions which will display these
> # hierarchically defined vocabularies:
>
> # interactionTypes()
> # detectionMethods()
>
> # NCBI curates taxonomy codes, such as "9606" for Homo sapiens.
> # you can find these codes by using this method, which will
> # drive your browser to the appropriate NCBI web page.
>
> # speciesIds()
>
> # use terms from these vocabularies to retrieve interaction
> # information for these two proteins. note that both of
> # these terms are mid-level in their respective hierarchies
> # and will likely retrieve more specific nested terms
>
> tbl.2 <- interactions(psicquic, id=c("MAP3K3", "MAP2K5"),
+ species="9606",
+ type="physical association",
+ detectionMethod="affinity chromatography technology")
>
> # add gene IDs and symbols
>
> tbl.2g <- addGeneInfo(idMapper, tbl.2)
>
> # how many publications lie behind these interactions?
> tbl.2g[, c("A.name", "B.name", "detectionMethod", "firstAuthor")]
   A.name B.name                                    detectionMethod
1  MAP2K5 MAP3K3 psi-mi:MI:0004(affinity chromatography technology)
2  MAP3K3 MAP2K5 psi-mi:MI:0004(affinity chromatography technology)
3  MAP3K3 MAP2K5 psi-mi:MI:0004(affinity chromatography technology)
4  MAP3K3 MAP2K5 psi-mi:MI:0004(affinity chromatography technology)
5       -      - psi-mi:MI:0004(affinity chromatography technology)
6       -      - psi-mi:MI:0004(affinity chromatography technology)
7       -      - psi-mi:MI:0004(affinity chromatography technology)
8       -      - psi-mi:MI:0004(affinity chromatography technology)
9  MAP3K3 MAP2K5       psi-mi:MI:0676(tandem affinity purification)
10 MAP2K5 MAP3K3           psi-mi:MI:0813(proximity ligation assay)
11 MAP3K3 MAP2K5 psi-mi:MI:0004(affinity chromatography technology)
12 MAP3K3 MAP2K5 psi-mi:MI:0004(affinity chromatography technology)
13 MAP3K3 MAP2K5 psi-mi:MI:0004(affinity chromatography technology)
14 MAP3K3 MAP2K5 psi-mi:MI:0004(affinity chromatography technology)
                 firstAuthor
1               Sun W (2001)
2       Bouwmeester T (2004)
3          Nakamura K (2010)
4            Lamark T (2003)
5      Nakamura et al.(2010)
6       Lamark et al. (2003)
7          Sun et al. (2001)
8  Bouwmeester et al. (2004)
9  Bouwmeester et al. (2004)
10        Chen et al. (2014)
11                         -
12                         -
13                         -
14                         -
>
> # the package also provides a convenience method for submitting
> # queries in native MIQL (Molecular Interaction Query Language).
> # the language is defined here:
> # http://code.google.com/p/psicquic/wiki/MiqlReference27
>
> if("BioGrid" %in% providers(psicquic)){
+ tbl.3 <- rawQuery(psicquic, "BioGrid", "identifier:ALK AND species:9606")
+ # what publications?
+ table(tbl.3$V8)
+ }
 Ambrogio C (2005)      Bai RY (1998)   Bonvini P (2002) Crockett DK (2004)
                20                  3                  2                 43
  Fenner BJ (2010)    Miyake I (2002)    Ouyang T (2003)  Pao-Chun L (2009)
                 1                  1                  3                  2
     Ren SY (2005)   Stoica GE (2001)   Taipale M (2012)      Zamo A (2002)
                 1                  1                  2                  1
>
> dev.off()
null device
1
>