R: AnnotationDb objects and their progeny, methods etc.
AnnotationDb-objects
R Documentation
AnnotationDb objects and their progeny, methods etc.
Description
AnnotationDb is the virtual base class for all annotation
packages. It contain a database connection and is meant to be the
parent for a set of classes in the Bioconductor annotation
packages. These classes will provide a means of dispatch for a
widely available set of select methods and thus allow the
easy extraction of data from the annotation packages.
select, columns and keys are used together to
extract data from an AnnotationDb object (or any object derived
from the parent class). Examples of classes derived from the
AnnotationDb object include (but are not limited to):
ChipDb, OrgDbGODb, InparanoidDb and
ReactomeDb.
columns shows which kinds of data can be returned for the
AnnotationDb object.
keytypes allows the user to discover which keytypes can be
passed in to select or keys and the keytype
argument.
keys returns keys for the database contained in the
AnnotationDb object . This method is already documented in the
keys manual page but is mentioned again here because it's usage with
select is so intimate. By default it will return the primary
keys for the database, but if used with the keytype argument,
it will return the keys from that keytype.
select will retrieve the data as a data.frame based on
parameters for selected keyscolumns and keytype
arguments. Users should be warned that if you call select and request
columns that have multiple matches for your keys, select will return a
data.frame with one row for each possible match. This has the effect that if
you request multiple columns and some of them have a many to one relationship
to the keys, things will continue to multiply accordingly. So it's not a good
idea to request a large number of columns unless you know that what you are
asking for should have a one to one relationship with the initial set of keys.
In general, if you need to retrieve a column (like GO) that has a many to one
relationship to the original keys, it is most useful to extract that
separately.
mapIds gets the mapped ids (column) for a set of keys that are of a
particular keytype. Usually returned as a named character vector.
saveDb will take an AnnotationDb object and save the database
to the file specified by the path passed in to the file
argument.
loadDb takes a .sqlite database file as an argument and uses
data in the metadata table of that file to return an AnnotationDb
style object of the appropriate type.
species shows the genus and species label currently attached to
the AnnotationDb objects database.
dbfile gets the database file associated with an object.
dbconn gets the datebase connection associated with an object.
taxonomyId gets the taxonomy ID associated with an object (if available).
the AnnotationDb object. But in practice this will mean an
object derived from an AnnotationDb object such as a OrgDb or
ChipDb object.
keys
the keys to select records for from the database. All possible
keys are returned by using the keys method.
columns
the columns or kinds of things that can be retrieved
from the database. As with keys, all possible columns are
returned by using the columns method.
keytype
the keytype that matches the keys used. For the
select methods, this is used to indicate the kind of ID being used
with the keys argument. For the keys method this is used to
indicate which kind of keys are desired from keys
column
the column to search on (for mapIds). Different from
columns in that it can only have a single element for the value
...
other arguments. These include:
pattern:
the pattern to match (used by keys)
column:
the column to search on. This is used by keys and is
for when the thing you want to pattern match is different from
the keytype, or when you want to simply want to get keys that
have a value for the thing specified by the column argument.
fuzzy:
TRUE or FALSE value. Use fuzzy matching? (this is
used with pattern by the keys method)
multiVals
What should mapIds do when there are multiple values
that could be returned? Options include:
first:
This value means that when there are multiple matches only the 1st thing that comes back will be returned. This is the default behavior
list:
This will just returns a list object to the end user
filter:
This will remove all elements that contain multiple matches and will therefore return a shorter vector than what came in whenever some of the keys match more than one value
asNA:
This will return an NA value whenever there are multiple matches
CharacterList:
This just returns a SimpleCharacterList object
FUN:
You can also supply a function to the multiVals argument for custom behaviors. The function must take a single argument and return a single value. This function will be applied to all the elements and will serve a 'rule' that for which thing to keep when there is more than one element. So for example this example function will always grab the last element in each result: last <- function(x){x[[length(x)]]}
file
an sqlite file path. A string the represents the
full name you want for your sqlite database and also where to put it.
packageName
for internal use only
Value
keys,columns and keytypes each return a character
vector or possible values. select returns a data.frame.
require(hgu95av2.db)
## display the columns
columns(hgu95av2.db)
## get the 1st 6 possible keys
keys <- head( keys(hgu95av2.db) )
keys
## lookup gene symbol and unigene ID for the 1st 6 keys
select(hgu95av2.db, keys=keys, columns = c("SYMBOL","UNIGENE"))
## get keys based on unigene
keyunis <- head( keys(hgu95av2.db, keytype="UNIGENE") )
keyunis
## list supported key types
keytypes(hgu95av2.db)
## lookup gene symbol and unigene ID based on unigene IDs by setting
## the keytype to "UNIGENE" and passing in unigene keys:
select(hgu95av2.db, keys=keyunis, columns = c("SYMBOL","UNIGENE"),
keytype="UNIGENE")
keys <- head(keys(hgu95av2.db, 'ENTREZID'))
## get a default result (captures only the 1st element)
mapIds(hgu95av2.db, keys=keys, column='ALIAS', keytype='ENTREZID')
## or use a different option
mapIds(hgu95av2.db, keys=keys, column='ALIAS', keytype='ENTREZID',
multiVals="CharacterList")
## Or define your own function
last <- function(x){x[[length(x)]]}
mapIds(hgu95av2.db, keys=keys, column='ALIAS', keytype='ENTREZID',
multiVals=last)
## For other ways to access the DB, you can use dbfile() or dbconn() to extract
dbconn(hgu95av2.db)
dbfile(hgu95av2.db)
## Try to retrieve an associated taxonomyId
taxonomyId(hgu95av2.db)
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(AnnotationDbi)
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
rbind, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
Loading required package: IRanges
Loading required package: S4Vectors
Attaching package: 'S4Vectors'
The following objects are masked from 'package:base':
colMeans, colSums, expand.grid, rowMeans, rowSums
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/AnnotationDbi/AnnotationDb-class.Rd_%03d_medium.png", width=480, height=480)
> ### Name: AnnotationDb-objects
> ### Title: AnnotationDb objects and their progeny, methods etc.
> ### Aliases: AnnotationDb class:AnnotationDb AnnotationDb-class
> ### ChipDb-class GODb-class InparanoidDb-class OrgDb-class
> ### ReactomeDb-class loadDb dbconn,AnnotationDb-method saveDb
> ### saveDb,AnnotationDb-method show,AnnotationDb-method
> ### metadata,AnnotationDb-method names,AnnotationDb-method species
> ### species,AnnotationDb-method dbfile,AnnotationDb-method taxonomyId
> ### taxonomyId,AnnotationDb-method cols columns
> ### columns,AnnotationDb-method columns,OrgDb-method
> ### columns,ChipDb-method columns,GODb-method columns,InparanoidDb-method
> ### columns,Inparanoid8Db-method columns,ReactomeDb-method keytypes
> ### keytypes,OrgDb-method keytypes,ChipDb-method keytypes,GODb-method
> ### keytypes,InparanoidDb-method keytypes,Inparanoid8Db-method
> ### keytypes,ReactomeDb-method keys keys,OrgDb-method keys,ChipDb-method
> ### keys,GODb-method keys,InparanoidDb-method keys,Inparanoid8Db-method
> ### keys,ReactomeDb-method select select,OrgDb-method
> ### select,ChipDb-method select,GODb-method select,InparanoidDb-method
> ### select,Inparanoid8Db-method select,ReactomeDb-method mapIds
> ### mapIds,AnnotationDb-method
> ### Keywords: classes methods
>
> ### ** Examples
>
> require(hgu95av2.db)
Loading required package: hgu95av2.db
Loading required package: org.Hs.eg.db
> ## display the columns
> columns(hgu95av2.db)
[1] "ACCNUM" "ALIAS" "ENSEMBL" "ENSEMBLPROT" "ENSEMBLTRANS"
[6] "ENTREZID" "ENZYME" "EVIDENCE" "EVIDENCEALL" "GENENAME"
[11] "GO" "GOALL" "IPI" "MAP" "OMIM"
[16] "ONTOLOGY" "ONTOLOGYALL" "PATH" "PFAM" "PMID"
[21] "PROBEID" "PROSITE" "REFSEQ" "SYMBOL" "UCSCKG"
[26] "UNIGENE" "UNIPROT"
> ## get the 1st 6 possible keys
> keys <- head( keys(hgu95av2.db) )
> keys
[1] "1000_at" "1001_at" "1002_f_at" "1003_s_at" "1004_at" "1005_at"
> ## lookup gene symbol and unigene ID for the 1st 6 keys
> select(hgu95av2.db, keys=keys, columns = c("SYMBOL","UNIGENE"))
'select()' returned 1:1 mapping between keys and columns
PROBEID SYMBOL UNIGENE
1 1000_at MAPK3 Hs.861
2 1001_at TIE1 Hs.78824
3 1002_f_at CYP2C19 Hs.282409
4 1003_s_at CXCR5 Hs.113916
5 1004_at CXCR5 Hs.113916
6 1005_at DUSP1 Hs.171695
>
> ## get keys based on unigene
> keyunis <- head( keys(hgu95av2.db, keytype="UNIGENE") )
> keyunis
[1] "Hs.529161" "Hs.709582" "Hs.88556" "Hs.212838" "Hs.569125" "Hs.591847"
> ## list supported key types
> keytypes(hgu95av2.db)
[1] "ACCNUM" "ALIAS" "ENSEMBL" "ENSEMBLPROT" "ENSEMBLTRANS"
[6] "ENTREZID" "ENZYME" "EVIDENCE" "EVIDENCEALL" "GENENAME"
[11] "GO" "GOALL" "IPI" "MAP" "OMIM"
[16] "ONTOLOGY" "ONTOLOGYALL" "PATH" "PFAM" "PMID"
[21] "PROBEID" "PROSITE" "REFSEQ" "SYMBOL" "UCSCKG"
[26] "UNIGENE" "UNIPROT"
> ## lookup gene symbol and unigene ID based on unigene IDs by setting
> ## the keytype to "UNIGENE" and passing in unigene keys:
> select(hgu95av2.db, keys=keyunis, columns = c("SYMBOL","UNIGENE"),
+ keytype="UNIGENE")
'select()' returned 1:many mapping between keys and columns
UNIGENE SYMBOL
1 Hs.529161 A1BG
2 Hs.709582 A1BG
3 Hs.709582 A1BG-AS1
4 Hs.88556 A2M
5 Hs.88556 HDAC1
6 Hs.212838 A2M
7 Hs.569125 A2MP1
8 Hs.591847 NAT1
>
> keys <- head(keys(hgu95av2.db, 'ENTREZID'))
> ## get a default result (captures only the 1st element)
> mapIds(hgu95av2.db, keys=keys, column='ALIAS', keytype='ENTREZID')
'select()' returned 1:many mapping between keys and columns
10 100 1000 10000 100008586 10001
"AAC2" "ADA" "CD325" "MPPH" "GAGE12F" "ARC33"
> ## or use a different option
> mapIds(hgu95av2.db, keys=keys, column='ALIAS', keytype='ENTREZID',
+ multiVals="CharacterList")
'select()' returned 1:many mapping between keys and columns
CharacterList of length 6
[["10"]] AAC2 NAT-2 PNAT NAT2
[["100"]] ADA
[["1000"]] CD325 CDHN CDw325 NCAD CDH2
[["10000"]] MPPH MPPH2 PKB-GAMMA PKBG PRKBG RAC-PK-gamma RAC-gamma STK-2 AKT3
[["100008586"]] GAGE12F
[["10001"]] ARC33 NY-REN-28 MED6
> ## Or define your own function
> last <- function(x){x[[length(x)]]}
> mapIds(hgu95av2.db, keys=keys, column='ALIAS', keytype='ENTREZID',
+ multiVals=last)
'select()' returned 1:many mapping between keys and columns
10 100 1000 10000 100008586 10001
"NAT2" "ADA" "CDH2" "AKT3" "GAGE12F" "MED6"
>
> ## For other ways to access the DB, you can use dbfile() or dbconn() to extract
> dbconn(hgu95av2.db)
<SQLiteConnection>
> dbfile(hgu95av2.db)
[1] "/home/ddbj/local/lib64/R/library/hgu95av2.db/extdata/hgu95av2.sqlite"
>
> ## Try to retrieve an associated taxonomyId
> taxonomyId(hgu95av2.db)
[1] 9606
>
>
>
>
>
> dev.off()
null device
1
>