Several of the methods available for AnnotationDb objects (from the
AnnotationDbi package) are also implemented for EnsDb objects. This
allows data to be extracted from EnsDb objects in much the same way as
from objects inheriting from the base annotation package class
AnnotationDb.
In addition to the standard usage, the select and
mapIds methods for EnsDb objects also support the filter
framework of the ensembldb package and thus allow more
fine-grained queries to retrieve data.
Usage
## S4 method for signature 'EnsDb'
columns(x)
## S4 method for signature 'EnsDb'
keys(x, keytype, filter,...)
## S4 method for signature 'EnsDb'
keytypes(x)
## S4 method for signature 'EnsDb'
mapIds(x, keys, column, keytype, ..., multiVals)
## S4 method for signature 'EnsDb'
select(x, keys, columns, keytype, ...)
Arguments
(In alphabetic order)
column
For mapIds: the column to search on, i.e. from which values
should be retrieved.
columns
For select: the columns from which values should be
retrieved. Use the columns method to list all possible
columns.
keys
The keys/ids for which data should be retrieved from the
database. This can be either a character vector of keys/IDs, a
single filter object extending BasicFilter or a
list of such objects.
keytype
For mapIds and select: the type (column) that matches
the provided keys. This argument does not have to be specified if
argument keys is a filter object extending
BasicFilter or a list of such objects.
For keys: which keys should be returned from the database.
filter
For keys: either a single object extending
BasicFilter or a list of such objects, used to
retrieve only specific keys from the database.
multiVals
What should mapIds do when there are multiple values that
could be returned? Options are: "first", "list",
"filter", "asNA". See
mapIds for a detailed description.
x
The EnsDb object.
...
Not used.
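The four multiVals options listed above can be illustrated without a database. The following is a minimal base R sketch, not the code used by mapIds: it assumes the semantics described in the AnnotationDbi documentation ("first" keeps the first match, "list" keeps all matches, "filter" drops keys with more than one match, "asNA" returns NA for such keys). The helper collapse_multi and the toy gene and transcript names are made up for illustration.

```r
## Toy key-to-value table in long format; all names are invented.
tbl <- data.frame(
    key   = c("geneA", "geneA", "geneB"),
    value = c("tx1", "tx2", "tx3"),
    stringsAsFactors = FALSE
)

## Hypothetical helper: collapse a long table according to a multiVals
## strategy. This mimics the documented behaviour only.
collapse_multi <- function(tbl,
                           multiVals = c("first", "list", "filter", "asNA")) {
    multiVals <- match.arg(multiVals)
    ## Group values by key, keeping keys in order of first appearance.
    grouped <- split(tbl$value, factor(tbl$key, levels = unique(tbl$key)))
    n <- lengths(grouped)
    switch(multiVals,
           ## Keep only the first value of each key.
           first  = vapply(grouped, function(v) v[1], character(1)),
           ## Keep all values, one list element per key.
           list   = grouped,
           ## Drop keys that have more than one value.
           filter = unlist(grouped[n == 1]),
           ## Replace the values of multi-match keys by NA.
           asNA   = vapply(grouped,
                           function(v) if (length(v) == 1) v else NA_character_,
                           character(1)))
}

collapse_multi(tbl, "first")
collapse_multi(tbl, "list")
collapse_multi(tbl, "filter")
collapse_multi(tbl, "asNA")
```

With this toy table, geneA has two values and geneB has one, so "filter" and "asNA" treat geneA differently from geneB while "first" and "list" return an entry for both keys.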
Value
See the method descriptions below.
Methods and Functions
columns
List all the columns that can be retrieved by the mapIds
and select methods. Note that these column names are
different from the ones supported by the genes,
transcripts etc. methods that can be listed by the
listColumns method.
Returns a character vector of supported column names.
keys
Retrieves all keys from the column name specified with
keytype. By default (if keytype is not provided) it
returns all gene IDs.
Returns a character vector of IDs.
keytypes
List all supported key types (column names).
Returns a character vector of key types.
mapIds
Retrieve the mapped IDs for a set of keys of a particular
keytype. Argument keys can be either a character vector of
keys/IDs, a single filter object extending
BasicFilter or a list of such objects. When filter objects
are used, the argument keytype does not have to be
specified. Note, however, that with the filtering system
the ordering of the results might not match the ordering of
the keys. If a list of several filter objects is passed, the keys of
the first filter are used for the mapping and a warning is issued.
The method usually returns a named character vector or, depending
on the argument multiVals, a named list, with names
corresponding to the keys (the same ordering is only guaranteed if
keys is a character vector).
select
Retrieve data as a data.frame based on the keys,
columns and keytype arguments. If a key matches
multiple entries, one row is returned for each match.
Argument keys can be either a
character vector of keys/IDs, a single filter object extending
BasicFilter or a list of such objects. When filter objects
are used, the argument keytype does not have to be
specified.
Returns a data.frame with the column names corresponding to
the argument columns and rows with all data matching the
criteria specified with keys.
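As noted for mapIds, results obtained through filter objects are not guaranteed to follow the order of the requested keys. The following is a minimal base R sketch of restoring that order afterwards. The list below is not a live query result: it copies the element names, and only the first two transcript IDs per gene, from the filter-based mapIds call shown in the Examples. The variable names are illustrative.

```r
## Shortened copy of the filter-based mapIds result from the Examples;
## note that BCL2L11 comes before BCL2 although BCL2 was requested first.
res <- list(
    BCL2L11 = c("ENST00000432179", "ENST00000308659"),
    BCL2    = c("ENST00000398117", "ENST00000333681")
)

## The order in which the gene names were passed to the filter.
query <- c("BCL2", "BCL2L11")

## Reorder by name; intersect() skips requested keys without a result,
## which would otherwise show up as NULL elements.
res_ordered <- res[intersect(query, names(res))]
names(res_ordered)
```

Indexing by name works the same way for the named character vector that mapIds returns with other multiVals settings.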
Author(s)
Johannes Rainer
See Also
BasicFilter, listColumns, transcripts
Examples
library(EnsDb.Hsapiens.v75)
edb <- EnsDb.Hsapiens.v75
## List all supported keytypes.
keytypes(edb)
## List all supported columns for the select and mapIds methods.
columns(edb)
## List /real/ database column names.
listColumns(edb)
## Retrieve all keys corresponding to transcript ids.
txids <- keys(edb, keytype="TXID")
length(txids)
head(txids)
## Retrieve all keys corresponding to gene names of genes encoded on chromosome X
gids <- keys(edb, keytype="GENENAME", filter=SeqnameFilter("X"))
length(gids)
head(gids)
## Get a mapping of the genes BCL2 and BCL2L11 to all of their
## transcript ids and return the result as list
maps <- mapIds(edb, keys=c("BCL2", "BCL2L11"), column="TXID",
               keytype="GENENAME", multiVals="list")
maps
## Perform the same query using a combination of a GenenameFilter and a TxbiotypeFilter
## to just retrieve protein coding transcripts for these two genes.
mapIds(edb, keys=list(GenenameFilter(c("BCL2", "BCL2L11")),
                      TxbiotypeFilter("protein_coding")), column="TXID",
       multiVals="list")
## select:
## Retrieve all transcript and gene related information for the above example.
select(edb, keys=list(GenenameFilter(c("BCL2", "BCL2L11")),
                      TxbiotypeFilter("protein_coding")),
       columns=c("GENEID", "GENENAME", "TXID", "TXBIOTYPE", "TXSEQSTART", "TXSEQEND",
                 "SEQNAME", "SEQSTRAND"))
## Get all data for genes encoded on chromosome Y
Y <- select(edb, keys="Y", keytype="SEQNAME")
head(Y)
nrow(Y)
## Get selected columns for all lincRNAs encoded on chromosome Y
Y <- select(edb, keys=list(SeqnameFilter("Y"), GenebiotypeFilter("lincRNA")),
            columns=c("GENEID", "GENEBIOTYPE", "TXID", "GENENAME"))
head(Y)
nrow(Y)
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
> library(ensembldb)
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
rbind, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit
Loading required package: GenomicRanges
Loading required package: S4Vectors
Loading required package: stats4
Attaching package: 'S4Vectors'
The following objects are masked from 'package:base':
colMeans, colSums, expand.grid, rowMeans, rowSums
Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: GenomicFeatures
Loading required package: AnnotationDbi
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
> ### Name: select
> ### Title: Integration into the AnnotationDbi framework
> ### Aliases: select select,EnsDb-method columns,EnsDb-method
> ### keys,EnsDb-method keytypes,EnsDb-method mapIds,EnsDb-method
> ### Keywords: classes
>
> ### ** Examples
>
>
> library(EnsDb.Hsapiens.v75)
> edb <- EnsDb.Hsapiens.v75
>
> ## List all supported keytypes.
> keytypes(edb)
[1] "ENTREZID" "EXONID" "GENEBIOTYPE" "GENEID" "GENENAME"
[6] "SEQNAME" "SEQSTRAND" "TXBIOTYPE" "TXID"
>
> ## List all supported columns for the select and mapIds methods.
> columns(edb)
[1] "ENTREZID" "EXONID" "EXONIDX" "EXONSEQEND"
[5] "EXONSEQSTART" "GENEBIOTYPE" "GENEID" "GENENAME"
[9] "GENESEQEND" "GENESEQSTART" "ISCIRCULAR" "SEQCOORDSYSTEM"
[13] "SEQLENGTH" "SEQNAME" "SEQSTRAND" "TXBIOTYPE"
[17] "TXCDSSEQEND" "TXCDSSEQSTART" "TXID" "TXSEQEND"
[21] "TXSEQSTART"
>
> ## List /real/ database column names.
> listColumns(edb)
[1] "seq_name" "seq_length" "is_circular" "exon_id"
[5] "exon_seq_start" "exon_seq_end" "gene_id" "gene_name"
[9] "entrezid" "gene_biotype" "gene_seq_start" "gene_seq_end"
[13] "seq_name" "seq_strand" "seq_coord_system" "name"
[17] "value" "tx_id" "tx_biotype" "tx_seq_start"
[21] "tx_seq_end" "tx_cds_seq_start" "tx_cds_seq_end" "gene_id"
[25] "tx_id" "exon_id" "exon_idx"
>
> ## Retrieve all keys corresponding to transcript ids.
> txids <- keys(edb, keytype="TXID")
> length(txids)
[1] 215647
> head(txids)
[1] "ENST00000000233" "ENST00000000412" "ENST00000000442" "ENST00000001008"
[5] "ENST00000001146" "ENST00000002125"
>
> ## Retrieve all keys corresponding to gene names of genes encoded on chromosome X
> gids <- keys(edb, keytype="GENENAME", filter=SeqnameFilter("X"))
> length(gids)
[1] 2342
> head(gids)
[1] "TSPAN6" "TNMD" "LAS1L" "CD99" "KLHL13" "ARX"
>
> ## Get a mapping of the genes BCL2 and BCL2L11 to all of their
> ## transcript ids and return the result as list
> maps <- mapIds(edb, keys=c("BCL2", "BCL2L11"), column="TXID",
+ keytype="GENENAME", multiVals="list")
> maps
$BCL2
[1] "ENST00000398117" "ENST00000333681" "ENST00000590515" "ENST00000589955"
[5] "ENST00000444484"
$BCL2L11
[1] "ENST00000432179" "ENST00000308659" "ENST00000393256" "ENST00000393252"
[5] "ENST00000433098" "ENST00000405953" "ENST00000415458" "ENST00000436733"
[9] "ENST00000437029" "ENST00000452231" "ENST00000361493" "ENST00000431217"
[13] "ENST00000439718" "ENST00000438054" "ENST00000357757" "ENST00000393253"
[17] "ENST00000337565"
>
> ## Perform the same query using a combination of a GenenameFilter and a TxbiotypeFilter
> ## to just retrieve protein coding transcripts for these two genes.
> mapIds(edb, keys=list(GenenameFilter(c("BCL2", "BCL2L11")),
+ TxbiotypeFilter("protein_coding")), column="TXID",
+ multiVals="list")
$BCL2L11
[1] "ENST00000432179" "ENST00000308659" "ENST00000393256" "ENST00000393252"
[5] "ENST00000405953" "ENST00000438054" "ENST00000357757" "ENST00000393253"
[9] "ENST00000337565"
$BCL2
[1] "ENST00000398117" "ENST00000333681" "ENST00000589955" "ENST00000444484"
Warning message:
In .mapIds(x = x, keys = keys, column = column, keytype = keytype, :
Got 2 filter objects. Will use the keys of the first for the mapping!
>
> ## select:
> ## Retrieve all transcript and gene related information for the above example.
> select(edb, keys=list(GenenameFilter(c("BCL2", "BCL2L11")),
+ TxbiotypeFilter("protein_coding")),
+ columns=c("GENEID", "GENENAME", "TXID", "TXBIOTYPE", "TXSEQSTART", "TXSEQEND",
+ "SEQNAME", "SEQSTRAND"))
GENEID GENENAME TXID TXBIOTYPE TXSEQSTART TXSEQEND
1 ENSG00000153094 BCL2L11 ENST00000432179 protein_coding 111876955 111881689
2 ENSG00000153094 BCL2L11 ENST00000308659 protein_coding 111878491 111922625
3 ENSG00000153094 BCL2L11 ENST00000393256 protein_coding 111878506 111926024
4 ENSG00000153094 BCL2L11 ENST00000393252 protein_coding 111880247 111881537
5 ENSG00000153094 BCL2L11 ENST00000405953 protein_coding 111881323 111886414
6 ENSG00000153094 BCL2L11 ENST00000438054 protein_coding 111881329 111903861
7 ENSG00000153094 BCL2L11 ENST00000357757 protein_coding 111878491 111919016
8 ENSG00000153094 BCL2L11 ENST00000393253 protein_coding 111878491 111909428
9 ENSG00000153094 BCL2L11 ENST00000337565 protein_coding 111878491 111886423
10 ENSG00000171791 BCL2 ENST00000398117 protein_coding 60790579 60987361
11 ENSG00000171791 BCL2 ENST00000333681 protein_coding 60794268 60987019
12 ENSG00000171791 BCL2 ENST00000589955 protein_coding 60985135 60986045
13 ENSG00000171791 BCL2 ENST00000444484 protein_coding 60985187 60986613
SEQNAME SEQSTRAND
1 2 1
2 2 1
3 2 1
4 2 1
5 2 1
6 2 1
7 2 1
8 2 1
9 2 1
10 18 -1
11 18 -1
12 18 -1
13 18 -1
>
> ## Get all data for genes encoded on chromosome Y
> Y <- select(edb, keys="Y", keytype="SEQNAME")
> head(Y)
ENTREZID EXONID EXONIDX EXONSEQEND EXONSEQSTART GENEBIOTYPE
1 8284 ENSE00001733393 1 21906809 21906557 protein_coding
2 8284 ENSE00000891759 2 21906439 21906271 protein_coding
3 8284 ENSE00000652508 3 21905125 21905048 protein_coding
4 8284 ENSE00000652506 4 21903743 21903621 protein_coding
5 8284 ENSE00001788914 5 21903374 21903204 protein_coding
6 8284 ENSE00001805865 6 21901548 21901414 protein_coding
GENEID GENENAME GENESEQEND GENESEQSTART ISCIRCULAR SEQCOORDSYSTEM
1 ENSG00000012817 KDM5D 21906825 21865751 0 chromosome
2 ENSG00000012817 KDM5D 21906825 21865751 0 chromosome
3 ENSG00000012817 KDM5D 21906825 21865751 0 chromosome
4 ENSG00000012817 KDM5D 21906825 21865751 0 chromosome
5 ENSG00000012817 KDM5D 21906825 21865751 0 chromosome
6 ENSG00000012817 KDM5D 21906825 21865751 0 chromosome
SEQLENGTH SEQNAME SEQSTRAND TXBIOTYPE TXCDSSEQEND TXCDSSEQSTART
1 59373566 Y -1 protein_coding 21906420 21867881
2 59373566 Y -1 protein_coding 21906420 21867881
3 59373566 Y -1 protein_coding 21906420 21867881
4 59373566 Y -1 protein_coding 21906420 21867881
5 59373566 Y -1 protein_coding 21906420 21867881
6 59373566 Y -1 protein_coding 21906420 21867881
TXID TXSEQEND TXSEQSTART
1 ENST00000317961 21906809 21867301
2 ENST00000317961 21906809 21867301
3 ENST00000317961 21906809 21867301
4 ENST00000317961 21906809 21867301
5 ENST00000317961 21906809 21867301
6 ENST00000317961 21906809 21867301
> nrow(Y)
[1] 3744
>
> ## Get selected columns for all lincRNAs encoded on chromosome Y
> Y <- select(edb, keys=list(SeqnameFilter("Y"), GenebiotypeFilter("lincRNA")),
+ columns=c("GENEID", "GENEBIOTYPE", "TXID", "GENENAME"))
> head(Y)
GENEID GENEBIOTYPE TXID GENENAME
1 ENSG00000129816 lincRNA ENST00000250776 TTTY1B
2 ENSG00000129845 lincRNA ENST00000250805 TTTY1
3 ENSG00000131538 lincRNA ENST00000253838 TTTY6
4 ENSG00000131538 lincRNA ENST00000538537 TTTY6
5 ENSG00000147753 lincRNA ENST00000276770 TTTY7
6 ENSG00000147753 lincRNA ENST00000449828 TTTY7
> nrow(Y)
[1] 66