Last data update: 2014.03.03

R: Integration into the AnnotationDbi framework
select    R Documentation

Integration into the AnnotationDbi framework

Description

Several of the methods that the AnnotationDbi package provides for its annotation objects are also implemented for EnsDb objects. This makes it possible to extract data from EnsDb objects in much the same way as from objects inheriting from AnnotationDb, the base annotation class of the AnnotationDbi package. In addition to the standard usage, the select and mapIds methods for EnsDb objects also support the filter framework of the ensembldb package and thus allow more fine-grained queries.

Usage


## S4 method for signature 'EnsDb'
columns(x)
## S4 method for signature 'EnsDb'
keys(x, keytype, filter, ...)
## S4 method for signature 'EnsDb'
keytypes(x)
## S4 method for signature 'EnsDb'
mapIds(x, keys, column, keytype, ..., multiVals)
## S4 method for signature 'EnsDb'
select(x, keys, columns, keytype, ...)

Arguments

(In alphabetical order)

column

For mapIds: the column to search on, i.e. from which values should be retrieved.

columns

For select: the columns from which values should be retrieved. Use the columns method to list all possible columns.

keys

The keys/IDs for which data should be retrieved from the database. This can be either a character vector of keys/IDs, a single filter object extending BasicFilter or a list of such objects.

keytype

For mapIds and select: the type (column) that matches the provided keys. This argument does not have to be specified if argument keys is a filter object extending BasicFilter or a list of such objects.

For keys: which keys should be returned from the database.

filter

For keys: either a single object extending BasicFilter or a list of such objects, used to retrieve only specific keys from the database.

multiVals

What should mapIds do when there are multiple values that could be returned? Options are: "first", "list", "filter", "asNA". See the mapIds documentation in the AnnotationDbi package for a detailed description.

x

The EnsDb object.

...

Not used.
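
To make the multiVals options listed above more concrete, the following base R sketch mimics their effect on a toy one-to-many mapping. It does not call mapIds and is not the AnnotationDbi implementation; the data frame hits and all IDs in it are hypothetical.

```r
## Toy one-to-many mapping (hypothetical IDs, not taken from any database).
hits <- data.frame(
    key   = c("geneA", "geneA", "geneB", "geneC", "geneC", "geneC"),
    value = c("tx1", "tx2", "tx3", "tx4", "tx5", "tx6"),
    stringsAsFactors = FALSE
)
keys <- c("geneA", "geneB", "geneC")

## Collect the values per key, keeping the order of 'keys'.
by_key <- split(hits$value, factor(hits$key, levels = keys))

## "list": keep every value, one list element per key.
as_list <- by_key

## "first": keep only the first value per key.
as_first <- vapply(by_key, function(v) v[1], character(1))

## "asNA": keys with more than one value are set to NA.
as_na <- vapply(by_key,
                function(v) if (length(v) == 1) v else NA_character_,
                character(1))

## "filter": keys with more than one value are dropped from the result.
as_filter <- as_first[lengths(by_key) == 1]
```

Only "list" preserves all matches; "first" and "asNA" return one element per key, while "filter" can return fewer elements than there are keys.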

Value

See the method descriptions below.

Methods and Functions

columns

List all the columns that can be retrieved by the mapIds and select methods. Note that these column names differ from the ones supported by the genes, transcripts, etc. methods, which can be listed with the listColumns method.

Returns a character vector of supported column names.

keys

Retrieves all keys from the column name specified with keytype. By default (if keytype is not provided) it returns all gene IDs.

Returns a character vector of IDs.

keytypes

List all supported key types (column names).

Returns a character vector of key types.

mapIds

Retrieves the mapped IDs for a set of keys of a particular keytype. Argument keys can be either a character vector of keys/IDs, a single filter object extending BasicFilter or a list of such objects. When filter objects are passed, the argument keytype does not have to be specified; if a list of several filters is passed, the keys of the first filter are used for the mapping and a warning is issued. Note, however, that when the filtering system is used, the order of the results may not match the order of the keys.

The method usually returns a named character vector or, depending on the argument multiVals, a named list, with names corresponding to the keys (the same ordering is only guaranteed if keys is a character vector).
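
Because the result order is not guaranteed when filters are used, it can help to index the result by name rather than by position. The sketch below assumes a named list shaped like a mapIds result with multiVals="list"; res is a hand-written stand-in holding a truncated excerpt of IDs, and wanted is a hypothetical vector of keys in the desired order.

```r
## Stand-in for a mapIds() result obtained with filter objects as 'keys':
## a named list whose element order differs from the order of interest.
res <- list(
    BCL2L11 = c("ENST00000432179", "ENST00000308659"),
    BCL2    = c("ENST00000398117", "ENST00000333681")
)

## The order in which the results are wanted; "TP53" is deliberately
## absent from 'res' to show how missing keys behave.
wanted <- c("BCL2", "BCL2L11", "TP53")

## Index by name, not by position. Keys missing from the result yield
## NULL elements, so the names are restored explicitly with setNames().
ordered <- setNames(res[wanted], wanted)
```

The same pattern works for a named character vector, in which case missing keys show up as NA instead of NULL.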

select

Retrieves the data as a data.frame based on the keys, columns and keytype arguments. If a key has multiple matches, one row is returned for each match. Argument keys can be either a character vector of keys/IDs, a single filter object extending BasicFilter or a list of such objects. When filter objects are passed, the argument keytype does not have to be specified.

Returns a data.frame with the column names corresponding to the argument columns and rows with all data matching the criteria specified with keys.
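
The row semantics of select (one row per match, and a list of filters narrowing the result) can be sketched with plain data.frame subsetting. This is only an analogy in base R, not how ensembldb queries its database; the table tbl and its IDs are hypothetical, and the sketch assumes that the filters in a list are combined so that all of them must match.

```r
## Toy annotation table standing in for the database (hypothetical IDs).
tbl <- data.frame(
    GENENAME  = c("geneA", "geneA", "geneB", "geneC"),
    TXID      = c("tx1", "tx2", "tx3", "tx4"),
    TXBIOTYPE = c("protein_coding", "lincRNA", "protein_coding",
                  "protein_coding"),
    stringsAsFactors = FALSE
)

## Analogous to character keys plus keytype="GENENAME": every matching
## row is returned, so geneA contributes two rows.
by_keys <- tbl[tbl$GENENAME %in% c("geneA", "geneB"),
               c("GENENAME", "TXID")]

## Analogous to a list of two filters (gene name and transcript biotype):
## only rows satisfying both conditions remain.
by_filters <- tbl[tbl$GENENAME %in% c("geneA", "geneB") &
                  tbl$TXBIOTYPE == "protein_coding",
                  c("GENENAME", "TXID")]
```

Here by_keys has three rows for two keys, while by_filters keeps only the protein coding transcripts of those genes.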

Author(s)

Johannes Rainer

See Also

BasicFilter, listColumns, transcripts

Examples


library(EnsDb.Hsapiens.v75)
edb <- EnsDb.Hsapiens.v75

## List all supported keytypes.
keytypes(edb)

## List all supported columns for the select and mapIds methods.
columns(edb)

## List /real/ database column names.
listColumns(edb)

## Retrieve all keys corresponding to transcript ids.
txids <- keys(edb, keytype="TXID")
length(txids)
head(txids)

## Retrieve all keys corresponding to gene names of genes encoded on chromosome X
gids <- keys(edb, keytype="GENENAME", filter=SeqnameFilter("X"))
length(gids)
head(gids)

## Get a mapping of the genes BCL2 and BCL2L11 to all of their
## transcript ids and return the result as list
maps <- mapIds(edb, keys=c("BCL2", "BCL2L11"), column="TXID",
               keytype="GENENAME", multiVals="list")
maps

## Perform the same query using a combination of a GenenameFilter and a TxbiotypeFilter
## to just retrieve protein coding transcripts for these two genes.
mapIds(edb, keys=list(GenenameFilter(c("BCL2", "BCL2L11")),
                      TxbiotypeFilter("protein_coding")), column="TXID",
       multiVals="list")

## select:
## Retrieve all transcript and gene related information for the above example.
select(edb, keys=list(GenenameFilter(c("BCL2", "BCL2L11")),
                      TxbiotypeFilter("protein_coding")),
       columns=c("GENEID", "GENENAME", "TXID", "TXBIOTYPE", "TXSEQSTART", "TXSEQEND",
                 "SEQNAME", "SEQSTRAND"))

## Get all data for genes encoded on chromosome Y
Y <- select(edb, keys="Y", keytype="SEQNAME")
head(Y)
nrow(Y)

## Get selected columns for all lincRNAs encoded on chromosome Y
Y <- select(edb, keys=list(SeqnameFilter("Y"), GenebiotypeFilter("lincRNA")),
            columns=c("GENEID", "GENEBIOTYPE", "TXID", "GENENAME"))
head(Y)
nrow(Y)

Results


> library(ensembldb)
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: GenomicRanges
Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: GenomicFeatures
Loading required package: AnnotationDbi
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

> ### Name: select
> ### Title: Integration into the AnnotationDbi framework
> ### Aliases: select select,EnsDb-method columns,EnsDb-method
> ###   keys,EnsDb-method keytypes,EnsDb-method mapIds,EnsDb-method
> ### Keywords: classes
> 
> ### ** Examples
> 
> 
> library(EnsDb.Hsapiens.v75)
> edb <- EnsDb.Hsapiens.v75
> 
> ## List all supported keytypes.
> keytypes(edb)
[1] "ENTREZID"    "EXONID"      "GENEBIOTYPE" "GENEID"      "GENENAME"   
[6] "SEQNAME"     "SEQSTRAND"   "TXBIOTYPE"   "TXID"       
> 
> ## List all supported columns for the select and mapIds methods.
> columns(edb)
 [1] "ENTREZID"       "EXONID"         "EXONIDX"        "EXONSEQEND"    
 [5] "EXONSEQSTART"   "GENEBIOTYPE"    "GENEID"         "GENENAME"      
 [9] "GENESEQEND"     "GENESEQSTART"   "ISCIRCULAR"     "SEQCOORDSYSTEM"
[13] "SEQLENGTH"      "SEQNAME"        "SEQSTRAND"      "TXBIOTYPE"     
[17] "TXCDSSEQEND"    "TXCDSSEQSTART"  "TXID"           "TXSEQEND"      
[21] "TXSEQSTART"    
> 
> ## List /real/ database column names.
> listColumns(edb)
 [1] "seq_name"         "seq_length"       "is_circular"      "exon_id"         
 [5] "exon_seq_start"   "exon_seq_end"     "gene_id"          "gene_name"       
 [9] "entrezid"         "gene_biotype"     "gene_seq_start"   "gene_seq_end"    
[13] "seq_name"         "seq_strand"       "seq_coord_system" "name"            
[17] "value"            "tx_id"            "tx_biotype"       "tx_seq_start"    
[21] "tx_seq_end"       "tx_cds_seq_start" "tx_cds_seq_end"   "gene_id"         
[25] "tx_id"            "exon_id"          "exon_idx"        
> 
> ## Retrieve all keys corresponding to transcript ids.
> txids <- keys(edb, keytype="TXID")
> length(txids)
[1] 215647
> head(txids)
[1] "ENST00000000233" "ENST00000000412" "ENST00000000442" "ENST00000001008"
[5] "ENST00000001146" "ENST00000002125"
> 
> ## Retrieve all keys corresponding to gene names of genes encoded on chromosome X
> gids <- keys(edb, keytype="GENENAME", filter=SeqnameFilter("X"))
> length(gids)
[1] 2342
> head(gids)
[1] "TSPAN6" "TNMD"   "LAS1L"  "CD99"   "KLHL13" "ARX"   
> 
> ## Get a mapping of the genes BCL2 and BCL2L11 to all of their
> ## transcript ids and return the result as list
> maps <- mapIds(edb, keys=c("BCL2", "BCL2L11"), column="TXID",
+                keytype="GENENAME", multiVals="list")
> maps
$BCL2
[1] "ENST00000398117" "ENST00000333681" "ENST00000590515" "ENST00000589955"
[5] "ENST00000444484"

$BCL2L11
 [1] "ENST00000432179" "ENST00000308659" "ENST00000393256" "ENST00000393252"
 [5] "ENST00000433098" "ENST00000405953" "ENST00000415458" "ENST00000436733"
 [9] "ENST00000437029" "ENST00000452231" "ENST00000361493" "ENST00000431217"
[13] "ENST00000439718" "ENST00000438054" "ENST00000357757" "ENST00000393253"
[17] "ENST00000337565"

> 
> ## Perform the same query using a combination of a GenenameFilter and a TxbiotypeFilter
> ## to just retrieve protein coding transcripts for these two genes.
> mapIds(edb, keys=list(GenenameFilter(c("BCL2", "BCL2L11")),
+                       TxbiotypeFilter("protein_coding")), column="TXID",
+        multiVals="list")
$BCL2L11
[1] "ENST00000432179" "ENST00000308659" "ENST00000393256" "ENST00000393252"
[5] "ENST00000405953" "ENST00000438054" "ENST00000357757" "ENST00000393253"
[9] "ENST00000337565"

$BCL2
[1] "ENST00000398117" "ENST00000333681" "ENST00000589955" "ENST00000444484"

Warning message:
In .mapIds(x = x, keys = keys, column = column, keytype = keytype,  :
  Got 2 filter objects. Will use the keys of the first for the mapping!
> 
> ## select:
> ## Retrieve all transcript and gene related information for the above example.
> select(edb, keys=list(GenenameFilter(c("BCL2", "BCL2L11")),
+                       TxbiotypeFilter("protein_coding")),
+        columns=c("GENEID", "GENENAME", "TXID", "TXBIOTYPE", "TXSEQSTART", "TXSEQEND",
+                  "SEQNAME", "SEQSTRAND"))
            GENEID GENENAME            TXID      TXBIOTYPE TXSEQSTART  TXSEQEND
1  ENSG00000153094  BCL2L11 ENST00000432179 protein_coding  111876955 111881689
2  ENSG00000153094  BCL2L11 ENST00000308659 protein_coding  111878491 111922625
3  ENSG00000153094  BCL2L11 ENST00000393256 protein_coding  111878506 111926024
4  ENSG00000153094  BCL2L11 ENST00000393252 protein_coding  111880247 111881537
5  ENSG00000153094  BCL2L11 ENST00000405953 protein_coding  111881323 111886414
6  ENSG00000153094  BCL2L11 ENST00000438054 protein_coding  111881329 111903861
7  ENSG00000153094  BCL2L11 ENST00000357757 protein_coding  111878491 111919016
8  ENSG00000153094  BCL2L11 ENST00000393253 protein_coding  111878491 111909428
9  ENSG00000153094  BCL2L11 ENST00000337565 protein_coding  111878491 111886423
10 ENSG00000171791     BCL2 ENST00000398117 protein_coding   60790579  60987361
11 ENSG00000171791     BCL2 ENST00000333681 protein_coding   60794268  60987019
12 ENSG00000171791     BCL2 ENST00000589955 protein_coding   60985135  60986045
13 ENSG00000171791     BCL2 ENST00000444484 protein_coding   60985187  60986613
   SEQNAME SEQSTRAND
1        2         1
2        2         1
3        2         1
4        2         1
5        2         1
6        2         1
7        2         1
8        2         1
9        2         1
10      18        -1
11      18        -1
12      18        -1
13      18        -1
> 
> ## Get all data for genes encoded on chromosome Y
> Y <- select(edb, keys="Y", keytype="SEQNAME")
> head(Y)
  ENTREZID          EXONID EXONIDX EXONSEQEND EXONSEQSTART    GENEBIOTYPE
1     8284 ENSE00001733393       1   21906809     21906557 protein_coding
2     8284 ENSE00000891759       2   21906439     21906271 protein_coding
3     8284 ENSE00000652508       3   21905125     21905048 protein_coding
4     8284 ENSE00000652506       4   21903743     21903621 protein_coding
5     8284 ENSE00001788914       5   21903374     21903204 protein_coding
6     8284 ENSE00001805865       6   21901548     21901414 protein_coding
           GENEID GENENAME GENESEQEND GENESEQSTART ISCIRCULAR SEQCOORDSYSTEM
1 ENSG00000012817    KDM5D   21906825     21865751          0     chromosome
2 ENSG00000012817    KDM5D   21906825     21865751          0     chromosome
3 ENSG00000012817    KDM5D   21906825     21865751          0     chromosome
4 ENSG00000012817    KDM5D   21906825     21865751          0     chromosome
5 ENSG00000012817    KDM5D   21906825     21865751          0     chromosome
6 ENSG00000012817    KDM5D   21906825     21865751          0     chromosome
  SEQLENGTH SEQNAME SEQSTRAND      TXBIOTYPE TXCDSSEQEND TXCDSSEQSTART
1  59373566       Y        -1 protein_coding    21906420      21867881
2  59373566       Y        -1 protein_coding    21906420      21867881
3  59373566       Y        -1 protein_coding    21906420      21867881
4  59373566       Y        -1 protein_coding    21906420      21867881
5  59373566       Y        -1 protein_coding    21906420      21867881
6  59373566       Y        -1 protein_coding    21906420      21867881
             TXID TXSEQEND TXSEQSTART
1 ENST00000317961 21906809   21867301
2 ENST00000317961 21906809   21867301
3 ENST00000317961 21906809   21867301
4 ENST00000317961 21906809   21867301
5 ENST00000317961 21906809   21867301
6 ENST00000317961 21906809   21867301
> nrow(Y)
[1] 3744
> 
> ## Get selected columns for all lincRNAs encoded on chromosome Y
> Y <- select(edb, keys=list(SeqnameFilter("Y"), GenebiotypeFilter("lincRNA")),
+             columns=c("GENEID", "GENEBIOTYPE", "TXID", "GENENAME"))
> head(Y)
           GENEID GENEBIOTYPE            TXID GENENAME
1 ENSG00000129816     lincRNA ENST00000250776   TTTY1B
2 ENSG00000129845     lincRNA ENST00000250805    TTTY1
3 ENSG00000131538     lincRNA ENST00000253838    TTTY6
4 ENSG00000131538     lincRNA ENST00000538537    TTTY6
5 ENSG00000147753     lincRNA ENST00000276770    TTTY7
6 ENSG00000147753     lincRNA ENST00000449828    TTTY7
> nrow(Y)
[1] 66