Last data update: 2014.03.03

R: Basic usage of an Ensembl based annotation database
EnsDb-classR Documentation

Basic usage of an Ensembl based annotation database

Description

Get some basic information from an Ensembl based annotation package generated with makeEnsembldbPackage.

Usage


## S4 method for signature 'EnsDb'
buildQuery(x, columns=c("gene_id", "gene_biotype",
                                    "gene_name"), filter=list(), order.by,
                             order.type="asc", skip.order.check=FALSE)

## S4 method for signature 'EnsDb'
dbconn(x)

EnsDb(x)

## S4 method for signature 'EnsDb'
ensemblVersion(x)

## S4 method for signature 'EnsDb'
listColumns(x, table, skip.keys=TRUE, ...)

## S4 method for signature 'EnsDb'
listGenebiotypes(x, ...)

## S4 method for signature 'EnsDb'
listTxbiotypes(x, ...)

## S4 method for signature 'EnsDb'
listTables(x, ...)

## S4 method for signature 'EnsDb'
metadata(x, ...)

## S4 method for signature 'EnsDb'
organism(object)

## S4 method for signature 'EnsDb'
seqinfo(x)

## S4 method for signature 'EnsDb'
seqlevels(x)

## S4 method for signature 'EnsDb'
updateEnsDb(x, ...)

Arguments

(in alphabetic order)

...

Additional arguments. Not used.

columns

Columns (attributes) to be retrieved from the database tables. Use the listColumns or listTables method for a list of supported columns.

filter

list of BasicFilter instance(s) to select specific entries from the database (see examples below).

object

For organism: an EnsDb instance.

order.by

name of one of the columns above on which the results should be sorted.

order.type

if the results should be ordered ascending (asc, default) or descending (desc).

skip.keys

for listColumns: whether primary and foreign keys (not being e.g. "gene_id" or alike) should be returned or not. By default these will not be returned.

skip.order.check

if paramter order.by should be checked for allowed column names. If TRUE the function checks if the provided order criteria orders on columns present in the database tables.

table

For listColumns: optionally specify the table name for which the columns should be returned.

x

For EnsDb: the file name of the SQLite database.

Value

For buildQuery

A character string with the SQL query.

For connection

The SQL connection to the RSQLite database.

For EnsDb

An EnsDb instance.

For lengthOf

A named integer vector with the length of the genes or transcripts.

For listColumns

A character vector with the column names.

For listGenebiotypes

A character vector with the biotypes of the genes in the database.

For listTxbiotypes

A character vector with the biotypes of the transcripts in the database.

For listTables

A list with the names corresponding to the database table names and the elements being the attribute (column) names of the table.

For metadata

A data.frame.

For organism

A character string.

For seqinfo

A Seqinfo class.

For updateEnsDb

A EnsDb object.

Objects from the Class

A connection to the respective annotation database is created upon loading of an annotation package created with the makeEnsembldbPackage function. In addition, the EnsDb constructor specifying the SQLite database file can be called to generate an instance of the object (see makeEnsemblSQLiteFromTables for an example).

Slots

ensdb

Object of class "DBIConnection": the connection to the database.

tables

Named list of database table columns with the names being the database table names. The tables are ordered by their degree, i.e. the number of other tables they can be joined with.

.properties

Internal list storing user-defined properties. Should not be directly accessed.

Methods and Functions

buildQuery

Helper function building the SQL query to be used to retrieve the wanted information. Usually there is no need to call this method.

dbconn

Returns the connection to the internal SQL database.

ensemblVersion

Returns the Ensembl version on which the package was built.

listColumns

Lists all columns of all tables in the database, or, if table is specified, of the respective table.

listGenebiotypes

Lists all gene biotypes defined in the database.

listTxbiotypes

Lists all transcript biotypes defined in the database.

listTables

Returns a named list of database table columns (names of the list being the database table names).

metadata

Returns a data.frame with the metadata information from the database, i.e. informations about the Ensembl version or Genome build the database was build upon.

organism

Returns the organism name (e.g. "homo_sapiens").

seqinfo

Returns the sequence/chromosome information from the database.

seqlevels

Returns the chromosome/sequence names that are available in the database.

show

Displays some informations from the database.

updateEnsDb

Updates the EnsDb object to the most recent implementation.

Author(s)

Johannes Rainer

See Also

makeEnsembldbPackage, BasicFilter, exonsBy, genes, transcripts, makeEnsemblSQLiteFromTables

Examples


library(EnsDb.Hsapiens.v75)

## Display some information:
EnsDb.Hsapiens.v75

## Show the tables along with its columns
listTables(EnsDb.Hsapiens.v75)

## For what species is this database?
organism(EnsDb.Hsapiens.v75)

## What Ensembl version if the database based on?
ensemblVersion(EnsDb.Hsapiens.v75)

## Get some more information from the database
metadata(EnsDb.Hsapiens.v75)

## Get all the sequence names.
seqlevels(EnsDb.Hsapiens.v75)

######    buildQuery
##
## Join tables gene and transcript and return gene_id and tx_id
buildQuery(EnsDb.Hsapiens.v75, columns=c("gene_id", "tx_id"))


## Get all exon_ids and transcript ids of genes encoded on chromosome Y.
buildQuery(EnsDb.Hsapiens.v75, columns=c("exon_id", "tx_id"),
           filter=list(SeqnameFilter( "Y")))

## List all available gene biotypes from the database:
listGenebiotypes(EnsDb.Hsapiens.v75)

## List all available transcript biotypes:
listTxbiotypes(EnsDb.Hsapiens.v75)

## Update the EnsDb; this is in most instances not necessary at all.
updateEnsDb(EnsDb.Hsapiens.v75)

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(ensembldb)
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: GenomicRanges
Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: GenomicFeatures
Loading required package: AnnotationDbi
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/ensembldb/EnsDb-class.Rd_%03d_medium.png", width=480, height=480)
> ### Name: EnsDb-class
> ### Title: Basic usage of an Ensembl based annotation database
> ### Aliases: EnsDb-class EnsDb buildQuery buildQuery,EnsDb-method dbconn
> ###   dbconn,EnsDb-method ensemblVersion ensemblVersion,EnsDb-method
> ###   listColumns listColumns,EnsDb-method metadata metadata,EnsDb-method
> ###   seqinfo seqinfo,EnsDb-method seqlevels seqlevels,EnsDb-method
> ###   organism organism,EnsDb-method show show,EnsDb-method
> ###   listGenebiotypes listGenebiotypes,EnsDb-method listTxbiotypes
> ###   listTxbiotypes,EnsDb-method listTables listTables,EnsDb-method
> ###   updateEnsDb updateEnsDb,EnsDb-method
> ### Keywords: classes
> 
> ### ** Examples
> 
> 
> library(EnsDb.Hsapiens.v75)
> 
> ## Display some information:
> EnsDb.Hsapiens.v75
EnsDb for Ensembl:
|Db type: EnsDb
|Type of Gene ID: Ensembl Gene ID
|Supporting package: ensembldb
|Db created by: ensembldb package from Bioconductor
|script_version: 0.1.2
|Creation time: Wed Mar 18 09:30:54 2015
|ensembl_version: 75
|ensembl_host: manny.i-med.ac.at
|Organism: homo_sapiens
|genome_build: GRCh37
|DBSCHEMAVERSION: 1.0
| No. of genes: 64102.
| No. of transcripts: 215647.
> 
> ## Show the tables along with its columns
> listTables(EnsDb.Hsapiens.v75)
$gene
[1] "gene_id"          "gene_name"        "entrezid"         "gene_biotype"    
[5] "gene_seq_start"   "gene_seq_end"     "seq_name"         "seq_strand"      
[9] "seq_coord_system"

$tx
[1] "tx_id"            "tx_biotype"       "tx_seq_start"     "tx_seq_end"      
[5] "tx_cds_seq_start" "tx_cds_seq_end"   "gene_id"         

$tx2exon
[1] "tx_id"    "exon_id"  "exon_idx"

$exon
[1] "exon_id"        "exon_seq_start" "exon_seq_end"  

$chromosome
[1] "seq_name"    "seq_length"  "is_circular"

$metadata
[1] "name"  "value"

> 
> ## For what species is this database?
> organism(EnsDb.Hsapiens.v75)
[1] "Homo sapiens"
> 
> ## What Ensembl version if the database based on?
> ensemblVersion(EnsDb.Hsapiens.v75)
[1] "75"
> 
> ## Get some more information from the database
> metadata(EnsDb.Hsapiens.v75)
                 name                               value
1             Db type                               EnsDb
2     Type of Gene ID                     Ensembl Gene ID
3  Supporting package                           ensembldb
4       Db created by ensembldb package from Bioconductor
5      script_version                               0.1.2
6       Creation time            Wed Mar 18 09:30:54 2015
7     ensembl_version                                  75
8        ensembl_host                   manny.i-med.ac.at
9            Organism                        homo_sapiens
10       genome_build                              GRCh37
11    DBSCHEMAVERSION                                 1.0
> 
> ## Get all the sequence names.
> seqlevels(EnsDb.Hsapiens.v75)
  [1] "1"                        "10"                      
  [3] "11"                       "12"                      
  [5] "13"                       "14"                      
  [7] "15"                       "16"                      
  [9] "17"                       "18"                      
 [11] "19"                       "2"                       
 [13] "20"                       "21"                      
 [15] "22"                       "3"                       
 [17] "4"                        "5"                       
 [19] "6"                        "7"                       
 [21] "8"                        "9"                       
 [23] "GL000191.1"               "GL000192.1"              
 [25] "GL000193.1"               "GL000194.1"              
 [27] "GL000195.1"               "GL000196.1"              
 [29] "GL000199.1"               "GL000201.1"              
 [31] "GL000204.1"               "GL000205.1"              
 [33] "GL000209.1"               "GL000211.1"              
 [35] "GL000212.1"               "GL000213.1"              
 [37] "GL000215.1"               "GL000216.1"              
 [39] "GL000218.1"               "GL000219.1"              
 [41] "GL000220.1"               "GL000221.1"              
 [43] "GL000222.1"               "GL000223.1"              
 [45] "GL000224.1"               "GL000225.1"              
 [47] "GL000228.1"               "GL000229.1"              
 [49] "GL000230.1"               "GL000231.1"              
 [51] "GL000233.1"               "GL000236.1"              
 [53] "GL000237.1"               "GL000240.1"              
 [55] "GL000241.1"               "GL000242.1"              
 [57] "GL000243.1"               "GL000247.1"              
 [59] "HG1007_PATCH"             "HG1032_PATCH"            
 [61] "HG104_HG975_PATCH"        "HG1063_PATCH"            
 [63] "HG1074_PATCH"             "HG1079_PATCH"            
 [65] "HG1082_HG167_PATCH"       "HG1091_PATCH"            
 [67] "HG1133_PATCH"             "HG1146_PATCH"            
 [69] "HG115_PATCH"              "HG1208_PATCH"            
 [71] "HG1211_PATCH"             "HG122_PATCH"             
 [73] "HG1257_PATCH"             "HG1287_PATCH"            
 [75] "HG1292_PATCH"             "HG1293_PATCH"            
 [77] "HG1304_PATCH"             "HG1308_PATCH"            
 [79] "HG1322_PATCH"             "HG1350_HG959_PATCH"      
 [81] "HG1423_PATCH"             "HG1424_PATCH"            
 [83] "HG1425_PATCH"             "HG1426_PATCH"            
 [85] "HG142_HG150_NOVEL_TEST"   "HG1433_PATCH"            
 [87] "HG1434_PATCH"             "HG1435_PATCH"            
 [89] "HG1436_HG1432_PATCH"      "HG1437_PATCH"            
 [91] "HG1438_PATCH"             "HG1439_PATCH"            
 [93] "HG1440_PATCH"             "HG1441_PATCH"            
 [95] "HG1442_PATCH"             "HG1443_HG1444_PATCH"     
 [97] "HG144_PATCH"              "HG1453_PATCH"            
 [99] "HG1458_PATCH"             "HG1459_PATCH"            
[101] "HG1462_PATCH"             "HG1463_PATCH"            
[103] "HG1472_PATCH"             "HG1479_PATCH"            
[105] "HG1486_PATCH"             "HG1487_PATCH"            
[107] "HG1488_PATCH"             "HG1490_PATCH"            
[109] "HG1497_PATCH"             "HG14_PATCH"              
[111] "HG1500_PATCH"             "HG1501_PATCH"            
[113] "HG1502_PATCH"             "HG151_NOVEL_TEST"        
[115] "HG1591_PATCH"             "HG1592_PATCH"            
[117] "HG1595_PATCH"             "HG1699_PATCH"            
[119] "HG174_HG254_PATCH"        "HG183_PATCH"             
[121] "HG185_PATCH"              "HG186_PATCH"             
[123] "HG193_PATCH"              "HG19_PATCH"              
[125] "HG237_PATCH"              "HG243_PATCH"             
[127] "HG256_PATCH"              "HG271_PATCH"             
[129] "HG27_PATCH"               "HG280_PATCH"             
[131] "HG281_PATCH"              "HG299_PATCH"             
[133] "HG29_PATCH"               "HG305_PATCH"             
[135] "HG306_PATCH"              "HG311_PATCH"             
[137] "HG325_PATCH"              "HG329_PATCH"             
[139] "HG339_PATCH"              "HG344_PATCH"             
[141] "HG348_PATCH"              "HG357_PATCH"             
[143] "HG375_PATCH"              "HG385_PATCH"             
[145] "HG388_HG400_PATCH"        "HG414_PATCH"             
[147] "HG417_PATCH"              "HG418_PATCH"             
[149] "HG444_PATCH"              "HG480_HG481_PATCH"       
[151] "HG497_PATCH"              "HG506_HG507_HG1000_PATCH"
[153] "HG50_PATCH"               "HG531_PATCH"             
[155] "HG536_PATCH"              "HG544_PATCH"             
[157] "HG686_PATCH"              "HG706_PATCH"             
[159] "HG729_PATCH"              "HG730_PATCH"             
[161] "HG736_PATCH"              "HG745_PATCH"             
[163] "HG747_PATCH"              "HG748_PATCH"             
[165] "HG75_PATCH"               "HG79_PATCH"              
[167] "HG7_PATCH"                "HG858_PATCH"             
[169] "HG865_PATCH"              "HG871_PATCH"             
[171] "HG873_PATCH"              "HG883_PATCH"             
[173] "HG905_PATCH"              "HG944_PATCH"             
[175] "HG946_PATCH"              "HG953_PATCH"             
[177] "HG957_PATCH"              "HG962_PATCH"             
[179] "HG971_PATCH"              "HG979_PATCH"             
[181] "HG987_PATCH"              "HG989_PATCH"             
[183] "HG990_PATCH"              "HG991_PATCH"             
[185] "HG996_PATCH"              "HG998_1_PATCH"           
[187] "HG998_2_PATCH"            "HG999_1_PATCH"           
[189] "HG999_2_PATCH"            "HSCHR10_1_CTG2"          
[191] "HSCHR10_1_CTG5"           "HSCHR12_1_CTG1"          
[193] "HSCHR12_1_CTG2_1"         "HSCHR12_1_CTG5"          
[195] "HSCHR12_2_CTG2"           "HSCHR12_2_CTG2_1"        
[197] "HSCHR12_3_CTG2_1"         "HSCHR15_1_CTG4"          
[199] "HSCHR15_1_CTG8"           "HSCHR16_1_CTG3_1"        
[201] "HSCHR16_2_CTG3_1"         "HSCHR17_1"               
[203] "HSCHR17_1_CTG1"           "HSCHR17_1_CTG4"          
[205] "HSCHR17_2_CTG4"           "HSCHR17_3_CTG4"          
[207] "HSCHR17_4_CTG4"           "HSCHR17_5_CTG4"          
[209] "HSCHR17_6_CTG4"           "HSCHR18_1_CTG1_1"        
[211] "HSCHR18_1_CTG2_1"         "HSCHR18_2_CTG2"          
[213] "HSCHR18_2_CTG2_1"         "HSCHR19LRC_COX1_CTG1"    
[215] "HSCHR19LRC_COX2_CTG1"     "HSCHR19LRC_LRC_I_CTG1"   
[217] "HSCHR19LRC_LRC_J_CTG1"    "HSCHR19LRC_LRC_S_CTG1"   
[219] "HSCHR19LRC_LRC_T_CTG1"    "HSCHR19LRC_PGF1_CTG1"    
[221] "HSCHR19LRC_PGF2_CTG1"     "HSCHR19_1_CTG3"          
[223] "HSCHR19_1_CTG3_1"         "HSCHR19_2_CTG3"          
[225] "HSCHR19_3_CTG3"           "HSCHR1_1_CTG31"          
[227] "HSCHR1_2_CTG31"           "HSCHR1_3_CTG31"          
[229] "HSCHR20_1_CTG1"           "HSCHR21_2_CTG1_1"        
[231] "HSCHR21_3_CTG1_1"         "HSCHR21_4_CTG1_1"        
[233] "HSCHR22_1_CTG1"           "HSCHR22_1_CTG2"          
[235] "HSCHR22_2_CTG1"           "HSCHR2_1_CTG1"           
[237] "HSCHR2_1_CTG12"           "HSCHR2_2_CTG12"          
[239] "HSCHR3_1_CTG1"            "HSCHR3_1_CTG2_1"         
[241] "HSCHR4_1"                 "HSCHR4_1_CTG12"          
[243] "HSCHR4_1_CTG6"            "HSCHR4_2_CTG9"           
[245] "HSCHR5_1_CTG1"            "HSCHR5_1_CTG2"           
[247] "HSCHR5_1_CTG5"            "HSCHR5_2_CTG1"           
[249] "HSCHR5_3_CTG1"            "HSCHR6_1_CTG5"           
[251] "HSCHR6_MHC_APD"           "HSCHR6_MHC_COX"          
[253] "HSCHR6_MHC_DBB"           "HSCHR6_MHC_MANN"         
[255] "HSCHR6_MHC_MCF"           "HSCHR6_MHC_QBL"          
[257] "HSCHR6_MHC_SSTO"          "HSCHR7_1_CTG6"           
[259] "HSCHR9_1_CTG1"            "HSCHR9_1_CTG35"          
[261] "HSCHR9_2_CTG35"           "HSCHR9_3_CTG35"          
[263] "LRG_123"                  "LRG_162"                 
[265] "LRG_183"                  "LRG_187"                 
[267] "LRG_239"                  "LRG_311"                 
[269] "LRG_415"                  "LRG_93"                  
[271] "MT"                       "X"                       
[273] "Y"                       
> 
> ######    buildQuery
> ##
> ## Join tables gene and transcript and return gene_id and tx_id
> buildQuery(EnsDb.Hsapiens.v75, columns=c("gene_id", "tx_id"))
[1] "select distinct tx.tx_id,tx.gene_id from tx"
> 
> 
> ## Get all exon_ids and transcript ids of genes encoded on chromosome Y.
> buildQuery(EnsDb.Hsapiens.v75, columns=c("exon_id", "tx_id"),
+            filter=list(SeqnameFilter( "Y")))
[1] "select distinct gene.seq_name,tx2exon.tx_id,tx2exon.exon_id from gene join tx on (gene.gene_id=tx.gene_id) join tx2exon on (tx.tx_id=tx2exon.tx_id) where gene.seq_name = 'Y'"
> 
> ## List all available gene biotypes from the database:
> listGenebiotypes(EnsDb.Hsapiens.v75)
 [1] "protein_coding"           "pseudogene"              
 [3] "processed_transcript"     "antisense"               
 [5] "lincRNA"                  "polymorphic_pseudogene"  
 [7] "IG_V_pseudogene"          "IG_V_gene"               
 [9] "sense_overlapping"        "sense_intronic"          
[11] "TR_V_gene"                "misc_RNA"                
[13] "snRNA"                    "miRNA"                   
[15] "snoRNA"                   "rRNA"                    
[17] "Mt_tRNA"                  "Mt_rRNA"                 
[19] "IG_C_gene"                "IG_J_gene"               
[21] "TR_J_gene"                "TR_C_gene"               
[23] "TR_V_pseudogene"          "TR_J_pseudogene"         
[25] "IG_D_gene"                "IG_C_pseudogene"         
[27] "TR_D_gene"                "IG_J_pseudogene"         
[29] "3prime_overlapping_ncrna" "processed_pseudogene"    
[31] "LRG_gene"                
> 
> ## List all available transcript biotypes:
> listTxbiotypes(EnsDb.Hsapiens.v75)
 [1] "protein_coding"                     "processed_transcript"              
 [3] "retained_intron"                    "nonsense_mediated_decay"           
 [5] "unitary_pseudogene"                 "non_stop_decay"                    
 [7] "unprocessed_pseudogene"             "processed_pseudogene"              
 [9] "transcribed_unprocessed_pseudogene" "antisense"                         
[11] "lincRNA"                            "polymorphic_pseudogene"            
[13] "transcribed_processed_pseudogene"   "miRNA"                             
[15] "pseudogene"                         "IG_V_pseudogene"                   
[17] "snoRNA"                             "IG_V_gene"                         
[19] "sense_overlapping"                  "sense_intronic"                    
[21] "TR_V_gene"                          "snRNA"                             
[23] "misc_RNA"                           "rRNA"                              
[25] "Mt_tRNA"                            "Mt_rRNA"                           
[27] "IG_C_gene"                          "IG_J_gene"                         
[29] "TR_J_gene"                          "TR_C_gene"                         
[31] "TR_V_pseudogene"                    "TR_J_pseudogene"                   
[33] "IG_D_gene"                          "IG_C_pseudogene"                   
[35] "TR_D_gene"                          "IG_J_pseudogene"                   
[37] "3prime_overlapping_ncrna"           "translated_processed_pseudogene"   
[39] "LRG_gene"                          
> 
> ## Update the EnsDb; this is in most instances not necessary at all.
> updateEnsDb(EnsDb.Hsapiens.v75)
EnsDb for Ensembl:
|Db type: EnsDb
|Type of Gene ID: Ensembl Gene ID
|Supporting package: ensembldb
|Db created by: ensembldb package from Bioconductor
|script_version: 0.1.2
|Creation time: Wed Mar 18 09:30:54 2015
|ensembl_version: 75
|ensembl_host: manny.i-med.ac.at
|Organism: homo_sapiens
|genome_build: GRCh37
|DBSCHEMAVERSION: 1.0
| No. of genes: 64102.
| No. of transcripts: 215647.
> 
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>