R: Basic usage of an Ensembl based annotation database
EnsDb-class
R Documentation
Basic usage of an Ensembl based annotation database
Description
Get some basic information from an Ensembl based annotation package
generated with makeEnsembldbPackage.
Usage
## S4 method for signature 'EnsDb'
buildQuery(x, columns=c("gene_id", "gene_biotype",
"gene_name"), filter=list(), order.by,
order.type="asc", skip.order.check=FALSE)
## S4 method for signature 'EnsDb'
dbconn(x)
EnsDb(x)
## S4 method for signature 'EnsDb'
ensemblVersion(x)
## S4 method for signature 'EnsDb'
listColumns(x, table, skip.keys=TRUE, ...)
## S4 method for signature 'EnsDb'
listGenebiotypes(x, ...)
## S4 method for signature 'EnsDb'
listTxbiotypes(x, ...)
## S4 method for signature 'EnsDb'
listTables(x, ...)
## S4 method for signature 'EnsDb'
metadata(x, ...)
## S4 method for signature 'EnsDb'
organism(object)
## S4 method for signature 'EnsDb'
seqinfo(x)
## S4 method for signature 'EnsDb'
seqlevels(x)
## S4 method for signature 'EnsDb'
updateEnsDb(x, ...)
Arguments
(in alphabetic order)
...
Additional arguments.
Not used.
columns
Columns (attributes) to be retrieved from the database tables. Use the
listColumns or listTables method for a list of
supported columns.
filter
list of BasicFilter instance(s) to
select specific entries from the database (see examples below).
object
For organism: an EnsDb instance.
order.by
name of one of the columns above on which the
results should be sorted.
order.type
if the results should be ordered ascending
(asc, default) or descending (desc).
skip.keys
for listColumns: whether primary and foreign keys (not
being e.g. "gene_id" or alike) should be returned or not. By
default these will not be returned.
skip.order.check
if paramter order.by should be checked for allowed column
names. If TRUE the function checks if the provided order
criteria orders on columns present in the database tables.
table
For listColumns: optionally specify the table name for
which the columns should be returned.
x
For EnsDb: the file name of the SQLite database.
Value
For buildQuery
A character string with the SQL query.
For connection
The SQL connection to the RSQLite database.
For EnsDb
An EnsDb instance.
For lengthOf
A named integer vector with the length of the genes or transcripts.
For listColumns
A character vector with the column names.
For listGenebiotypes
A character vector with the biotypes of the genes in the database.
For listTxbiotypes
A character vector with the biotypes of the transcripts in the database.
For listTables
A list with the names corresponding to the database table names
and the elements being the attribute (column) names of the table.
For metadata
A data.frame.
For organism
A character string.
For seqinfo
A Seqinfo class.
For updateEnsDb
A EnsDb object.
Objects from the Class
A connection to the respective annotation database is created upon
loading of an annotation package created with the
makeEnsembldbPackage function. In addition, the
EnsDb constructor specifying the SQLite database file can be
called to generate an instance of the object (see
makeEnsemblSQLiteFromTables for an example).
Slots
ensdb
Object of class "DBIConnection": the
connection to the database.
tables
Named list of database table columns with the names being the
database table names. The tables are ordered by their degree,
i.e. the number of other tables they can be joined with.
.properties
Internal list storing user-defined properties. Should not be
directly accessed.
Methods and Functions
buildQuery
Helper function building the SQL query to be used to retrieve the
wanted information. Usually there is no need to call this method.
dbconn
Returns the connection to the internal SQL database.
ensemblVersion
Returns the Ensembl version on which the package was built.
listColumns
Lists all columns of all tables in the database, or, if
table is specified, of the respective table.
listGenebiotypes
Lists all gene biotypes defined in the database.
listTxbiotypes
Lists all transcript biotypes defined in the database.
listTables
Returns a named list of database table columns (names of the
list being the database table names).
metadata
Returns a data.frame with the metadata information from the
database, i.e. informations about the Ensembl version or Genome
build the database was build upon.
organism
Returns the organism name (e.g. "homo_sapiens").
seqinfo
Returns the sequence/chromosome information from the database.
seqlevels
Returns the chromosome/sequence names that are available in the
database.
show
Displays some informations from the database.
updateEnsDb
Updates the EnsDb object to the most recent implementation.
library(EnsDb.Hsapiens.v75)
## Display some information:
EnsDb.Hsapiens.v75
## Show the tables along with its columns
listTables(EnsDb.Hsapiens.v75)
## For what species is this database?
organism(EnsDb.Hsapiens.v75)
## What Ensembl version if the database based on?
ensemblVersion(EnsDb.Hsapiens.v75)
## Get some more information from the database
metadata(EnsDb.Hsapiens.v75)
## Get all the sequence names.
seqlevels(EnsDb.Hsapiens.v75)
###### buildQuery
##
## Join tables gene and transcript and return gene_id and tx_id
buildQuery(EnsDb.Hsapiens.v75, columns=c("gene_id", "tx_id"))
## Get all exon_ids and transcript ids of genes encoded on chromosome Y.
buildQuery(EnsDb.Hsapiens.v75, columns=c("exon_id", "tx_id"),
filter=list(SeqnameFilter( "Y")))
## List all available gene biotypes from the database:
listGenebiotypes(EnsDb.Hsapiens.v75)
## List all available transcript biotypes:
listTxbiotypes(EnsDb.Hsapiens.v75)
## Update the EnsDb; this is in most instances not necessary at all.
updateEnsDb(EnsDb.Hsapiens.v75)
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(ensembldb)
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
rbind, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit
Loading required package: GenomicRanges
Loading required package: S4Vectors
Loading required package: stats4
Attaching package: 'S4Vectors'
The following objects are masked from 'package:base':
colMeans, colSums, expand.grid, rowMeans, rowSums
Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: GenomicFeatures
Loading required package: AnnotationDbi
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/ensembldb/EnsDb-class.Rd_%03d_medium.png", width=480, height=480)
> ### Name: EnsDb-class
> ### Title: Basic usage of an Ensembl based annotation database
> ### Aliases: EnsDb-class EnsDb buildQuery buildQuery,EnsDb-method dbconn
> ### dbconn,EnsDb-method ensemblVersion ensemblVersion,EnsDb-method
> ### listColumns listColumns,EnsDb-method metadata metadata,EnsDb-method
> ### seqinfo seqinfo,EnsDb-method seqlevels seqlevels,EnsDb-method
> ### organism organism,EnsDb-method show show,EnsDb-method
> ### listGenebiotypes listGenebiotypes,EnsDb-method listTxbiotypes
> ### listTxbiotypes,EnsDb-method listTables listTables,EnsDb-method
> ### updateEnsDb updateEnsDb,EnsDb-method
> ### Keywords: classes
>
> ### ** Examples
>
>
> library(EnsDb.Hsapiens.v75)
>
> ## Display some information:
> EnsDb.Hsapiens.v75
EnsDb for Ensembl:
|Db type: EnsDb
|Type of Gene ID: Ensembl Gene ID
|Supporting package: ensembldb
|Db created by: ensembldb package from Bioconductor
|script_version: 0.1.2
|Creation time: Wed Mar 18 09:30:54 2015
|ensembl_version: 75
|ensembl_host: manny.i-med.ac.at
|Organism: homo_sapiens
|genome_build: GRCh37
|DBSCHEMAVERSION: 1.0
| No. of genes: 64102.
| No. of transcripts: 215647.
>
> ## Show the tables along with its columns
> listTables(EnsDb.Hsapiens.v75)
$gene
[1] "gene_id" "gene_name" "entrezid" "gene_biotype"
[5] "gene_seq_start" "gene_seq_end" "seq_name" "seq_strand"
[9] "seq_coord_system"
$tx
[1] "tx_id" "tx_biotype" "tx_seq_start" "tx_seq_end"
[5] "tx_cds_seq_start" "tx_cds_seq_end" "gene_id"
$tx2exon
[1] "tx_id" "exon_id" "exon_idx"
$exon
[1] "exon_id" "exon_seq_start" "exon_seq_end"
$chromosome
[1] "seq_name" "seq_length" "is_circular"
$metadata
[1] "name" "value"
>
> ## For what species is this database?
> organism(EnsDb.Hsapiens.v75)
[1] "Homo sapiens"
>
> ## What Ensembl version if the database based on?
> ensemblVersion(EnsDb.Hsapiens.v75)
[1] "75"
>
> ## Get some more information from the database
> metadata(EnsDb.Hsapiens.v75)
name value
1 Db type EnsDb
2 Type of Gene ID Ensembl Gene ID
3 Supporting package ensembldb
4 Db created by ensembldb package from Bioconductor
5 script_version 0.1.2
6 Creation time Wed Mar 18 09:30:54 2015
7 ensembl_version 75
8 ensembl_host manny.i-med.ac.at
9 Organism homo_sapiens
10 genome_build GRCh37
11 DBSCHEMAVERSION 1.0
>
> ## Get all the sequence names.
> seqlevels(EnsDb.Hsapiens.v75)
[1] "1" "10"
[3] "11" "12"
[5] "13" "14"
[7] "15" "16"
[9] "17" "18"
[11] "19" "2"
[13] "20" "21"
[15] "22" "3"
[17] "4" "5"
[19] "6" "7"
[21] "8" "9"
[23] "GL000191.1" "GL000192.1"
[25] "GL000193.1" "GL000194.1"
[27] "GL000195.1" "GL000196.1"
[29] "GL000199.1" "GL000201.1"
[31] "GL000204.1" "GL000205.1"
[33] "GL000209.1" "GL000211.1"
[35] "GL000212.1" "GL000213.1"
[37] "GL000215.1" "GL000216.1"
[39] "GL000218.1" "GL000219.1"
[41] "GL000220.1" "GL000221.1"
[43] "GL000222.1" "GL000223.1"
[45] "GL000224.1" "GL000225.1"
[47] "GL000228.1" "GL000229.1"
[49] "GL000230.1" "GL000231.1"
[51] "GL000233.1" "GL000236.1"
[53] "GL000237.1" "GL000240.1"
[55] "GL000241.1" "GL000242.1"
[57] "GL000243.1" "GL000247.1"
[59] "HG1007_PATCH" "HG1032_PATCH"
[61] "HG104_HG975_PATCH" "HG1063_PATCH"
[63] "HG1074_PATCH" "HG1079_PATCH"
[65] "HG1082_HG167_PATCH" "HG1091_PATCH"
[67] "HG1133_PATCH" "HG1146_PATCH"
[69] "HG115_PATCH" "HG1208_PATCH"
[71] "HG1211_PATCH" "HG122_PATCH"
[73] "HG1257_PATCH" "HG1287_PATCH"
[75] "HG1292_PATCH" "HG1293_PATCH"
[77] "HG1304_PATCH" "HG1308_PATCH"
[79] "HG1322_PATCH" "HG1350_HG959_PATCH"
[81] "HG1423_PATCH" "HG1424_PATCH"
[83] "HG1425_PATCH" "HG1426_PATCH"
[85] "HG142_HG150_NOVEL_TEST" "HG1433_PATCH"
[87] "HG1434_PATCH" "HG1435_PATCH"
[89] "HG1436_HG1432_PATCH" "HG1437_PATCH"
[91] "HG1438_PATCH" "HG1439_PATCH"
[93] "HG1440_PATCH" "HG1441_PATCH"
[95] "HG1442_PATCH" "HG1443_HG1444_PATCH"
[97] "HG144_PATCH" "HG1453_PATCH"
[99] "HG1458_PATCH" "HG1459_PATCH"
[101] "HG1462_PATCH" "HG1463_PATCH"
[103] "HG1472_PATCH" "HG1479_PATCH"
[105] "HG1486_PATCH" "HG1487_PATCH"
[107] "HG1488_PATCH" "HG1490_PATCH"
[109] "HG1497_PATCH" "HG14_PATCH"
[111] "HG1500_PATCH" "HG1501_PATCH"
[113] "HG1502_PATCH" "HG151_NOVEL_TEST"
[115] "HG1591_PATCH" "HG1592_PATCH"
[117] "HG1595_PATCH" "HG1699_PATCH"
[119] "HG174_HG254_PATCH" "HG183_PATCH"
[121] "HG185_PATCH" "HG186_PATCH"
[123] "HG193_PATCH" "HG19_PATCH"
[125] "HG237_PATCH" "HG243_PATCH"
[127] "HG256_PATCH" "HG271_PATCH"
[129] "HG27_PATCH" "HG280_PATCH"
[131] "HG281_PATCH" "HG299_PATCH"
[133] "HG29_PATCH" "HG305_PATCH"
[135] "HG306_PATCH" "HG311_PATCH"
[137] "HG325_PATCH" "HG329_PATCH"
[139] "HG339_PATCH" "HG344_PATCH"
[141] "HG348_PATCH" "HG357_PATCH"
[143] "HG375_PATCH" "HG385_PATCH"
[145] "HG388_HG400_PATCH" "HG414_PATCH"
[147] "HG417_PATCH" "HG418_PATCH"
[149] "HG444_PATCH" "HG480_HG481_PATCH"
[151] "HG497_PATCH" "HG506_HG507_HG1000_PATCH"
[153] "HG50_PATCH" "HG531_PATCH"
[155] "HG536_PATCH" "HG544_PATCH"
[157] "HG686_PATCH" "HG706_PATCH"
[159] "HG729_PATCH" "HG730_PATCH"
[161] "HG736_PATCH" "HG745_PATCH"
[163] "HG747_PATCH" "HG748_PATCH"
[165] "HG75_PATCH" "HG79_PATCH"
[167] "HG7_PATCH" "HG858_PATCH"
[169] "HG865_PATCH" "HG871_PATCH"
[171] "HG873_PATCH" "HG883_PATCH"
[173] "HG905_PATCH" "HG944_PATCH"
[175] "HG946_PATCH" "HG953_PATCH"
[177] "HG957_PATCH" "HG962_PATCH"
[179] "HG971_PATCH" "HG979_PATCH"
[181] "HG987_PATCH" "HG989_PATCH"
[183] "HG990_PATCH" "HG991_PATCH"
[185] "HG996_PATCH" "HG998_1_PATCH"
[187] "HG998_2_PATCH" "HG999_1_PATCH"
[189] "HG999_2_PATCH" "HSCHR10_1_CTG2"
[191] "HSCHR10_1_CTG5" "HSCHR12_1_CTG1"
[193] "HSCHR12_1_CTG2_1" "HSCHR12_1_CTG5"
[195] "HSCHR12_2_CTG2" "HSCHR12_2_CTG2_1"
[197] "HSCHR12_3_CTG2_1" "HSCHR15_1_CTG4"
[199] "HSCHR15_1_CTG8" "HSCHR16_1_CTG3_1"
[201] "HSCHR16_2_CTG3_1" "HSCHR17_1"
[203] "HSCHR17_1_CTG1" "HSCHR17_1_CTG4"
[205] "HSCHR17_2_CTG4" "HSCHR17_3_CTG4"
[207] "HSCHR17_4_CTG4" "HSCHR17_5_CTG4"
[209] "HSCHR17_6_CTG4" "HSCHR18_1_CTG1_1"
[211] "HSCHR18_1_CTG2_1" "HSCHR18_2_CTG2"
[213] "HSCHR18_2_CTG2_1" "HSCHR19LRC_COX1_CTG1"
[215] "HSCHR19LRC_COX2_CTG1" "HSCHR19LRC_LRC_I_CTG1"
[217] "HSCHR19LRC_LRC_J_CTG1" "HSCHR19LRC_LRC_S_CTG1"
[219] "HSCHR19LRC_LRC_T_CTG1" "HSCHR19LRC_PGF1_CTG1"
[221] "HSCHR19LRC_PGF2_CTG1" "HSCHR19_1_CTG3"
[223] "HSCHR19_1_CTG3_1" "HSCHR19_2_CTG3"
[225] "HSCHR19_3_CTG3" "HSCHR1_1_CTG31"
[227] "HSCHR1_2_CTG31" "HSCHR1_3_CTG31"
[229] "HSCHR20_1_CTG1" "HSCHR21_2_CTG1_1"
[231] "HSCHR21_3_CTG1_1" "HSCHR21_4_CTG1_1"
[233] "HSCHR22_1_CTG1" "HSCHR22_1_CTG2"
[235] "HSCHR22_2_CTG1" "HSCHR2_1_CTG1"
[237] "HSCHR2_1_CTG12" "HSCHR2_2_CTG12"
[239] "HSCHR3_1_CTG1" "HSCHR3_1_CTG2_1"
[241] "HSCHR4_1" "HSCHR4_1_CTG12"
[243] "HSCHR4_1_CTG6" "HSCHR4_2_CTG9"
[245] "HSCHR5_1_CTG1" "HSCHR5_1_CTG2"
[247] "HSCHR5_1_CTG5" "HSCHR5_2_CTG1"
[249] "HSCHR5_3_CTG1" "HSCHR6_1_CTG5"
[251] "HSCHR6_MHC_APD" "HSCHR6_MHC_COX"
[253] "HSCHR6_MHC_DBB" "HSCHR6_MHC_MANN"
[255] "HSCHR6_MHC_MCF" "HSCHR6_MHC_QBL"
[257] "HSCHR6_MHC_SSTO" "HSCHR7_1_CTG6"
[259] "HSCHR9_1_CTG1" "HSCHR9_1_CTG35"
[261] "HSCHR9_2_CTG35" "HSCHR9_3_CTG35"
[263] "LRG_123" "LRG_162"
[265] "LRG_183" "LRG_187"
[267] "LRG_239" "LRG_311"
[269] "LRG_415" "LRG_93"
[271] "MT" "X"
[273] "Y"
>
> ###### buildQuery
> ##
> ## Join tables gene and transcript and return gene_id and tx_id
> buildQuery(EnsDb.Hsapiens.v75, columns=c("gene_id", "tx_id"))
[1] "select distinct tx.tx_id,tx.gene_id from tx"
>
>
> ## Get all exon_ids and transcript ids of genes encoded on chromosome Y.
> buildQuery(EnsDb.Hsapiens.v75, columns=c("exon_id", "tx_id"),
+ filter=list(SeqnameFilter( "Y")))
[1] "select distinct gene.seq_name,tx2exon.tx_id,tx2exon.exon_id from gene join tx on (gene.gene_id=tx.gene_id) join tx2exon on (tx.tx_id=tx2exon.tx_id) where gene.seq_name = 'Y'"
>
> ## List all available gene biotypes from the database:
> listGenebiotypes(EnsDb.Hsapiens.v75)
[1] "protein_coding" "pseudogene"
[3] "processed_transcript" "antisense"
[5] "lincRNA" "polymorphic_pseudogene"
[7] "IG_V_pseudogene" "IG_V_gene"
[9] "sense_overlapping" "sense_intronic"
[11] "TR_V_gene" "misc_RNA"
[13] "snRNA" "miRNA"
[15] "snoRNA" "rRNA"
[17] "Mt_tRNA" "Mt_rRNA"
[19] "IG_C_gene" "IG_J_gene"
[21] "TR_J_gene" "TR_C_gene"
[23] "TR_V_pseudogene" "TR_J_pseudogene"
[25] "IG_D_gene" "IG_C_pseudogene"
[27] "TR_D_gene" "IG_J_pseudogene"
[29] "3prime_overlapping_ncrna" "processed_pseudogene"
[31] "LRG_gene"
>
> ## List all available transcript biotypes:
> listTxbiotypes(EnsDb.Hsapiens.v75)
[1] "protein_coding" "processed_transcript"
[3] "retained_intron" "nonsense_mediated_decay"
[5] "unitary_pseudogene" "non_stop_decay"
[7] "unprocessed_pseudogene" "processed_pseudogene"
[9] "transcribed_unprocessed_pseudogene" "antisense"
[11] "lincRNA" "polymorphic_pseudogene"
[13] "transcribed_processed_pseudogene" "miRNA"
[15] "pseudogene" "IG_V_pseudogene"
[17] "snoRNA" "IG_V_gene"
[19] "sense_overlapping" "sense_intronic"
[21] "TR_V_gene" "snRNA"
[23] "misc_RNA" "rRNA"
[25] "Mt_tRNA" "Mt_rRNA"
[27] "IG_C_gene" "IG_J_gene"
[29] "TR_J_gene" "TR_C_gene"
[31] "TR_V_pseudogene" "TR_J_pseudogene"
[33] "IG_D_gene" "IG_C_pseudogene"
[35] "TR_D_gene" "IG_J_pseudogene"
[37] "3prime_overlapping_ncrna" "translated_processed_pseudogene"
[39] "LRG_gene"
>
> ## Update the EnsDb; this is in most instances not necessary at all.
> updateEnsDb(EnsDb.Hsapiens.v75)
EnsDb for Ensembl:
|Db type: EnsDb
|Type of Gene ID: Ensembl Gene ID
|Supporting package: ensembldb
|Db created by: ensembldb package from Bioconductor
|script_version: 0.1.2
|Creation time: Wed Mar 18 09:30:54 2015
|ensembl_version: 75
|ensembl_host: manny.i-med.ac.at
|Organism: homo_sapiens
|genome_build: GRCh37
|DBSCHEMAVERSION: 1.0
| No. of genes: 64102.
| No. of transcripts: 215647.
>
>
>
>
>
>
> dev.off()
null device
1
>