makeOrganismDbFromBiomart
R Documentation
Make an OrganismDb object from annotations available on a
BioMart database
Description
The makeOrganismDbFromBiomart function allows the user
to make an OrganismDb object from transcript annotations
available on a BioMart database. This object has all the benefits of
a TxDb, plus associated OrgDb and GODb objects.
Arguments
biomart
which BioMart database to use.
Get the list of all available BioMart databases with the
listMarts function from the biomaRt
package. See the details section below for a list of BioMart
databases with compatible transcript annotations.
dataset
which dataset from BioMart to use. For example:
"hsapiens_gene_ensembl", "mmusculus_gene_ensembl",
"dmelanogaster_gene_ensembl", "celegans_gene_ensembl",
"scerevisiae_gene_ensembl", etc. in the ensembl database.
See the examples section below for how to discover which datasets
are available in a given BioMart database.
transcript_ids
optionally, only retrieve transcript
annotation data for the specified set of transcript ids.
If this is used, then the meta information displayed for the
resulting TxDb object will say 'Full dataset: no'.
Otherwise it will say 'Full dataset: yes'. This TxDb object
will be embedded in the resulting OrganismDb object.
circ_seqs
a character vector to list out which chromosomes
should be marked as circular.
filter
Additional filters to use in the BioMart query. Must be
a named list. An example is filter=as.list(c(source="entrez")).
host
The host URL of the BioMart. Defaults to www.ensembl.org.
port
The port to use in the HTTP communication with the host.
id_prefix
Specifies the prefix used in BioMart attributes. For
example, some BioMarts may have an attribute specified as
"ensembl_transcript_id" whereas others have the same attribute
specified as "transcript_id". Defaults to "ensembl_".
miRBaseBuild
specify the string for the appropriate build
information from mirbase.db to use for microRNAs. The supported
values can be listed by calling supportedMiRBaseBuildValues. By default,
this value is NA, which disables the
microRNAs accessor.
keytype
The kind of key that this database will use as a foreign key
between its TxDb object and its OrgDb object, i.e. the name of the
OrgDb column onto which the GENEID values of the TxDb are mapped.
Defaults to "ENSEMBL", since the GENEIDs of most biomaRt-based
TxDbs are Ensembl gene ids and therefore need to map
to the ENSEMBL gene mappings of the associated OrgDb object.
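The filter argument above must be a named list. As a minimal base R sketch (it does not contact BioMart), the following shows that the as.list(c(...)) form from the argument description is equivalent to calling list() directly. The helper is_named_list is hypothetical and not part of OrganismDbi; it only illustrates a check one might run before passing a value as filter.

```r
# Two equivalent ways to build the named list expected by 'filter'.
f1 <- as.list(c(source = "entrez"))
f2 <- list(source = "entrez")
stopifnot(identical(f1, f2))

# Hypothetical helper (NOT part of OrganismDbi): TRUE only if 'x' is a
# non-empty list in which every element has a non-empty name.
is_named_list <- function(x) {
  is.list(x) && length(x) > 0L &&
    !is.null(names(x)) && all(nzchar(names(x)))
}

is_named_list(f1)               # a named list
is_named_list(list("entrez"))   # a list without names
```

Whether a given filter name (such as "source") is accepted depends on the BioMart dataset being queried, so a list that passes this check may still be rejected by the server.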
Details
makeOrganismDbFromBiomart is a convenience function that feeds
data from a BioMart database to the lower-level
OrganismDb constructor.
See ?makeOrganismDbFromUCSC for a similar function
that feeds data from the UCSC source.
The listMarts function from the biomaRt package can be
used to list all public BioMart databases.
Not all databases returned by this function contain datasets that
are compatible with (i.e. understood by) makeOrganismDbFromBiomart.
Here is a list of datasets known to be compatible (updated on Sep 24, 2014):
All the datasets in the main Ensembl database:
use biomart="ensembl".
All the datasets in the Ensembl Fungi database:
use biomart="fungi_mart_XX" where XX is the release
version of the database e.g. "fungi_mart_22".
All the datasets in the Ensembl Metazoa database:
use biomart="metazoa_mart_XX" where XX is the release
version of the database e.g. "metazoa_mart_22".
All the datasets in the Ensembl Plants database:
use biomart="plants_mart_XX" where XX is the release
version of the database e.g. "plants_mart_22".
All the datasets in the Ensembl Protists database:
use biomart="protists_mart_XX" where XX is the release
version of the database e.g. "protists_mart_22".
All the datasets in the Gramene Mart:
use biomart="ENSEMBL_MART_PLANT".
Not all these datasets have CDS information.
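The naming rules in the list above can be written as regular expressions. The following is only a sketch in base R: the patterns are transcribed from the list (a snapshot dated Sep 24, 2014), the function name is_known_compatible is hypothetical, and the mart names in the usage lines are illustrative rather than fetched from a server.

```r
# Patterns transcribed from the list of known-compatible BioMart
# databases above; "XX" (the release version) is assumed to be numeric.
compatible_patterns <- c(
  "^ensembl$",
  "^fungi_mart_[0-9]+$",
  "^metazoa_mart_[0-9]+$",
  "^plants_mart_[0-9]+$",
  "^protists_mart_[0-9]+$",
  "^ENSEMBL_MART_PLANT$"
)

# Hypothetical helper (NOT part of OrganismDbi): flags which mart names
# match one of the patterns above.
is_known_compatible <- function(mart_names) {
  grepl(paste(compatible_patterns, collapse = "|"), mart_names)
}

marts <- c("ensembl", "fungi_mart_22", "ENSEMBL_MART_SNP",
           "ENSEMBL_MART_PLANT")
is_known_compatible(marts)
# TRUE TRUE FALSE TRUE
```

Because mart names change between hosts and releases, a FALSE here only means the name is not on the dated list, not that the database is necessarily incompatible.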
Value
An OrganismDb object.
Author(s)
M. Carlson and H. Pages
See Also
makeOrganismDbFromUCSC for convenient ways to make a
OrganismDb object from UCSC online resources.
The listMarts, useMart,
and listDatasets functions in the
biomaRt package.
DEFAULT_CIRC_SEQS.
The supportedMiRBaseBuildValues function for
listing all the possible values for the miRBaseBuild
argument.
The OrganismDb class.
Examples
## Discover which datasets are available in the "ensembl" BioMart
## database:
library(biomaRt)
head(listDatasets(useMart("ensembl")))
## Retrieving an incomplete transcript dataset for Human from the
## "ensembl" BioMart database:
transcript_ids <- c(
    "ENST00000013894",
    "ENST00000268655",
    "ENST00000313243",
    "ENST00000435657",
    "ENST00000384428",
    "ENST00000478783"
)
odb <- makeOrganismDbFromBiomart(transcript_ids=transcript_ids)
odb # note that these annotations match the GRCh38 genome assembly
## Now what if we want to use another mirror? We might make use of the
## new host argument. But wait! If we use biomaRt, we can see that
## this host has named the mart differently!
listMarts(host="uswest.ensembl.org")
## Therefore we must also change the name passed into the "biomart"
## argument:
try(
    odb <- makeOrganismDbFromBiomart(biomart="ENSEMBL_MART_ENSEMBL",
                                     transcript_ids=transcript_ids,
                                     host="uswest.ensembl.org")
)
odb
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
> library(OrganismDbi)
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
rbind, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit
Loading required package: AnnotationDbi
Loading required package: stats4
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
Loading required package: IRanges
Loading required package: S4Vectors
Attaching package: 'S4Vectors'
The following objects are masked from 'package:base':
colMeans, colSums, expand.grid, rowMeans, rowSums
Loading required package: GenomicFeatures
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
> ### Name: makeOrganismDbFromBiomart
> ### Title: Make a OrganismDb object from annotations available on a BioMart
> ### database
> ### Aliases: makeOrganismDbFromBiomart
>
> ### ** Examples
>
> ## Discover which datasets are available in the "ensembl" BioMart
> ## database:
> library(biomaRt)
> head(listDatasets(useMart("ensembl")))
                         dataset                                description version
1         oanatinus_gene_ensembl     Ornithorhynchus anatinus genes (OANA5)   OANA5
2        cporcellus_gene_ensembl            Cavia porcellus genes (cavPor3) cavPor3
3        gaculeatus_gene_ensembl     Gasterosteus aculeatus genes (BROADS1) BROADS1
4 itridecemlineatus_gene_ensembl Ictidomys tridecemlineatus genes (spetri2) spetri2
5         lafricana_gene_ensembl         Loxodonta africana genes (loxAfr3) loxAfr3
6        choffmanni_gene_ensembl        Choloepus hoffmanni genes (choHof1) choHof1
>
> ## Retrieving an incomplete transcript dataset for Human from the
> ## "ensembl" BioMart database:
> transcript_ids <- c(
+ "ENST00000013894",
+ "ENST00000268655",
+ "ENST00000313243",
+ "ENST00000435657",
+ "ENST00000384428",
+ "ENST00000478783"
+ )
> odb <- makeOrganismDbFromBiomart(transcript_ids=transcript_ids)
Download and preprocess the 'transcripts' data frame ... OK
Download and preprocess the 'chrominfo' data frame ... OK
Download and preprocess the 'splicings' data frame ... OK
Download and preprocess the 'genes' data frame ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
> odb # note that these annotations match the GRCh38 genome assembly
OrganismDb Object:
# Includes GODb Object: GO.db
# With data about: Gene Ontology
# Includes OrgDb Object: org.Hs.eg.db
# Gene data about: Homo sapiens
# Taxonomy Id: 9606
# Includes TxDb Object: TxDb.Hsapiens.BioMart.ENSEMBLMARTENSEMBL.GRCh38.p5
# Transcriptome data about: Homo sapiens
# Based on genome:
# The OrgDb gene id ENSEMBL is mapped to the TxDb gene id GENEID .
>
> ## Now what if we want to use another mirror? We might make use of the
> ## new host argument. But wait! If we use biomaRt, we can see that
> ## this host has named the mart differently!
> listMarts(host="uswest.ensembl.org")
               biomart               version
1 ENSEMBL_MART_ENSEMBL      Ensembl Genes 84
2     ENSEMBL_MART_SNP  Ensembl Variation 84
3 ENSEMBL_MART_FUNCGEN Ensembl Regulation 84
4    ENSEMBL_MART_VEGA               Vega 64
> ## Therefore we must also change the name passed into the "mart"
> ## argument thusly:
> try(
+ odb <- makeOrganismDbFromBiomart(biomart="ENSEMBL_MART_ENSEMBL",
+ transcript_ids=transcript_ids,
+ host="uswest.ensembl.org")
+ )
Download and preprocess the 'transcripts' data frame ... OK
Download and preprocess the 'chrominfo' data frame ... OK
Download and preprocess the 'splicings' data frame ... OK
Download and preprocess the 'genes' data frame ... OK
Prepare the 'metadata' data frame ... OK
Make the TxDb object ... OK
> odb
OrganismDb Object:
# Includes GODb Object: GO.db
# With data about: Gene Ontology
# Includes OrgDb Object: org.Hs.eg.db
# Gene data about: Homo sapiens
# Taxonomy Id: 9606
# Includes TxDb Object: TxDb.Hsapiens.BioMart.ENSEMBLMARTENSEMBL.GRCh38.p5
# Transcriptome data about: Homo sapiens
# Based on genome:
# The OrgDb gene id ENSEMBL is mapped to the TxDb gene id GENEID .