R: AnnotationHub objects and their related methods and functions
AnnotationHub-objects
R Documentation
AnnotationHub objects and their related methods and functions
Description
Use AnnotationHub to interact with Bioconductor's AnnotationHub
service. Query the instance to discover and use resources that are of
interest, and then easily download and import the resource into R for
immediate use.
Use AnnotationHub() to retrieve information about all records
in the hub.
Discover records in a hub using mcols(), query(),
subset(), [, and display().
Retrieve individual records using [[. On first use of a
resource, the corresponding files or other hub resources are
downloaded from the internet to a local cache. On this and all
subsequent uses the files are quickly input from the cache into the R
session.
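The download-once, read-from-cache behaviour described above can be illustrated with a minimal base R sketch. It is not AnnotationHub's own code: fetch_resource(), fake_download(), and the temporary cache directory are hypothetical stand-ins, and no network access takes place.

```r
# Hypothetical stand-in for a network download: writes some content to `dest`.
fake_download <- function(id, dest) {
  writeLines(paste("contents of", id), dest)
}

# Return the content of the cached file for `id`, downloading it only if
# it is not in the cache yet.
fetch_resource <- function(id, cache_dir, download = fake_download) {
  dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)
  path <- file.path(cache_dir, id)
  if (!file.exists(path)) {
    message("downloading '", id, "'")
    download(id, path)
  } else {
    message("loading from cache '", path, "'")
  }
  readLines(path)
}

cache_dir <- file.path(tempdir(), "hub-cache-sketch")
first  <- fetch_resource("34252", cache_dir)  # not cached yet: "downloads"
second <- fetch_resource("34252", cache_dir)  # already cached: read from disk
identical(first, second)
```

Only the first call invokes the download function; the second call finds the file in the cache directory and reads it directly.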
AnnotationHub records can be added (and sometimes removed) at
any time. snapshotDate() restricts hub records to those
available at the time of the snapshot; use possibleDates() to
see possible snapshot dates.
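As a rough illustration of how a snapshot date can restrict the visible records, the base R sketch below filters a made-up table of records. The added and removed columns, the visible_at() helper, and the exact visibility rule are assumptions made for this example; they are not taken from the package.

```r
# Made-up records; `added` and `removed` are hypothetical columns.
records <- data.frame(
  id      = c("AH1", "AH2", "AH3"),
  added   = as.Date(c("2015-01-10", "2016-03-01", "2016-07-15")),
  removed = as.Date(c(NA, "2016-05-01", NA)),
  stringsAsFactors = FALSE
)

# Assumed rule: a record is visible if it was added on or before the
# snapshot date and had not been removed by then.
visible_at <- function(records, snapshot) {
  snapshot <- as.Date(snapshot)
  keep <- records$added <= snapshot &
    (is.na(records$removed) | records$removed > snapshot)
  records$id[keep]
}

visible_at(records, "2016-04-01")  # "AH1" "AH2"
visible_at(records, "2016-06-06")  # "AH1"
```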
The location of the local cache can be found (and updated) with
getAnnotationHubCache and setAnnotationHubCache;
removeCache removes all cache resources.
AnnotationHub() creates an AnnotationHub instance, possibly updating the
current database of records.
Accessors
In the code snippets below, x and object are
AnnotationHub objects.
hubCache(x):
Gets the file system location of the local AnnotationHub cache.
hubUrl(x):
Gets the URL for the online hub.
length(x):
Get the number of hub records.
names(x):
Get the names (AnnotationHub unique identifiers, of the form
AH12345) of the hub records.
fileName(x):
Get the file path of the hub records as stored in the local cache
(AnnotationHub files are stored as unique numbers, of the form
12345). NA is returned for those records which have not been
cached.
mcols(x):
Get the metadata columns describing each record. Columns include:
title
Record title, frequently the file name of the
object.
dataprovider
Original provider of the resource, e.g.,
Ensembl, UCSC.
species
The species for which the record is most
relevant, e.g., ‘Homo sapiens’.
taxonomyid
NCBI taxonomy identifier of the species.
genome
Genome build relevant to the record, e.g., hg19.
description
Textual description of the resource,
frequently automatically generated from file path and other
information available when the record was created.
tags
Single words added to the record to facilitate
identification, e.g., TCGA, Roadmap.
rdataclass
The class of the R object used to represent
the object when imported into R, e.g., GRanges,
VCFFile.
sourceurl
Original URL of the resource.
sourcetype
Format of the original resource, e.g., BED
file.
dbconn(x):
Return an open connection to the underlying SQLite database.
dbfile(x):
Return the full path to the underlying SQLite database.
.db_close(conn):
Close the SQLite connection conn returned by dbconn(x).
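The relationship between names(x) (identifiers such as AH28812) and fileName(x) (numeric file names in the cache, NA when not cached) can be sketched in base R. The helper file_name_sketch(), the temporary cache directory, and the identifier AH00000 are hypothetical; the pairing of AH28812 with cache file 34252 is the one reported in the example session output.

```r
# Hypothetical mapping from hub names to numeric cache file names.
# AH28812 -> 34252 is taken from the example session; AH00000 is made up.
cache_ids <- c(AH28812 = "34252", AH00000 = "99999")

cache_dir <- file.path(tempdir(), "fileName-sketch")
dir.create(cache_dir, showWarnings = FALSE)
# Pretend that only AH28812 has been downloaded.
invisible(file.create(file.path(cache_dir, "34252")))

# Return the cache path for each record, or NA if the file is not cached.
file_name_sketch <- function(cache_ids, cache_dir) {
  paths <- file.path(cache_dir, cache_ids)
  paths[!file.exists(paths)] <- NA_character_
  names(paths) <- names(cache_ids)
  paths
}

res <- file_name_sketch(cache_ids, cache_dir)
basename(res[["AH28812"]])  # "34252"
is.na(res[["AH00000"]])     # TRUE
```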
Subsetting and related operations
In the code snippets below, x is an AnnotationHub object.
x$name:
Convenient reference to individual metadata columns, e.g.,
x$species.
x[i]:
Numerical, logical, or character vector (of AnnotationHub names)
to subset the hub, e.g., x[x$species == "Homo sapiens"].
x[[i]]:
Numerical or character scalar to retrieve (if necessary) and
import the resource into R.
query(x, pattern, ignore.case=TRUE, pattern.op= `&`):
Return an AnnotationHub subset containing only those elements
whose metadata matches pattern. Each element of pattern is
matched, as in grepl, against the as.character representation of
each metadata column; the matches for the individual elements are then
combined with pattern.op (a logical `&` by default).
e.g., query(x, c("Homo sapiens", "hg19", "GTF")).
pattern
A character vector of patterns to search
(via grepl) for in any of the mcols() columns.
ignore.case
A logical(1) vector indicating whether
the search should ignore case (TRUE) or not (FALSE).
pattern.op
Any function of two arguments,
describing how matches across pattern elements are to be
combined. The default `&` returns only those records whose
metadata columns match every element of pattern.
subset(x, subset):
Return the subset of records containing only those elements whose
metadata satisfies the expression in subset. The
expression can reference columns of mcols(x), and should
return a logical vector of length length(x).
e.g., subset(x, species == "Homo sapiens" &
genome == "GRCh38").
display(object):
Open a web browser allowing for easy selection of hub records via
interactive tabular display. Return value is the subset of hub
records identified while navigating the display.
recordStatus(hub, record):
Returns a data.frame of the record id and status. hub must
be a Hub object and record must be a character(1).
Can be used to discover why a resource was removed from the hub.
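The matching rule documented for query(), and the logical expression expected by subset(), can be mimicked on a plain data.frame. This is a minimal sketch, not the package's implementation: meta is a made-up stand-in for mcols(x), query_sketch() is a hypothetical helper, and it assumes that a pattern element matches a record when any metadata column matches it.

```r
# Toy metadata standing in for mcols(x); the values are made up.
meta <- data.frame(
  species    = c("Homo sapiens", "Homo sapiens", "Mus musculus"),
  genome     = c("hg19", "GRCh38", "mm10"),
  sourcetype = c("GTF", "BED", "GTF"),
  row.names  = c("AH1", "AH2", "AH3"),
  stringsAsFactors = FALSE
)

# Hypothetical helper mimicking the documented matching rule.
query_sketch <- function(meta, pattern, ignore.case = TRUE, pattern.op = `&`) {
  # For one pattern element: TRUE for rows where any column matches.
  match_one <- function(p) {
    per_column <- lapply(meta, function(col) {
      grepl(p, as.character(col), ignore.case = ignore.case)
    })
    Reduce(`|`, per_column)
  }
  # Combine the per-element matches with pattern.op.
  keep <- Reduce(pattern.op, lapply(pattern, match_one))
  rownames(meta)[keep]
}

query_sketch(meta, c("homo sapiens", "gtf"))                    # "AH1"
query_sketch(meta, c("homo sapiens", "gtf"), pattern.op = `|`)  # all three ids

# subset()-style filtering: an expression over the metadata columns that
# yields one logical value per record (note `==`, not `=`).
keep <- with(meta, species == "Homo sapiens" & genome == "GRCh38")
rownames(meta)[keep]  # "AH2"
```

With the default `&`, a record must match every pattern element; passing `|` keeps records that match at least one.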
Cache and hub management
In the code snippets below, x is an AnnotationHub object.
snapshotDate(x) and snapshotDate(x) <- value:
Gets or sets the date for the snapshot in use. value should
be one of possibleDates().
possibleDates(x):
Lists dates for snapshots that the hub could potentially use.
cache(x) and cache(x) <- NULL:
cache(x) adds (downloads) all resources in x to the local cache;
cache(x) <- NULL removes all local resources corresponding to the
records in x from the cache. In either case, x would typically be a
small subset of AnnotationHub resources.
hubUrl(x):
Gets the URL for the online AnnotationHub.
hubCache(x):
Gets the file system location of the local AnnotationHub cache.
removeCache(x):
Removes local AnnotationHub database and all related resources. After
calling this function, the user will have to download any AnnotationHub
resources again.
getAnnotationHubOption():
TODO: Get cache options "CACHE", "URL", "MAXDOWNLOADS" ...
setAnnotationHubOption():
TODO: Set cache options "CACHE", "URL", "MAXDOWNLOADS" ...
Coercion
In the code snippets below, x is an AnnotationHub object.
as.list(x):
Coerce x to a list of hub instances, one entry per
element. Primarily for internal use.
c(x, ...):
Concatenate one or more sub-hubs. Sub-hubs must reference the same
AnnotationHub instance. Duplicate entries are removed.
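The two rules stated for c() (the same hub instance is required, and duplicates are dropped) can be sketched with a toy representation of a sub-hub. The sub_hub() and combine_sub_hubs() helpers and the list layout are hypothetical and only illustrate the documented behaviour.

```r
# Hypothetical sub-hub representation: a hub URL plus a vector of record ids.
sub_hub <- function(url, ids) list(url = url, ids = ids)

# Concatenate sub-hubs, insisting on a common hub and dropping duplicate ids.
combine_sub_hubs <- function(...) {
  hubs <- list(...)
  urls <- unique(vapply(hubs, function(h) h$url, character(1)))
  if (length(urls) != 1L) {
    stop("sub-hubs must reference the same hub")
  }
  ids <- unlist(lapply(hubs, function(h) h$ids), use.names = FALSE)
  sub_hub(urls, unique(ids))
}

url <- "https://annotationhub.bioconductor.org"
a <- sub_hub(url, c("AH2", "AH3"))
b <- sub_hub(url, c("AH3", "AH4"))
combine_sub_hubs(a, b)$ids  # "AH2" "AH3" "AH4"
```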
Author(s)
Martin Morgan, Marc Carlson, Sonali Arora, and Dan Tenenbaum
Examples
## create an AnnotationHub object
library(AnnotationHub)
ah = AnnotationHub()
## Summary of available records
ah
## Detail for a single record
ah[1]
## and what is the date we are using?
snapshotDate(ah)
## how many resources?
length(ah)
## which data providers contribute the most records?
head(sort(table(ah$dataprovider), decreasing=TRUE))
## which species have the most records?
head(sort(table(ah$species), decreasing=TRUE))
## what web service and local cache does this AnnotationHub point to?
hubUrl(ah)
hubCache(ah)
### Examples ###
## One can search the hub for multiple strings
ahs2 <- query(ah, c("GTF", "77","Ensembl", "Homo sapiens"))
## information about the file can be retrieved using
ahs2[1]
## individual metadata fields shown by the show method, such as the
## sourceurl, can be extracted with `$`:
ahs2$sourceurl
ahs2$description
ahs2$title
## We can download and import a resource by position (using list semantics):
gr <- ahs2[[1]]
## And we can also retrieve it by name like this:
res <- ah[["AH28812"]]
## the GTF file is returned as a GRanges object that records the genome
## it belongs to, its seqlevels, and its seqlengths
seqinfo(gr)
## each GenomicRanges contains a metadata slot which can be used to get
## the name of the hub object and other associated metadata.
metadata(gr)
ah[metadata(gr)$AnnotationHubName]
## And we can also use "[" to restrict the things that are in the
## AnnotationHub object (by position, character, or logical vector).
## Here is a demo of position:
subHub <- ah[1:3]
if(interactive()) {
## Display method involves user interaction through web interface
ah2 <- display(ah)
}
## recordStatus
recordStatus(ah, "TEST")
recordStatus(ah, "AH7220")
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
> library(AnnotationHub)
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
rbind, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/AnnotationHub/AnnotationHub-class.Rd_%03d_medium.png", width=480, height=480)
> ### Name: AnnotationHub-objects
> ### Title: AnnotationHub objects and their related methods and functions
> ### Aliases: class:AnnotationHub AnnotationHub-class class:Hub Hub-class
> ### .Hub AnnotationHub mcols,Hub-method cache cache,Hub-method
> ### cache,AnnotationHub-method cache<- cache<-,Hub-method hubUrl
> ### hubUrl,Hub-method hubCache hubCache,Hub-method hubDate
> ### hubDate,Hub-method package package,Hub-method removeCache
> ### possibleDates snapshotDate snapshotDate,Hub-method snapshotDate<-
> ### snapshotDate<-,Hub-method dbconn,Hub-method dbfile,Hub-method
> ### .db_close recordStatus recordStatus,Hub-method length,Hub-method
> ### names,Hub-method fileName,Hub-method $,Hub-method
> ### [[,Hub,character,missing-method [[,Hub,numeric,missing-method
> ### [,Hub,character,missing-method [,Hub,logical,missing-method
> ### [,Hub,numeric,missing-method [<-,Hub,character,missing,Hub-method
> ### [<-,Hub,logical,missing,Hub-method [<-,Hub,numeric,missing,Hub-method
> ### subset,Hub-method query query,Hub-method display display,Hub-method
> ### as.list.Hub as.list,Hub-method c,Hub-method show,Hub-method
> ### show,AnnotationHub-method show,AnnotationHubResource-method
> ### Keywords: classes methods
>
> ### ** Examples
>
> ## create an AnnotationHub object
> library(AnnotationHub)
> ah = AnnotationHub()
updating metadata: retrieving 1 resource
snapshotDate(): 2016-06-06
>
> ## Summary of available records
> ah
AnnotationHub with 43720 records
# snapshotDate(): 2016-06-06
# $dataprovider: BroadInstitute, UCSC, Ensembl, EncodeDCC, NCBI, ftp://ftp.n...
# $species: Homo sapiens, Mus musculus, Bos taurus, Pan troglodytes, Danio r...
# $rdataclass: GRanges, BigWigFile, FaFile, OrgDb, TwoBitFile, ChainFile, In...
# additional mcols(): taxonomyid, genome, description, tags, sourceurl,
# sourcetype
# retrieve records with, e.g., 'object[["AH2"]]'
title
AH2 | Ailuropoda_melanoleuca.ailMel1.69.dna.toplevel.fa
AH3 | Ailuropoda_melanoleuca.ailMel1.69.dna_rm.toplevel.fa
AH4 | Ailuropoda_melanoleuca.ailMel1.69.dna_sm.toplevel.fa
AH5 | Ailuropoda_melanoleuca.ailMel1.69.ncrna.fa
AH6 | Ailuropoda_melanoleuca.ailMel1.69.pep.all.fa
... ...
AH50771 | Xiphophorus_maculatus.Xipmac4.4.2.dna.toplevel.2bit
AH50772 | Xiphophorus_maculatus.Xipmac4.4.2.ncrna.2bit
AH50773 | Vvinifera_CRIBI_IGGP12Xv0_V2.1.gff3.Rdata
AH50774 | Vvinifera_Genoscope_IGGP12Xv0_V1.0.gff3.Rdata
AH50775 | Vvinifera_Genoscope_IGGP8X_V1.0.gff3.Rdata
>
> ## Detail for a single record
> ah[1]
AnnotationHub with 1 record
# snapshotDate(): 2016-06-06
# names(): AH2
# $dataprovider: Ensembl
# $species: Ailuropoda melanoleuca
# $rdataclass: FaFile
# $title: Ailuropoda_melanoleuca.ailMel1.69.dna.toplevel.fa
# $description: FASTA DNA sequence for Ailuropoda melanoleuca
# $taxonomyid: 9646
# $genome: ailMel1
# $sourcetype: FASTA
# $sourceurl: ftp://ftp.ensembl.org/pub/release-69/fasta/ailuropoda_melanole...
# $sourcelastmodifieddate: 2012-10-12
# $sourcesize: 693412448
# $tags: FASTA, ensembl, sequence
# retrieve record with 'object[["AH2"]]'
>
> ## and what is the date we are using?
> snapshotDate(ah)
[1] "2016-06-06"
>
> ## how many resources?
> length(ah)
[1] 43720
>
> ## from which resources, is data available?
> head(sort(table(ah$dataprovider), decreasing=TRUE))
BroadInstitute UCSC
18248 8746
Ensembl EncodeDCC
7137 6058
NCBI ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/
1145 1019
>
> ## from which species, is data available ?
> head(sort(table(ah$species),decreasing=TRUE))
Homo sapiens Mus musculus Bos taurus Pan troglodytes
30411 1398 262 245
Danio rerio Rattus norvegicus
228 219
>
> ## what web service and local cache does this AnnotationHub point to?
> hubUrl(ah)
[1] "https://annotationhub.bioconductor.org"
> hubCache(ah)
[1] "/home/ddbj/.AnnotationHub"
>
> ### Examples ###
>
> ## One can search the hub for multiple strings
> ahs2 <- query(ah, c("GTF", "77","Ensembl", "Homo sapiens"))
>
> ## information about the file can be retrieved using
> ahs2[1]
AnnotationHub with 1 record
# snapshotDate(): 2016-06-06
# names(): AH28812
# $dataprovider: Ensembl
# $species: Homo sapiens
# $rdataclass: GRanges
# $title: Homo_sapiens.GRCh38.77.gtf
# $description: Gene Annotation for Homo sapiens
# $taxonomyid: 9606
# $genome: GRCh38
# $sourcetype: GTF
# $sourceurl: ftp://ftp.ensembl.org/pub/release-77/gtf/homo_sapiens/Homo_sap...
# $sourcelastmodifieddate: NA
# $sourcesize: 44454526
# $tags: GTF, ensembl, Gene, Transcript, Annotation
# retrieve record with 'object[["AH28812"]]'
>
> ## one can further extract information from this show method
> ## like the sourceurl using:
> ahs2$sourceurl
[1] "ftp://ftp.ensembl.org/pub/release-77/gtf/homo_sapiens/Homo_sapiens.GRCh38.77.gtf.gz"
> ahs2$description
[1] "Gene Annotation for Homo sapiens"
> ahs2$title
[1] "Homo_sapiens.GRCh38.77.gtf"
>
> ## We can download a file by name like this (using a list semantic):
> gr <- ahs2[[1]]
require("GenomicRanges")
downloading from 'https://annotationhub.bioconductor.org/fetch/34252'
retrieving 1 resource
using guess work to populate seqinfo
> ## And we can also extract it by the names like this:
> res <- ah[["AH28812"]]
loading from cache '/home/ddbj/.AnnotationHub/34252'
using guess work to populate seqinfo
>
> ## the gtf file is returned as a GenomicRanges object and contains
> ## data about which organism it belongs to, its seqlevels and seqlengths
> seqinfo(gr)
Seqinfo object with 270 sequences (1 circular) from GRCh38 genome:
seqnames seqlengths isCircular genome
CHR_HG142_HG150_NOVEL_TEST 135094228 FALSE GRCh38
CHR_HG151_NOVEL_TEST 135097696 FALSE GRCh38
CHR_HSCHR10_1_CTG1 133844722 FALSE GRCh38
CHR_HSCHR10_1_CTG2 133813485 FALSE GRCh38
CHR_HSCHR10_1_CTG4 133815819 FALSE GRCh38
... ... ... ...
KI270741.1 157432 FALSE GRCh38
KI270743.1 210658 FALSE GRCh38
KI270744.1 168472 FALSE GRCh38
KI270750.1 148850 FALSE GRCh38
KI270752.1 27745 FALSE GRCh38
>
> ## each GenomicRanges contains a metadata slot which can be used to get
> ## the name of the hub object and other associated metadata.
> metadata(gr)
$AnnotationHubName
[1] "AH28812"
$`File Name`
[1] "Homo_sapiens.GRCh38.77.gtf.gz"
$`Data Source`
[1] "ftp://ftp.ensembl.org/pub/release-77/gtf/homo_sapiens/Homo_sapiens.GRCh38.77.gtf.gz"
$Provider
[1] "Ensembl"
$Organism
[1] "Homo sapiens"
$`Taxonomy ID`
[1] 9606
> ah[metadata(gr)$AnnotationHubName]
AnnotationHub with 1 record
# snapshotDate(): 2016-06-06
# names(): AH28812
# $dataprovider: Ensembl
# $species: Homo sapiens
# $rdataclass: GRanges
# $title: Homo_sapiens.GRCh38.77.gtf
# $description: Gene Annotation for Homo sapiens
# $taxonomyid: 9606
# $genome: GRCh38
# $sourcetype: GTF
# $sourceurl: ftp://ftp.ensembl.org/pub/release-77/gtf/homo_sapiens/Homo_sap...
# $sourcelastmodifieddate: NA
# $sourcesize: 44454526
# $tags: GTF, ensembl, Gene, Transcript, Annotation
# retrieve record with 'object[["AH28812"]]'
>
> ## And we can also use "[" to restrict the things that are in the
> ## AnnotationHub object (by position, character, or logical vector).
> ## Here is a demo of position:
> subHub <- ah[1:3]
>
> # if(interactive()) {
> ## Display method involves user interaction through web interface
> ah2 <- display(ah)
Loading required package: shiny
Listening on http://127.0.0.1:3099