ExperimentHub-objects
R Documentation
ExperimentHub objects and their related methods and functions
Description
Use ExperimentHub to interact with Bioconductor's ExperimentHub
service. Query the instance to discover and use resources that are of
interest, and then easily download and import the resource into R for
immediate use.
Use ExperimentHub() to retrieve information about all records
in the hub.
Discover records in a hub using mcols(), query(),
subset(), [, and display().
Retrieve individual records using [[. On first use of a
resource, the corresponding files or other hub resources are
downloaded from the internet to a local cache. On this and all
subsequent uses, the files are loaded quickly from the cache into the R
session.
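The download-once, read-from-cache behaviour described above can be sketched in base R without network access. The helpers fake_download() and get_resource() are hypothetical and are not ExperimentHub functions; the "remote" resource is a small R object written with saveRDS(), whereas the real hub downloads files from the internet and manages its own cache.

```r
# Hypothetical sketch of "download on first use, then read from the cache".
cache_dir <- file.path(tempdir(), "toy-hub-cache")
unlink(cache_dir, recursive = TRUE)          # start from an empty cache
dir.create(cache_dir)
download_count <- 0L

fake_download <- function(id, dest) {
  download_count <<- download_count + 1L
  saveRDS(list(id = id, values = 1:3), dest) # stand-in for an internet download
}

get_resource <- function(id) {
  path <- file.path(cache_dir, paste0(id, ".rds"))
  if (!file.exists(path))                    # first use: populate the cache
    fake_download(id, path)
  readRDS(path)                              # every use: read from the cache
}

first  <- get_resource("EH1")
second <- get_resource("EH1")                # served from cache, no second download
download_count                               # 1
```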
ExperimentHub records can be added (and sometimes removed) at
any time. snapshotDate() restricts hub records to those
available at the time of the snapshot; use possibleDates() to
see possible snapshot dates.
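One way to picture how a snapshot date restricts the visible records is a date filter over a record table. The sketch below assumes that each record carries the date it was added and, where applicable, the date it was removed; the table, its column names, and records_at_snapshot() are hypothetical and are not how ExperimentHub stores or filters its records.

```r
# Hypothetical record table: IDs with the dates they were added / removed.
records <- data.frame(
  id           = c("EH1", "EH2", "EH3", "EH4"),
  date_added   = as.Date(c("2016-01-10", "2017-04-20", "2018-05-01", "2016-03-15")),
  date_removed = as.Date(c(NA, NA, NA, "2017-01-01")),
  stringsAsFactors = FALSE
)

# A record is available at a snapshot if it was added on or before that
# date and had not yet been removed.
records_at_snapshot <- function(records, snapshot) {
  snapshot <- as.Date(snapshot)
  keep <- records$date_added <= snapshot &
    (is.na(records$date_removed) | records$date_removed > snapshot)
  records$id[keep]
}

records_at_snapshot(records, "2017-06-01")   # "EH1" "EH2"
records_at_snapshot(records, "2016-06-01")   # "EH1" "EH4"
```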
The location of the local cache can be found (and updated) with
getExperimentHubCache and setExperimentHubCache;
removeCache removes all cache resources.
ExperimentHub() creates an ExperimentHub instance, possibly updating the
current database of records.
Accessors
In the code snippets below, x and object are
ExperimentHub objects.
hubCache(x):
Gets the file system location of the local ExperimentHub cache.
hubUrl(x):
Gets the URL for the online hub.
length(x):
Get the number of hub records.
names(x):
Get the names (ExperimentHub unique identifiers, of the form
EH12345) of the hub records.
fileName(x):
Get the file path of the hub records as stored in the local cache
(ExperimentHub files are stored as unique numbers, of the form
12345). NA is returned for those records which have not been
cached.
package(x):
Get the package name associated with the hub resource.
mcols(x):
Get the metadata columns describing each record. Columns include:
title
Record title, frequently the file name of the
object.
dataprovider
Original provider of the resource, e.g.,
Ensembl, UCSC.
species
The species for which the record is most
relevant, e.g., ‘Homo sapiens’.
taxonomyid
NCBI taxonomy identifier of the species.
genome
Genome build relevant to the record, e.g., hg19.
description
Textual description of the resource,
frequently automatically generated from file path and other
information available when the record was created.
tags
Single words added to the record to facilitate
identification, e.g., TCGA, Roadmap.
rdataclass
The class of the R object used to represent
the object when imported into R, e.g., GRanges,
VCFFile.
sourceurl
Original URL of the resource.
sourcetype
Format of the original resource, e.g., BED
file.
Subsetting and related operations
In the code snippets below, x is an ExperimentHub object.
x$name:
Convenient reference to individual metadata columns, e.g.,
x$species.
x[i]:
Numerical, logical, or character vector (of ExperimentHub names)
to subset the hub, e.g., x[x$species == "Homo sapiens"].
x[[i]]:
Numerical or character scalar to retrieve (if necessary) and
import the resource into R.
query(x, pattern, ignore.case=TRUE, pattern.op=`&`):
Return an ExperimentHub subset containing only those elements
whose metadata matches pattern. Matching uses
pattern as in grepl to search the
as.character representation of each column. A record matches a
pattern element if any of its columns matches, and the results for
the elements of pattern are combined with a logical `&` (see pattern.op).
e.g., query(x, c("Homo sapiens", "hg19", "GTF")).
pattern
A character vector of patterns to search
(via grepl) for in any of the mcols() columns.
ignore.case
A logical(1) vector indicating whether
the search should ignore case (TRUE) or not (FALSE).
pattern.op
Any function of two arguments,
describing how matches across pattern elements are to be
combined. The default `&` returns only records whose metadata
columns match all elements of pattern; `|` returns records
matching any element.
subset(x, subset):
Return the subset of records containing only those elements whose
metadata satisfies the expression in subset. The
expression can reference columns of mcols(x), and should
return a logical vector of length length(x).
e.g., subset(x, species == "Homo sapiens" &
genome == "GRCh38").
display(object):
Open a web browser allowing for easy selection of hub records via
interactive tabular display. Return value is the subset of hub
records identified while navigating the display.
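The subsetting and query rules above can be illustrated offline on a plain data.frame. In the sketch below, meta is a hypothetical stand-in for mcols(x) with invented titles and IDs, and toy_query() is a hypothetical helper, not the package's implementation. It assumes, consistent with the description of the pattern argument, that a record matches a pattern when any of its columns matches; per-pattern results are then combined with pattern.op. On a real hub these operations return ExperimentHub objects rather than row names.

```r
# Hypothetical stand-in for mcols(x); row names play the role of hub IDs.
meta <- data.frame(
  title      = c("tumour counts", "mouse brain peaks", "enhancer peaks"),
  species    = c("Homo sapiens", "Mus musculus", "Homo sapiens"),
  genome     = c("hg19", "mm10", "GRCh38"),
  rdataclass = c("ExpressionSet", "GRanges", "GRanges"),
  row.names  = c("EH101", "EH102", "EH103"),
  stringsAsFactors = FALSE
)

# x[i]: numeric, logical, or character (hub name) indices
by_number  <- rownames(meta[c(1, 3), ])                         # "EH101" "EH103"
by_logical <- rownames(meta[meta$species == "Homo sapiens", ])  # "EH101" "EH103"
by_name    <- rownames(meta[c("EH102", "EH103"), ])             # "EH102" "EH103"

# query(): grepl() each pattern against as.character() of every column,
# OR across columns, then combine the per-pattern results with pattern.op.
toy_query <- function(meta, pattern, ignore.case = TRUE, pattern.op = `&`) {
  per_pattern <- lapply(pattern, function(pat)
    Reduce(`|`, lapply(meta, function(col)
      grepl(pat, as.character(col), ignore.case = ignore.case))))
  rownames(meta)[Reduce(pattern.op, per_pattern)]
}

q_and <- toy_query(meta, c("homo sapiens", "granges"))         # "EH103"
q_or  <- toy_query(meta, c("hg19", "mm10"), pattern.op = `|`)  # "EH101" "EH102"

# subset(): the expression is evaluated against the columns; note `==`
s_hits <- rownames(subset(meta, species == "Homo sapiens" & genome == "GRCh38"))  # "EH103"
```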
Cache and hub management
In the code snippets below, x is an ExperimentHub object.
snapshotDate(x) and snapshotDate(x) <- value:
Gets or sets the date for the snapshot in use. value should
be one of possibleDates().
possibleDates(x):
Lists dates for snapshots that the hub could potentially use.
cache(x) and cache(x) <- NULL:
Adds (downloads) all resources in x to the local cache, or removes
all local resources corresponding to the records in x from the cache.
In either case, x would typically be a small subset of ExperimentHub
resources.
hubUrl(x):
Gets the URL for the online ExperimentHub.
hubCache(x):
Gets the file system location of the local ExperimentHub cache.
removeCache(x):
Removes the local ExperimentHub database and all related resources. After
calling this function, the user will have to download any ExperimentHub
resources again.
getExperimentHubOption():
Gets hub options such as "CACHE", "URL", "MAXDOWNLOADS", ...
setExperimentHubOption():
Sets hub options such as "CACHE", "URL", "MAXDOWNLOADS", ...
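The form cache(x) <- NULL relies on R's replacement-function idiom: a function named `cache<-` receives the object and value and returns the updated object. The sketch below shows that idiom on a hypothetical toy hub; toy_cache(), `toy_cache<-`, make_toy_hub(), and the environment used as a cache are all illustrative assumptions and do not reproduce how ExperimentHub stores files.

```r
# Toy model, not the ExperimentHub implementation: a hub is a list of
# record IDs, and an environment maps cached IDs to (pretend) file paths.
toy_cache_store <- new.env()
make_toy_hub <- function(ids) structure(list(ids = ids), class = "toy_hub")

# cache(x): "download" every record in x that is not yet cached
toy_cache <- function(x) {
  for (id in x$ids)
    if (!exists(id, envir = toy_cache_store, inherits = FALSE))
      assign(id, file.path(tempdir(), id), envir = toy_cache_store)
  invisible(x)
}

# cache(x) <- NULL: drop cached entries for the records in x only
`toy_cache<-` <- function(x, value) {
  stopifnot(is.null(value))   # only NULL is accepted, mirroring `<- NULL`
  rm(list = intersect(x$ids, ls(toy_cache_store)), envir = toy_cache_store)
  x
}

hub <- make_toy_hub(c("EH201", "EH202"))
toy_cache(hub)
after_add <- ls(toy_cache_store)       # "EH201" "EH202"

sub_hub <- make_toy_hub("EH201")       # a small subset of the hub
toy_cache(sub_hub) <- NULL
after_remove <- ls(toy_cache_store)    # "EH202"
```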
Coercion
In the code snippets below, x is an ExperimentHub object.
as.list(x):
Coerce x to a list of hub instances, one entry per
element. Primarily for internal use.
c(x, ...):
Concatenate one or more sub-hubs. Sub-hubs must reference the same
ExperimentHub instance. Duplicate entries are removed.
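The two rules for c(), a shared hub instance and removal of duplicates, can be modelled with tagged character vectors. In this sketch, make_subhub() and combine_subhubs() are hypothetical helpers and a sub-hub is reduced to a vector of record IDs carrying a "hub" attribute; real sub-hubs are ExperimentHub objects.

```r
# Toy sub-hub: a character vector of record IDs tagged with the hub
# instance it was taken from (hypothetical model, not the package code).
make_subhub <- function(ids, hub) structure(ids, hub = hub)

combine_subhubs <- function(...) {
  parts <- list(...)
  hubs <- unique(vapply(parts, function(p) attr(p, "hub"), character(1)))
  if (length(hubs) != 1L)
    stop("sub-hubs must reference the same hub instance")
  ids <- unlist(lapply(parts, as.vector))   # drop attributes, then flatten
  make_subhub(unique(ids), hubs)            # duplicates removed, order kept
}

a <- make_subhub(c("EH1", "EH2"), hub = "hub-A")
b <- make_subhub(c("EH2", "EH3"), hub = "hub-A")
combined <- combine_subhubs(a, b)
as.vector(combined)                         # "EH1" "EH2" "EH3"
```

Combining a and a sub-hub tagged with a different hub, e.g. make_subhub("EH9", hub = "hub-B"), stops with an error in this model.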