Last data update: 2014.03.03

GenomicFiles                R Documentation

GenomicFiles objects

Description

The GenomicFiles class is a matrix-like container where rows represent ranges of interest and columns represent files. The class is designed for byFile or byRange queries.

Constructor

GenomicFiles(rowRanges, files, colData=DataFrame(), metadata=list(), ...):

Details

GenomicFiles inherits from the RangedSummarizedExperiment class in the SummarizedExperiment package. Currently, no use is made of the elementMetadata and assays slots. This may change in the future.
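
As a minimal sketch of the constructor and the inheritance described above, the following builds a small object from made-up inputs. The file paths (sampleA.bam, sampleB.bam) and the ranges are hypothetical; the sketch assumes the constructor stores character paths without opening or checking the files.

```r
library(GenomicFiles)

## Three illustrative ranges (rows) on a made-up sequence name.
rr <- GRanges("chr1", IRanges(c(1, 101, 201), width = 50))

## Two placeholder file paths (columns). These files are hypothetical and
## are never opened in this sketch.
fls <- c(sampleA = "sampleA.bam", sampleB = "sampleB.bam")

## One row of colData per file.
cd <- DataFrame(format = rep("bam", length(fls)))

gf <- GenomicFiles(rowRanges = rr, files = fls, colData = cd)

dim(gf)                               # rows are ranges, columns are files
is(gf, "RangedSummarizedExperiment")  # class relationship noted in Details
```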

Accessors

In the code below, x is a GenomicFiles object.

rowRanges(x), rowRanges(x) <- value

Get or set the rowRanges on x. value can be a GRanges or GRangesList representing ranges or indices defined on the coordinate space (positions) of the files.

files(x), files(x) <- value

Get or set the files on x. value can be a character() of file paths or a List of file objects such as BamFile, BigWigFile, FaFile, etc.

colData(x), colData(x) <- value

Get or set the colData on x. value must be a DataFrame instance describing the files. The number of rows must match the number of files. Row names, if present, become the column names of the GenomicFiles.

metadata(x), metadata(x) <- value

Get or set the metadata on x. value must be a SimpleList of arbitrary content describing the overall experiment.

dimnames(x), dimnames(x) <- value

Get or set the row and column names on x.
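
The accessors can be tried on a toy object. This is a sketch under stated assumptions: the paths a.bam and b.bam are hypothetical and are never opened, and the colData values are invented for illustration.

```r
library(GenomicFiles)

gf <- GenomicFiles(
    rowRanges = GRanges("chr1", IRanges(c(1, 101), width = 50)),
    files = c(a = "a.bam", b = "b.bam")   # hypothetical paths, never opened
)

rowRanges(gf)   # the ranges of interest
files(gf)       # the file paths
colData(gf)     # one row per file
metadata(gf)    # experiment-level annotation (empty here)

## Replacement form: describe the files with a DataFrame whose number of
## rows matches the number of files.
colData(gf) <- DataFrame(format = c("bam", "bam"), row.names = c("a", "b"))
colData(gf)
```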

Methods

In the code below, x is a GenomicFiles object.

[

Subset the object by range (rows) or by file (columns), as in x[i, j].

show

Compactly display the object.

reduceByFile

Extract, manipulate and combine data defined in rowRanges within the files specified in files. See ?reduceByFile for details.

reduceByRange

Extract, manipulate and combine data defined in rowRanges across the files specified in files. See ?reduceByRange for details.
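
The methods above can be sketched on a toy object without reading any data. Everything here is illustrative: the paths are hypothetical and never opened, the MAP function only labels each range/file pair (a real MAP would extract data from the file restricted to the range), and registering SerialParam is an assumption made to keep the sketch independent of the parallel back-end.

```r
library(GenomicFiles)

gf <- GenomicFiles(
    rowRanges = GRanges("chr1", IRanges(c(1, 101, 201), width = 50)),
    files = c(a = "a.bam", b = "b.bam")   # hypothetical paths, never opened
)

## Subset like a matrix: rows select ranges, columns select files.
gf_sub <- gf[2:3, 1]
dim(gf_sub)

## Toy MAP: label each range/file pair instead of reading the file.
MAP <- function(range, file, ...) {
    paste(basename(file), BiocGenerics::start(range), sep = ":")
}

## Evaluate serially for this sketch.
BiocParallel::register(BiocParallel::SerialParam())

by_range <- reduceByRange(gf, MAP = MAP)  # grouped by range, across files
by_file  <- reduceByFile(gf, MAP = MAP)   # grouped by file, within each file
length(by_range)
length(by_file)
```

Supplying a REDUCE function would combine the mapped pieces within each group; see ?reduceByFile and ?reduceByRange for the exact return structure.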

Author(s)

Martin Morgan and Valerie Obenchain

See Also

  • reduceByFile and reduceByRange methods.

  • SummarizedExperiment objects in the SummarizedExperiment package.

Examples

## -----------------------------------------------------------------------
## Basic Use
## -----------------------------------------------------------------------

if (require(RNAseqData.HNRNPC.bam.chr14)) { 
  fl <- RNAseqData.HNRNPC.bam.chr14_BAMFILES
  rd <- GRanges("chr14", 
                 IRanges(c(62262735, 63121531, 63980327), width=214700))
  cd <- DataFrame(method=rep("RNASeq", length(fl)),
                  format=rep("bam", length(fl)))

  ## Construct an instance of the class:
  gf <- GenomicFiles(files = fl, rowRanges = rd, colData = cd) 
  gf

  ## Subset on ranges or files for different experimental runs.
  dim(gf)
  gf_sub <- gf[2, 3:4]
  dim(gf_sub)
 
  ## When summarize = TRUE and no REDUCE is provided the reduceBy* 
  ## functions output a SummarizedExperiment object.
  MAP <- function(range, file, ...) {
      requireNamespace("GenomicFiles", quietly=TRUE) ## for coverage()
      requireNamespace("Rsamtools", quietly=TRUE)     ## for ScanBamParam()
      param = Rsamtools::ScanBamParam(which=range)
      GenomicFiles::coverage(file, param=param)[range]
  } 
  se <- reduceByRange(gf, MAP=MAP, summarize=TRUE)
  se
 
  ## Data from the rowRanges, colData and metadata slots in the
  ## GenomicFiles are transferred to the SummarizedExperiment.
  colData(se)
 
  ## Results are in the assays slot.
  assays(se) 
}
 
## -----------------------------------------------------------------------
## Managing cached or remote files with GenomicFiles
## -----------------------------------------------------------------------

## The GenomicFiles class can manage cached or remote files and their 
## associated ranges.

## Not run: 
## Files from AnnotationHub can be downloaded and cached locally.
library(AnnotationHub)
hub = AnnotationHub()
hublet = query(hub, c("files I'm", "interested in"))
## cache (if needed) and return local paths to the files
fls = cache(hublet)

## An alternative to local file paths is to use URLs of remote files.
## This approach could be used with a class such as rtracklayer::BigWigFile,
## which supports remote file queries.
urls = hublet$sourceurls

## Define ranges of interest and use GenomicFiles to manage.
rngs = GRanges("chr10", IRanges(c(100000, 200000), width=1))
gf = GenomicFiles(rngs, fls)

## As an example, one could create a matrix from data extracted
## across multiple BED files.
MAP = function(rng, fl) {
    requireNamespace("rtracklayer", quietly=TRUE)  ## import, BEDFile
    rtracklayer::import(rtracklayer::BEDFile(fl), which=rng)$name
}
REDUCE = unlist
xx = reduceFiles(gf, MAP=MAP, REDUCE=REDUCE)
mcols(rngs) = simplify2array(xx)

## Data and ranges can be stored in a SummarizedExperiment.
SummarizedExperiment(list(my=simplify2array(xx)), rowRanges=rngs)

## End(Not run)

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> library(GenomicFiles)
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: GenomicRanges
Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: SummarizedExperiment
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: BiocParallel
Loading required package: Rsamtools
Loading required package: Biostrings
Loading required package: XVector
Loading required package: rtracklayer
> ### Name: GenomicFiles
> ### Title: GenomicFiles objects
> ### Aliases: GenomicFiles class:GenomicFiles GenomicFiles-class
> ###   GenomicFiles,GenomicRangesORGRangesList,character-method
> ###   GenomicFiles,GenomicRangesORGRangesList,List-method
> ###   GenomicFiles,GenomicRangesORGRangesList,list-method
> ###   GenomicFiles,missing,ANY-method GenomicFiles,missing,missing-method
> ###   files<- files,GenomicFiles-method
> ###   files<-,GenomicFiles,character-method
> ###   files<-,GenomicFiles,List-method dimnames<-,GenomicFiles,list-method
> ###   colData<-,GenomicFiles,DataFrame-method [,GenomicFiles,ANY,ANY-method
> ###   [,GenomicFiles,ANY,ANY,ANY-method show,GenomicFiles-method
> ### Keywords: classes methods
> 
> ### ** Examples
> 
> ## -----------------------------------------------------------------------
> ## Basic Use
> ## -----------------------------------------------------------------------
> 
> if (require(RNAseqData.HNRNPC.bam.chr14)) { 
+   fl <- RNAseqData.HNRNPC.bam.chr14_BAMFILES
+   rd <- GRanges("chr14", 
+                  IRanges(c(62262735, 63121531, 63980327), width=214700))
+   cd <- DataFrame(method=rep("RNASeq", length(fl)),
+                   format=rep("bam", length(fl)))
+ 
+   ## Construct an instance of the class:
+   gf <- GenomicFiles(files = fl, rowRanges = rd, colData = cd) 
+   gf
+ 
+   ## Subset on ranges or files for different experimental runs.
+   dim(gf)
+   gf_sub <- gf[2, 3:4]
+   dim(gf_sub)
+  
+   ## When summarize = TRUE and no REDUCE is provided the reduceBy* 
+   ## functions output a SummarizedExperiment object.
+   MAP <- function(range, file, ...) {
+       requireNamespace("GenomicFiles", quietly=TRUE) ## for coverage()
+       requireNamespace("Rsamtools", quietly=TRUE)     ## for ScanBamParam()
+       param = Rsamtools::ScanBamParam(which=range)
+       GenomicFiles::coverage(file, param=param)[range]
+   } 
+   se <- reduceByRange(gf, MAP=MAP, summarize=TRUE)
+   se
+  
+   ## Data from the rowRanges, colData and metadata slots in the
+   ## GenomicFiles are transferred to the SummarizedExperiment.
+   colData(se)
+  
+   ## Results are in the assays slot.
+   assays(se) 
+ }
Loading required package: RNAseqData.HNRNPC.bam.chr14
List of length 1
names(1): data
>  
> ## -----------------------------------------------------------------------
> ## Managing cached or remote files with GenomicFiles
> ## -----------------------------------------------------------------------
> 
> ## The GenomicFiles class can manage cached or remote files and their 
> ## associated ranges.
> 
> ## Not run: 
> ##D ## Files from AnnotationHub can be downloaded and cached locally.
> ##D library(AnnotationHub)
> ##D hub = AnnotationHub()
> ##D hublet = query(hub, c("files I'm", "interested in"))
> ##D # cache (if need) and return local path to files
> ##D fls = cache(hublet)
> ##D 
> ##D ## An alternative to the local file paths is to use urls to a remote file.
> ##D ## This approach could be used with something like rtracklayer::bigWig which
> ##D ## supports remote file queries.
> ##D urls = hublet$sourceurls
> ##D 
> ##D ## Define ranges of interest and use GenomicFiles to manage.
> ##D rngs = GRanges("chr10", IRanges(c(100000, 200000), width=1))
> ##D gf = GenomicFiles(rngs, fls)
> ##D 
> ##D ## As an example, one could create a matrix from data extracted
> ##D ## across multiple BED files.
> ##D MAP = function(rng, fl) {
> ##D     requireNamespace("rtracklayer", quietly=TRUE)  ## import, BEDFile
> ##D     rtracklayer::import(rtracklayer::BEDFile(fl), which=rng)$name
> ##D }
> ##D REDUCE = unlist
> ##D xx = reduceFiles(gf, MAP=MAP, REDUCE=REDUCE)
> ##D mcols(rngs) = simplify2array(xx)
> ##D 
> ##D ## Data and ranges can be stored in a SummarizedExperiment.
> ##D SummarizedExperiment(list(my=simplify2array(xx)), rowRanges=rngs)
> ## End(Not run)