A GRangesList implies a grouping of the ranges; MAP
is applied to each element of the GRangesList rather than to each
individual range, as it is when ranges is a GRanges.
When ranges is a GenomicFiles object, the files
argument is omitted; both ranges and files are extracted
from the object.
files
A character vector or List of filenames. A List
implies a grouping of the files; MAP is applied to each
element of the List rather than to each file individually.
MAP
A function executed on each worker. The signature must contain a
minimum of two arguments representing the ranges and files. There is no
restriction on argument names and additional arguments can be provided.
MAP = function(range, file, ...)
REDUCE
An optional function that combines output from the MAP step. The
signature must contain at least one argument representing the list
output from MAP. There is no restriction on argument names and
additional arguments can be provided.
REDUCE = function(mapped, ...)
Reduction combines data from a single worker and is always
performed as part of the distributed step. When iterate=TRUE,
REDUCE is applied after each MAP step;
depending on the nature of REDUCE, iterative reduction
can substantially decrease the data stored in memory. When
iterate=FALSE, REDUCE is applied once to the list of MAP
output from all files / ranges.
When REDUCE is missing, output is a list from MAP.
iterate
A logical indicating if the REDUCE function
should be applied iteratively to the output of
MAP. When REDUCE is missing, iterate
is set to FALSE. This argument applies to reduceByRange only
(reduceRanges calls MAP a single time on each worker).
Collapsing results iteratively is useful when the number of
records to be processed is large (possibly complete files) but
the end result is a much reduced representation of all records.
Iteratively applying REDUCE limits the amount of
data held on each worker at any one time and can substantially
reduce the memory footprint.
summarize
A logical indicating if results should be returned as a
SummarizedExperiment object instead of a list;
data are returned in the assays slot named 'data'.
This argument applies to reduceByRange only.
When REDUCE is provided, summarize is ignored
(i.e., set to FALSE). A SummarizedExperiment requires the number
of rows in colData and the columns in assays to match.
Because REDUCE collapses the data across files, the dimension of
the result no longer matches that of the original ranges.
init
An optional initial value for REDUCE when
iterate=TRUE. init must be an object of the same
type as the elements returned from MAP. REDUCE
logically adds init to the start (when proceeding left
to right) or end of results obtained with MAP.
...
Arguments passed to other methods. Currently not used.
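The iterate and init arguments described above can be illustrated without any genomic data. The following is a minimal sketch in base R, not the package's implementation: the list mapped is hypothetical stand-in data for MAP output, and the loop only emulates what iterative reduction on a worker might look like. The description of init appears to mirror the init argument of base R's Reduce(), so that function is used to show its effect.

```r
## Hypothetical stand-in for MAP output: one numeric vector per file.
mapped <- list(c(1, 0, 2), c(0, 3, 1), c(4, 1, 0))

## A REDUCE that receives a list of MAP results and sums them element-wise.
REDUCE <- function(mapped, ...) Reduce(`+`, mapped)

## Non-iterative (iterate=FALSE): all MAP results are held in memory,
## then REDUCE is applied once to the whole list.
non_iterative <- REDUCE(mapped)

## Iterative (iterate=TRUE), emulated: each new MAP result is collapsed
## with the running result, so at most two results are held at a time.
running <- NULL
for (current in mapped) {
    if (is.null(running)) {
        running <- current
    } else {
        running <- REDUCE(list(running, current))
    }
}
iterative <- running

## An initial value is logically added to the start of the results.
init <- c(10, 10, 10)
with_init <- Reduce(`+`, mapped, init)
```

For an associative operation such as addition, both strategies give the same answer; the difference lies only in how many intermediate results are held at once.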
Details
reduceByRange extracts, manipulates and combines ranges across
different files. Each element of ranges is sent to a worker;
this is a single range when ranges is a GRanges and may be
multiple ranges when ranges is a GRangesList. MAP is
invoked on each range / file combination. This approach allows ranges
extracted from multiple files to be kept separate or combined with
REDUCE.
In contrast, reduceRanges does not iterate through the individual
files but instead treats them as a group. MAP is invoked once for
each range using all files as the files argument. Depending on
the nature of MAP, REDUCE may play a minor role by
concatenating or unlisting results.
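The difference between the two calling patterns can be emulated with plain lapply(). This is only a sketch of the invocation order described above, not how the package dispatches work to workers; the character labels standing in for ranges and files, and the MAP that merely records its arguments, are hypothetical.

```r
## Hypothetical inputs: plain labels stand in for ranges and files.
ranges <- c("range1", "range2")
files  <- c("a.bam", "b.bam", "c.bam")

## A MAP that records the arguments it was called with instead of
## reading any data.
MAP <- function(range, file, ...) {
    paste0(range, ":", paste(file, collapse = "+"))
}

## reduceByRange-style: for each range, MAP is invoked once per file.
by_range <- lapply(ranges, function(r) {
    lapply(files, function(f) MAP(r, f))
})

## reduceRanges-style: for each range, MAP is invoked once and receives
## all files as its second argument.
all_files <- lapply(ranges, function(r) MAP(r, files))
```

In the first pattern each list element holds one result per file, which matches the lengths reported by elementNROWS() in the Examples; in the second, MAP itself is responsible for handling the full vector of files.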
Both MAP and REDUCE are applied in the distributed
step (“on the worker”). There is no built-in ability to combine
results across workers in the distributed step.
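Because results are not combined across workers in the distributed step, any cross-range combination has to happen on the returned value. A minimal sketch, assuming the function returned a list with one numeric vector per range (the per_range values here are hypothetical):

```r
## Hypothetical return value: one reduced result per range.
per_range <- list(c(5, 4, 3), c(1, 1, 1), c(0, 2, 4))

## Combine across ranges in the calling session, after the distributed
## step has finished.
combined <- Reduce(`+`, per_range)

## Alternatively, keep the per-range results side by side, one row each.
stacked <- do.call(rbind, per_range)
```

Whether summing or stacking is appropriate depends on what MAP and REDUCE return; the point is only that this step runs in the calling session rather than on the workers.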
Value
reduceByRange:
When summarize=FALSE the return value is a list or
the value from the final invocation of REDUCE. When
summarize=TRUE output is a SummarizedExperiment.
When ranges is a GenomicFiles object, data from
rowRanges, colData and metadata are transferred
to the SummarizedExperiment.
reduceRanges:
A list or the value returned by the final invocation of
REDUCE.
Author(s)
Martin Morgan and Valerie Obenchain
See Also
reduceFiles
reduceByFile
GenomicFiles-class
Examples
if (all(requireNamespace("RNAseqData.HNRNPC.bam.chr14", quietly=TRUE) &&
        require(GenomicAlignments))) {
    ## -----------------------------------------------------------------------
    ## Compute coverage across BAM files.
    ## -----------------------------------------------------------------------
    fls <- ## 8 bam files
        RNAseqData.HNRNPC.bam.chr14::RNAseqData.HNRNPC.bam.chr14_BAMFILES

    ## Regions of interest.
    gr <- GRanges("chr14", IRanges(c(62262735, 63121531, 63980327),
                                   width=214700))

    ## The MAP computes the coverage ...
    MAP <- function(range, file, ...) {
        requireNamespace("GenomicFiles", quietly=TRUE)
        ## for coverage(), Rsamtools::ScanBamParam()
        param <- Rsamtools::ScanBamParam(which=range)
        GenomicFiles::coverage(file, param=param)[range]
    }
    ## and REDUCE adds the last and current results.
    REDUCE <- function(mapped, ...)
        Reduce("+", mapped)

    ## -----------------------------------------------------------------------
    ## reduceByRange:

    ## With no REDUCE, coverage is computed for each range / file combination.
    cov1 <- reduceByRange(gr, fls, MAP)
    cov1[[1]]

    ## Each call to coverage() produces an RleList; these accumulate on the
    ## workers. We can use a reducer to combine these lists either iteratively
    ## or non-iteratively. When iterate = TRUE the current result
    ## is collapsed with the last, resulting in a maximum of 2 RleLists on
    ## a worker at any given time.
    cov2 <- reduceByRange(gr, fls, MAP, REDUCE, iterate=TRUE)
    cov2[[1]]

    ## If memory use is not a concern (or if MAP output is not large) the
    ## REDUCE function can be applied non-iteratively.
    cov3 <- reduceByRange(gr, fls, MAP, REDUCE, iterate=FALSE)

    ## Results match those obtained with the iterative REDUCE.
    cov3[[1]]

    ## When 'ranges' is a GRangesList, the list elements are sent to the
    ## workers instead of a single range as in the case of a GRanges.
    grl <- GRangesList(gr[1], gr[2:3])
    grl

    cov4 <- reduceByRange(grl, fls, MAP)
    length(cov4)        ## length of GRangesList
    elementNROWS(cov4)  ## number of files

    ## -----------------------------------------------------------------------
    ## reduceRanges:

    ## This function passes the character vector of all file names to MAP.
    ## MAP must handle each file separately or invoke a method that operates
    ## on a list of files.

    ## TODO: example
}
Results
> library(GenomicFiles)
> ### Name: reduceByRange
> ### Title: Parallel computations by ranges
> ### Aliases: reduceByRange reduceByRange,GRanges,ANY-method
> ### reduceByRange,GRangesList,ANY-method
> ### reduceByRange,GenomicFiles,missing-method reduceRanges
> ### Keywords: methods
Loading required package: GenomicAlignments
[1] 8 8