The GenomicFiles class is a matrix-like container where rows represent ranges of interest and columns represent files. The class is designed for byFile or byRange queries.
Computations are distributed in parallel by file. Data subsets are extracted and manipulated (MAP) and optionally combined (REDUCE) within a single file.
Rsamtools files can be created with a ‘yieldSize’ argument that influences the number of records (chunk size) input at one time (see, e.g,. BamFile). reduceByYield iterates through the file, processing each chunk and reducing it with previously input chunks. This is a memory efficient way to process large data files, especially when the final result fits in memory.