The NCList class is a container for storing the Nested Containment List
representation of a Ranges object. Preprocessing a Ranges
object as a Nested Containment List allows efficient overlap-based
operations like findOverlaps.
The NCLists class is a container for storing a collection of NCList objects.
An NCLists object is typically the result of preprocessing each list
element of a RangesList object as a Nested Containment List.
As with NCList, an NCLists object can then be used for efficient
overlap-based operations.
To preprocess a Ranges or RangesList object, simply call
the NCList or NCLists constructor function on it.
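As a minimal sketch of the two constructor calls described above (the input ranges are made up for illustration only):

```r
library(IRanges)

## Hypothetical input ranges, chosen only for illustration.
x <- IRanges(start = c(1, 4, 9), end = c(5, 7, 10))
xl <- IRangesList(a = IRanges(1:3, width = 5),
                  b = IRanges(c(10, 20), width = 3))

## Preprocess a Ranges object and a RangesList object.
nclist <- NCList(x)
nclists <- NCLists(xl)

## The preprocessed objects still expose the original ranges.
ranges(nclist)
length(nclists)   # one NCList per list element of 'xl'
```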
Set circle.length only if the space (or spaces, if x is a RangesList
object) on which the ranges in x are defined must be treated as
circular. In that case, circle.length specifies the length(s) of the
circular space(s).
For NCList, circle.length must be a single positive
integer (or NA if the space is linear).
For NCLists, it must be an integer vector parallel to x
(i.e. same length) and with positive or NA values (NAs indicate linear
spaces).
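The following sketch only illustrates how circle.length might be passed to each constructor. The ranges and the circle length of 12 are hypothetical values, and the sketch makes no claim about how overlaps across the origin of the circle are reported.

```r
library(IRanges)

## Hypothetical ranges on a circular space of length 12 (illustration only).
x <- IRanges(start = c(1, 5, 10), end = c(3, 8, 12))
nclist_circ <- NCList(x, circle.length = 12L)

## For NCLists, supply one value per list element; NA marks a linear space.
xl <- IRangesList(IRanges(1, 4), IRanges(c(2, 6), c(3, 9)))
nclists_mixed <- NCLists(xl, circle.length = c(12L, NA))
```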
Details
The GenomicRanges package also defines the
GNCList constructor and class for
preprocessing and representing a vector of genomic ranges as a
data structure based on Nested Containment Lists.
Some important differences between the new findOverlaps/countOverlaps
implementation based on Nested Containment Lists (BioC >= 3.1) and the
old implementation based on Interval Trees (BioC < 3.1):
With the new implementation, the hits returned by
findOverlaps are not fully ordered (i.e. ordered
by queryHits and subjectHits) anymore, but only partially
ordered (i.e. ordered by queryHits only). Other than that, and
except for the 2 particular situations mentioned below, the 2
implementations produce the same output. However, the new
implementation is faster and more memory efficient.
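When code relies on the full ordering, one possible approach is to sort the hits explicitly, as in this sketch. It reuses the ranges from the Examples section and assumes that sort() on a Hits object orders by query hit first and subject hit second.

```r
library(IRanges)

query <- IRanges(c(1, 4, 9), c(5, 7, 10))
subject <- IRanges(c(2, 2, 10), c(2, 3, 12))

## Hits come back ordered by query only.
hits <- findOverlaps(query, subject)

## Restore the full (query, then subject) ordering when it is needed.
sorted_hits <- sort(hits)
sorted_hits
```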
With the new implementation, either the query or the subject can
be preprocessed with NCList for a Ranges object
(replacement for IntervalTree), NCLists
for a RangesList object (replacement for
IntervalForest), and
GNCList for a
GenomicRanges object (replacement for
GIntervalTree).
However, for a one-time use, it is NOT advised to explicitly
preprocess the input. This is because findOverlaps
or countOverlaps will take care of it and do a better
job at it (by preprocessing only what's needed when it's needed,
and releasing memory as they go).
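By contrast, explicit preprocessing may be worth considering when the same subject is searched repeatedly. The sketch below assumes a hypothetical list of queries and preprocesses the subject once. It is an illustration, not a benchmark.

```r
library(IRanges)

subject <- IRanges(c(2, 2, 10), c(2, 3, 12))

## Hypothetical queries that all target the same subject.
queries <- list(IRanges(c(1, 4, 9), c(5, 7, 10)),
                IRanges(c(3, 11), c(3, 11)))

## Preprocess once, then reuse for every query.
ppsubject <- NCList(subject)
hits_reused <- lapply(queries, function(q) findOverlaps(q, ppsubject))

## Same searches without explicit preprocessing, for comparison.
hits_direct <- lapply(queries, function(q) findOverlaps(q, subject))
```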
With the new implementation, countOverlaps on
Ranges or GenomicRanges objects doesn't
call findOverlaps in order to collect all the hits in
a growing Hits object and count them only at the end. Instead,
the counting happens at the C level and the hits are not kept. This
reduces memory usage considerably when there are many hits.
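To illustrate the relationship between the two functions, this sketch (using the ranges from the Examples section) compares the result of countOverlaps with counts derived from a full Hits object. The tabulate() step is only a cross-check and is not how countOverlaps works internally.

```r
library(IRanges)

query <- IRanges(c(1, 4, 9), c(5, 7, 10))
subject <- IRanges(c(2, 2, 10), c(2, 3, 12))

## Counting directly: no Hits object is built at the R level.
counts <- countOverlaps(query, subject)

## Cross-check: collect all the hits, then count them per query.
hits <- findOverlaps(query, subject)
counts_from_hits <- tabulate(queryHits(hits), nbins = length(query))

stopifnot(all(counts == counts_from_hits))
```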
When minoverlap=0, zero-width ranges are now interpreted
as insertion points and considered to overlap with ranges that
contain them. With the old algorithm, zero-width ranges were always
ignored. This is the 1st situation where the new and old
implementations produce different outputs.
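A small sketch of this situation, with made-up ranges: a zero-width query placed inside a subject range. Per the description above, the query is expected to be reported as overlapping the subject when minoverlap=0. The output is not reproduced here.

```r
library(IRanges)

## A zero-width query: an insertion point between positions 4 and 5.
query <- IRanges(start = 5, width = 0)

## A subject range that contains the insertion point.
subject <- IRanges(start = 1, end = 10)

hits <- findOverlaps(query, subject, minoverlap = 0L)
hits
```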
When using select="arbitrary", the new implementation will
generally not select the same hits as the old implementation. This is
the 2nd situation where the new and old implementations produce
different outputs.
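For illustration, the sketch below calls findOverlaps with select="arbitrary" on the ranges from the Examples section. Code that uses this mode should not depend on which of several overlapping subject ranges is returned. Here the first query overlaps two subject ranges, so either index may come back.

```r
library(IRanges)

query <- IRanges(c(1, 4, 9), c(5, 7, 10))
subject <- IRanges(c(2, 2, 10), c(2, 3, 12))

## One subject index per query range, or NA when there is no hit.
sel <- findOverlaps(query, subject, select = "arbitrary")
sel
```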
The new implementation supports preprocessing of a
GenomicRanges object with ranges defined
on circular sequences (e.g. on the mitochondrial chromosome).
See GNCList in the GenomicRanges
package for some examples.
Objects preprocessed with NCList, NCLists, and
GNCList are serializable (with
save) for later use. This is rarely needed, though,
because preprocessing is very cheap (i.e. very fast and memory
efficient).
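If serialization is wanted anyway, a round trip could look like the sketch below. The temporary file and the variable names are illustrative choices.

```r
library(IRanges)

query <- IRanges(c(1, 4, 9), c(5, 7, 10))
subject <- IRanges(c(2, 2, 10), c(2, 3, 12))
ppsubject <- NCList(subject)

## Save the preprocessed object to a temporary file.
f <- tempfile(fileext = ".rda")
save(ppsubject, file = f)

## Load it back into a separate environment to avoid name clashes.
restored_env <- new.env()
load(f, envir = restored_env)
restored <- get("ppsubject", envir = restored_env)
unlink(f)

## The restored object can be used like the original one.
hits_before <- findOverlaps(query, ppsubject)
hits_after <- findOverlaps(query, restored)
```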
Value
An NCList object for the NCList constructor and an NCLists object
for the NCLists constructor.
Author(s)
Hervé Pagès
References
Alexander V. Alekseyenko and Christopher J. Lee –
Nested Containment List (NCList): a new algorithm for accelerating interval
query of genome alignment and interval databases.
Bioinformatics (2007) 23 (11): 1386-1393.
doi: 10.1093/bioinformatics/btl647
See Also
The GNCList constructor and class
defined in the GenomicRanges package.
findOverlaps for finding/counting interval overlaps
between two range-based objects.
Ranges and RangesList objects.
Examples
## The example below is for illustration purposes only and does NOT
## reflect typical usage. This is because, for a one-time use, it is
## NOT advised to explicitly preprocess the input for findOverlaps()
## or countOverlaps(). These functions will take care of it and do a
## better job at it (by preprocessing only what's needed when it's
## needed, and releasing memory as they go).
query <- IRanges(c(1, 4, 9), c(5, 7, 10))
subject <- IRanges(c(2, 2, 10), c(2, 3, 12))
## Either the query or the subject of findOverlaps() can be preprocessed:
ppsubject <- NCList(subject)
hits1 <- findOverlaps(query, ppsubject)
hits1
ppquery <- NCList(query)
hits2 <- findOverlaps(ppquery, subject)
hits2
## Note that 'hits1' and 'hits2' contain the same hits but not in the
## same order.
stopifnot(identical(sort(hits1), sort(hits2)))
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
> library(IRanges)
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
rbind, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit
Loading required package: S4Vectors
Loading required package: stats4
Attaching package: 'S4Vectors'
The following objects are masked from 'package:base':
colMeans, colSums, expand.grid, rowMeans, rowSums
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/IRanges/NCList-class.Rd_%03d_medium.png", width=480, height=480)
> ### Name: NCList-class
> ### Title: Nested Containment List objects
> ### Aliases: class:NCList NCList-class NCList ranges,NCList-method
> ### length,NCList-method names,NCList-method start,NCList-method
> ### end,NCList-method width,NCList-method coerce,NCList,IRanges-method
> ### coerce,Ranges,NCList-method class:NCLists NCLists-class NCLists
> ### ranges,NCLists-method length,NCLists-method names,NCLists-method
> ### start,NCLists-method end,NCLists-method width,NCLists-method
> ### elementNROWS,NCLists-method
> ### coerce,NCLists,CompressedIRangesList-method
> ### coerce,NCLists,IRangesList-method coerce,RangesList,NCLists-method
> ### Keywords: classes methods
>
> ### ** Examples
>
> ## The example below is for illustration purposes only and does NOT
> ## reflect typical usage. This is because, for a one-time use, it is
> ## NOT advised to explicitly preprocess the input for findOverlaps()
> ## or countOverlaps(). These functions will take care of it and do a
> ## better job at it (by preprocessing only what's needed when it's
> ## needed, and releasing memory as they go).
>
> query <- IRanges(c(1, 4, 9), c(5, 7, 10))
> subject <- IRanges(c(2, 2, 10), c(2, 3, 12))
>
> ## Either the query or the subject of findOverlaps() can be preprocessed:
>
> ppsubject <- NCList(subject)
> hits1 <- findOverlaps(query, ppsubject)
> hits1
Hits object with 3 hits and 0 metadata columns:
queryHits subjectHits
<integer> <integer>
[1] 1 2
[2] 1 1
[3] 3 3
-------
queryLength: 3 / subjectLength: 3
>
> ppquery <- NCList(query)
> hits2 <- findOverlaps(ppquery, subject)
> hits2
Hits object with 3 hits and 0 metadata columns:
queryHits subjectHits
<integer> <integer>
[1] 1 1
[2] 1 2
[3] 3 3
-------
queryLength: 3 / subjectLength: 3
>
> ## Note that 'hits1' and 'hits2' contain the same hits but not in the
> ## same order.
> stopifnot(identical(sort(hits1), sort(hits2)))
> dev.off()
null device
1
>