The GNCList class is a container for storing the Nested Containment List
representation of a vector of genomic ranges (typically represented as
a GRanges object).
To preprocess a GRanges object, simply call the GNCList
constructor function on it. The resulting GNCList object can then be used
for efficient overlap-based operations on the genomic ranges.
Usage
GNCList(x)
Arguments
x
The GRanges (or more generally GenomicRanges) object to
preprocess.
Details
The IRanges package also defines the NCList
and NCLists constructors and classes for
preprocessing and representing a Ranges or
RangesList object as a data structure based on Nested
Containment Lists.
Note that GNCList objects (introduced in BioC 3.1) are replacements for
GIntervalTree objects (BioC < 3.1).
See ?NCList in the IRanges package for
some important differences between the new algorithm based on Nested
Containment Lists and the old algorithm based on interval trees.
In particular, the new algorithm supports preprocessing of a
GenomicRanges object with ranges defined on circular sequences
(e.g. on the mitochnodrial chromosome). See below for some examples.
Value
A GNCList object.
Author(s)
H. Pag<c3><83><c2><a8>s
References
Alexander V. Alekseyenko and Christopher J. Lee –
Nested Containment List (NCList): a new algorithm for accelerating interval
query of genome alignment and interval databases.
Bioinformatics (2007) 23 (11): 1386-1393.
doi: 10.1093/bioinformatics/btl647
See Also
The NCList and NCLists
constructors and classs defined in the IRanges package.
findOverlaps for finding/counting interval overlaps
between two range-based objects.
GRanges objects.
Examples
## The examples below are for illustration purpose only and do NOT
## reflect typical usage. This is because, for a one time use, it is
## NOT advised to explicitely preprocess the input for findOverlaps()
## or countOverlaps(). These functions will take care of it and do a
## better job at it (by preprocessing only what's needed when it's
## needed, and release memory as they go).
## ---------------------------------------------------------------------
## PREPROCESS QUERY OR SUBJECT
## ---------------------------------------------------------------------
query <- GRanges(Rle(c("chrM", "chr1", "chrM", "chr1"), 4:1),
IRanges(1:10, width=5), strand=rep(c("+", "-"), 5))
subject <- GRanges(Rle(c("chr1", "chr2", "chrM"), 3:1),
IRanges(6:1, width=5), strand="+")
## Either the query or the subject of findOverlaps() can be preprocessed:
ppsubject <- GNCList(subject)
hits1a <- findOverlaps(query, ppsubject)
hits1a
hits1b <- findOverlaps(query, ppsubject, ignore.strand=TRUE)
hits1b
ppquery <- GNCList(query)
hits2a <- findOverlaps(ppquery, subject)
hits2a
hits2b <- findOverlaps(ppquery, subject, ignore.strand=TRUE)
hits2b
## Note that 'hits1a' and 'hits2a' contain the same hits but not
## necessarily in the same order.
stopifnot(identical(sort(hits1a), sort(hits2a)))
## Same for 'hits1b' and 'hits2b'.
stopifnot(identical(sort(hits1b), sort(hits2b)))
## ---------------------------------------------------------------------
## WITH CIRCULAR SEQUENCES
## ---------------------------------------------------------------------
seqinfo <- Seqinfo(c("chr1", "chr2", "chrM"),
seqlengths=c(100, 50, 10),
isCircular=c(FALSE, FALSE, TRUE))
seqinfo(query) <- seqinfo[seqlevels(query)]
seqinfo(subject) <- seqinfo[seqlevels(subject)]
ppsubject <- GNCList(subject)
hits3 <- findOverlaps(query, ppsubject)
hits3
## Circularity introduces more hits:
stopifnot(all(hits1a %in% hits3))
new_hits <- setdiff(hits3, hits1a)
new_hits # 1 new hit
query[queryHits(new_hits)]
subject[subjectHits(new_hits)] # positions 11:13 on chrM are the same
# as positions 1:3
## Sanity checks:
stopifnot(identical(new_hits, Hits(9, 6, 10, 6, sort.by.query=TRUE)))
ppquery <- GNCList(query)
hits4 <- findOverlaps(ppquery, subject)
stopifnot(identical(sort(hits3), sort(hits4)))
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(GenomicRanges)
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
rbind, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit
Loading required package: S4Vectors
Loading required package: stats4
Attaching package: 'S4Vectors'
The following objects are masked from 'package:base':
colMeans, colSums, expand.grid, rowMeans, rowSums
Loading required package: IRanges
Loading required package: GenomeInfoDb
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/GenomicRanges/GNCList-class.Rd_%03d_medium.png", width=480, height=480)
> ### Name: GNCList-class
> ### Title: GNCList objects
> ### Aliases: class:GNCList GNCList-class GNCList granges,GNCList-method
> ### length,GNCList-method names,GNCList-method seqnames,GNCList-method
> ### start,GNCList-method end,GNCList-method width,GNCList-method
> ### ranges,GNCList-method strand,GNCList-method seqinfo,GNCList-method
> ### coerce,GNCList,GRanges-method coerce,GenomicRanges,GNCList-method
> ### Keywords: classes methods
>
> ### ** Examples
>
> ## The examples below are for illustration purpose only and do NOT
> ## reflect typical usage. This is because, for a one time use, it is
> ## NOT advised to explicitely preprocess the input for findOverlaps()
> ## or countOverlaps(). These functions will take care of it and do a
> ## better job at it (by preprocessing only what's needed when it's
> ## needed, and release memory as they go).
>
> ## ---------------------------------------------------------------------
> ## PREPROCESS QUERY OR SUBJECT
> ## ---------------------------------------------------------------------
>
> query <- GRanges(Rle(c("chrM", "chr1", "chrM", "chr1"), 4:1),
+ IRanges(1:10, width=5), strand=rep(c("+", "-"), 5))
> subject <- GRanges(Rle(c("chr1", "chr2", "chrM"), 3:1),
+ IRanges(6:1, width=5), strand="+")
>
> ## Either the query or the subject of findOverlaps() can be preprocessed:
>
> ppsubject <- GNCList(subject)
> hits1a <- findOverlaps(query, ppsubject)
> hits1a
Hits object with 8 hits and 0 metadata columns:
queryHits subjectHits
<integer> <integer>
[1] 1 6
[2] 3 6
[3] 5 3
[4] 5 2
[5] 5 1
[6] 7 3
[7] 7 2
[8] 7 1
-------
queryLength: 10 / subjectLength: 6
> hits1b <- findOverlaps(query, ppsubject, ignore.strand=TRUE)
> hits1b
Hits object with 14 hits and 0 metadata columns:
queryHits subjectHits
<integer> <integer>
[1] 1 6
[2] 2 6
[3] 3 6
[4] 4 6
[5] 5 3
... ... ...
[10] 6 1
[11] 7 3
[12] 7 2
[13] 7 1
[14] 10 1
-------
queryLength: 10 / subjectLength: 6
>
> ppquery <- GNCList(query)
> hits2a <- findOverlaps(ppquery, subject)
> hits2a
Hits object with 8 hits and 0 metadata columns:
queryHits subjectHits
<integer> <integer>
[1] 1 6
[2] 3 6
[3] 5 1
[4] 5 2
[5] 5 3
[6] 7 1
[7] 7 2
[8] 7 3
-------
queryLength: 10 / subjectLength: 6
> hits2b <- findOverlaps(ppquery, subject, ignore.strand=TRUE)
> hits2b
Hits object with 14 hits and 0 metadata columns:
queryHits subjectHits
<integer> <integer>
[1] 1 6
[2] 2 6
[3] 3 6
[4] 4 6
[5] 5 1
... ... ...
[10] 6 3
[11] 7 1
[12] 7 2
[13] 7 3
[14] 10 1
-------
queryLength: 10 / subjectLength: 6
>
> ## Note that 'hits1a' and 'hits2a' contain the same hits but not
> ## necessarily in the same order.
> stopifnot(identical(sort(hits1a), sort(hits2a)))
> ## Same for 'hits1b' and 'hits2b'.
> stopifnot(identical(sort(hits1b), sort(hits2b)))
>
> ## ---------------------------------------------------------------------
> ## WITH CIRCULAR SEQUENCES
> ## ---------------------------------------------------------------------
>
> seqinfo <- Seqinfo(c("chr1", "chr2", "chrM"),
+ seqlengths=c(100, 50, 10),
+ isCircular=c(FALSE, FALSE, TRUE))
> seqinfo(query) <- seqinfo[seqlevels(query)]
> seqinfo(subject) <- seqinfo[seqlevels(subject)]
>
> ppsubject <- GNCList(subject)
> hits3 <- findOverlaps(query, ppsubject)
> hits3
Hits object with 9 hits and 0 metadata columns:
queryHits subjectHits
<integer> <integer>
[1] 1 6
[2] 3 6
[3] 5 3
[4] 5 2
[5] 5 1
[6] 7 3
[7] 7 2
[8] 7 1
[9] 9 6
-------
queryLength: 10 / subjectLength: 6
>
> ## Circularity introduces more hits:
>
> stopifnot(all(hits1a %in% hits3))
> new_hits <- setdiff(hits3, hits1a)
> new_hits # 1 new hit
Hits object with 1 hit and 0 metadata columns:
queryHits subjectHits
<integer> <integer>
[1] 9 6
-------
queryLength: 10 / subjectLength: 6
> query[queryHits(new_hits)]
GRanges object with 1 range and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chrM [9, 13] +
-------
seqinfo: 2 sequences (1 circular) from an unspecified genome
> subject[subjectHits(new_hits)] # positions 11:13 on chrM are the same
GRanges object with 1 range and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chrM [1, 5] +
-------
seqinfo: 3 sequences (1 circular) from an unspecified genome
> # as positions 1:3
>
> ## Sanity checks:
> stopifnot(identical(new_hits, Hits(9, 6, 10, 6, sort.by.query=TRUE)))
> ppquery <- GNCList(query)
> hits4 <- findOverlaps(ppquery, subject)
> stopifnot(identical(sort(hits3), sort(hits4)))
>
>
>
>
>
> dev.off()
null device
1
>