Last data update: 2014.03.03

R: GNCList objects
GNCList-classR Documentation

GNCList objects

Description

The GNCList class is a container for storing the Nested Containment List representation of a vector of genomic ranges (typically represented as a GRanges object). To preprocess a GRanges object, simply call the GNCList constructor function on it. The resulting GNCList object can then be used for efficient overlap-based operations on the genomic ranges.

Usage

GNCList(x)

Arguments

x

The GRanges (or more generally GenomicRanges) object to preprocess.

Details

The IRanges package also defines the NCList and NCLists constructors and classes for preprocessing and representing a Ranges or RangesList object as a data structure based on Nested Containment Lists.

Note that GNCList objects (introduced in BioC 3.1) are replacements for GIntervalTree objects (BioC < 3.1).

See ?NCList in the IRanges package for some important differences between the new algorithm based on Nested Containment Lists and the old algorithm based on interval trees. In particular, the new algorithm supports preprocessing of a GenomicRanges object with ranges defined on circular sequences (e.g. on the mitochnodrial chromosome). See below for some examples.

Value

A GNCList object.

Author(s)

H. Pag<c3><83><c2><a8>s

References

Alexander V. Alekseyenko and Christopher J. Lee – Nested Containment List (NCList): a new algorithm for accelerating interval query of genome alignment and interval databases. Bioinformatics (2007) 23 (11): 1386-1393. doi: 10.1093/bioinformatics/btl647

See Also

  • The NCList and NCLists constructors and classs defined in the IRanges package.

  • findOverlaps for finding/counting interval overlaps between two range-based objects.

  • GRanges objects.

Examples

## The examples below are for illustration purpose only and do NOT
## reflect typical usage. This is because, for a one time use, it is
## NOT advised to explicitely preprocess the input for findOverlaps()
## or countOverlaps(). These functions will take care of it and do a
## better job at it (by preprocessing only what's needed when it's
## needed, and release memory as they go).

## ---------------------------------------------------------------------
## PREPROCESS QUERY OR SUBJECT
## ---------------------------------------------------------------------

query <- GRanges(Rle(c("chrM", "chr1", "chrM", "chr1"), 4:1),
                 IRanges(1:10, width=5), strand=rep(c("+", "-"), 5))
subject <- GRanges(Rle(c("chr1", "chr2", "chrM"), 3:1),
                   IRanges(6:1, width=5), strand="+")

## Either the query or the subject of findOverlaps() can be preprocessed:

ppsubject <- GNCList(subject)
hits1a <- findOverlaps(query, ppsubject)
hits1a
hits1b <- findOverlaps(query, ppsubject, ignore.strand=TRUE)
hits1b

ppquery <- GNCList(query)
hits2a <- findOverlaps(ppquery, subject)
hits2a
hits2b <- findOverlaps(ppquery, subject, ignore.strand=TRUE)
hits2b

## Note that 'hits1a' and 'hits2a' contain the same hits but not
## necessarily in the same order.
stopifnot(identical(sort(hits1a), sort(hits2a)))
## Same for 'hits1b' and 'hits2b'.
stopifnot(identical(sort(hits1b), sort(hits2b)))

## ---------------------------------------------------------------------
## WITH CIRCULAR SEQUENCES
## ---------------------------------------------------------------------

seqinfo <- Seqinfo(c("chr1", "chr2", "chrM"),
                   seqlengths=c(100, 50, 10),
                   isCircular=c(FALSE, FALSE, TRUE))
seqinfo(query) <- seqinfo[seqlevels(query)]
seqinfo(subject) <- seqinfo[seqlevels(subject)]

ppsubject <- GNCList(subject)
hits3 <- findOverlaps(query, ppsubject)
hits3

## Circularity introduces more hits:

stopifnot(all(hits1a %in% hits3))
new_hits <- setdiff(hits3, hits1a)
new_hits  # 1 new hit
query[queryHits(new_hits)]
subject[subjectHits(new_hits)]  # positions 11:13 on chrM are the same
                                # as positions 1:3

## Sanity checks:
stopifnot(identical(new_hits, Hits(9, 6, 10, 6, sort.by.query=TRUE)))
ppquery <- GNCList(query)
hits4 <- findOverlaps(ppquery, subject)
stopifnot(identical(sort(hits3), sort(hits4)))

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(GenomicRanges)
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomeInfoDb
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/GenomicRanges/GNCList-class.Rd_%03d_medium.png", width=480, height=480)
> ### Name: GNCList-class
> ### Title: GNCList objects
> ### Aliases: class:GNCList GNCList-class GNCList granges,GNCList-method
> ###   length,GNCList-method names,GNCList-method seqnames,GNCList-method
> ###   start,GNCList-method end,GNCList-method width,GNCList-method
> ###   ranges,GNCList-method strand,GNCList-method seqinfo,GNCList-method
> ###   coerce,GNCList,GRanges-method coerce,GenomicRanges,GNCList-method
> ### Keywords: classes methods
> 
> ### ** Examples
> 
> ## The examples below are for illustration purpose only and do NOT
> ## reflect typical usage. This is because, for a one time use, it is
> ## NOT advised to explicitely preprocess the input for findOverlaps()
> ## or countOverlaps(). These functions will take care of it and do a
> ## better job at it (by preprocessing only what's needed when it's
> ## needed, and release memory as they go).
> 
> ## ---------------------------------------------------------------------
> ## PREPROCESS QUERY OR SUBJECT
> ## ---------------------------------------------------------------------
> 
> query <- GRanges(Rle(c("chrM", "chr1", "chrM", "chr1"), 4:1),
+                  IRanges(1:10, width=5), strand=rep(c("+", "-"), 5))
> subject <- GRanges(Rle(c("chr1", "chr2", "chrM"), 3:1),
+                    IRanges(6:1, width=5), strand="+")
> 
> ## Either the query or the subject of findOverlaps() can be preprocessed:
> 
> ppsubject <- GNCList(subject)
> hits1a <- findOverlaps(query, ppsubject)
> hits1a
Hits object with 8 hits and 0 metadata columns:
      queryHits subjectHits
      <integer>   <integer>
  [1]         1           6
  [2]         3           6
  [3]         5           3
  [4]         5           2
  [5]         5           1
  [6]         7           3
  [7]         7           2
  [8]         7           1
  -------
  queryLength: 10 / subjectLength: 6
> hits1b <- findOverlaps(query, ppsubject, ignore.strand=TRUE)
> hits1b
Hits object with 14 hits and 0 metadata columns:
       queryHits subjectHits
       <integer>   <integer>
   [1]         1           6
   [2]         2           6
   [3]         3           6
   [4]         4           6
   [5]         5           3
   ...       ...         ...
  [10]         6           1
  [11]         7           3
  [12]         7           2
  [13]         7           1
  [14]        10           1
  -------
  queryLength: 10 / subjectLength: 6
> 
> ppquery <- GNCList(query)
> hits2a <- findOverlaps(ppquery, subject)
> hits2a
Hits object with 8 hits and 0 metadata columns:
      queryHits subjectHits
      <integer>   <integer>
  [1]         1           6
  [2]         3           6
  [3]         5           1
  [4]         5           2
  [5]         5           3
  [6]         7           1
  [7]         7           2
  [8]         7           3
  -------
  queryLength: 10 / subjectLength: 6
> hits2b <- findOverlaps(ppquery, subject, ignore.strand=TRUE)
> hits2b
Hits object with 14 hits and 0 metadata columns:
       queryHits subjectHits
       <integer>   <integer>
   [1]         1           6
   [2]         2           6
   [3]         3           6
   [4]         4           6
   [5]         5           1
   ...       ...         ...
  [10]         6           3
  [11]         7           1
  [12]         7           2
  [13]         7           3
  [14]        10           1
  -------
  queryLength: 10 / subjectLength: 6
> 
> ## Note that 'hits1a' and 'hits2a' contain the same hits but not
> ## necessarily in the same order.
> stopifnot(identical(sort(hits1a), sort(hits2a)))
> ## Same for 'hits1b' and 'hits2b'.
> stopifnot(identical(sort(hits1b), sort(hits2b)))
> 
> ## ---------------------------------------------------------------------
> ## WITH CIRCULAR SEQUENCES
> ## ---------------------------------------------------------------------
> 
> seqinfo <- Seqinfo(c("chr1", "chr2", "chrM"),
+                    seqlengths=c(100, 50, 10),
+                    isCircular=c(FALSE, FALSE, TRUE))
> seqinfo(query) <- seqinfo[seqlevels(query)]
> seqinfo(subject) <- seqinfo[seqlevels(subject)]
> 
> ppsubject <- GNCList(subject)
> hits3 <- findOverlaps(query, ppsubject)
> hits3
Hits object with 9 hits and 0 metadata columns:
      queryHits subjectHits
      <integer>   <integer>
  [1]         1           6
  [2]         3           6
  [3]         5           3
  [4]         5           2
  [5]         5           1
  [6]         7           3
  [7]         7           2
  [8]         7           1
  [9]         9           6
  -------
  queryLength: 10 / subjectLength: 6
> 
> ## Circularity introduces more hits:
> 
> stopifnot(all(hits1a %in% hits3))
> new_hits <- setdiff(hits3, hits1a)
> new_hits  # 1 new hit
Hits object with 1 hit and 0 metadata columns:
      queryHits subjectHits
      <integer>   <integer>
  [1]         9           6
  -------
  queryLength: 10 / subjectLength: 6
> query[queryHits(new_hits)]
GRanges object with 1 range and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]     chrM   [9, 13]      +
  -------
  seqinfo: 2 sequences (1 circular) from an unspecified genome
> subject[subjectHits(new_hits)]  # positions 11:13 on chrM are the same
GRanges object with 1 range and 0 metadata columns:
      seqnames    ranges strand
         <Rle> <IRanges>  <Rle>
  [1]     chrM    [1, 5]      +
  -------
  seqinfo: 3 sequences (1 circular) from an unspecified genome
>                                 # as positions 1:3
> 
> ## Sanity checks:
> stopifnot(identical(new_hits, Hits(9, 6, 10, 6, sort.by.query=TRUE)))
> ppquery <- GNCList(query)
> hits4 <- findOverlaps(ppquery, subject)
> stopifnot(identical(sort(hits3), sort(hits4)))
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>