Last data update: 2014.03.03

R: Transform genomic ranges into "absolute" ranges
absoluteRangesR Documentation

Transform genomic ranges into "absolute" ranges

Description

absoluteRanges transforms the genomic ranges in x into absolute ranges i.e. into ranges counted from the beginning of the virtual sequence obtained by concatenating all the sequences in the underlying genome (in the order reported by seqlevels(x)).

relativeRanges performs the reverse transformation.

NOTE: These functions only work on small genomes. See Details section below.

Usage

absoluteRanges(x)
relativeRanges(x, seqlengths)

## Related utility:
isSmallGenome(seqlengths)

Arguments

x

For absoluteRanges: a GenomicRanges object with ranges defined on a small genome (see Details section below).

For relativeRanges: a Ranges object.

seqlengths

An object holding sequence lengths. This can be a named integer (or numeric) vector with no duplicated names as returned by seqlengths(), or any object from which sequence lengths can be extracted with seqlengths().

For relativeRanges, seqlengths must describe a small genome (see Details section below).

Details

Because absoluteRanges returns the absolute ranges in an IRanges object, and because an IRanges object cannot hold ranges with an end > .Machine$integer.max (i.e. >= 2^31 on most platforms), absoluteRanges cannot be used if the size of the underlying genome (i.e. the total length of the sequences in it) is > .Machine$integer.max. Utility function isSmallGenome is provided as a mean for the user to check upfront whether the genome is small (i.e. its size is <= .Machine$integer.max) or not, and thus compatible with absoluteRanges or not.

relativeRanges applies the same restriction by looking at the seqlengths argument.

Value

An IRanges object for absoluteRanges.

A GRanges object for relativeRanges.

isSmallGenome returns TRUE if the total length of the underlying sequences is <= .Machine$integer.max (e.g. Fly genome), FALSE if not (e.g. Human genome), or NA if it cannot be computed (because some sequence lengths are NA).

Author(s)

H. Pag<c3><83><c2><a8>s

See Also

  • GRanges objects.

  • IRanges objects in the IRanges package.

  • Seqinfo objects and the seqlengths getter in the GenomeInfoDb package.

  • genomicvars for manipulating genomic variables.

  • The tileGenome function for putting tiles on a genome.

Examples

## ---------------------------------------------------------------------
## TOY EXAMPLE
## ---------------------------------------------------------------------

gr <- GRanges(Rle(c("chr2", "chr1", "chr3", "chr1"), 4:1),
              IRanges(1:10, width=5),
              seqinfo=Seqinfo(c("chr1", "chr2", "chr3"), c(100, 50, 20)))

ar <- absoluteRanges(gr)
ar

gr2 <- relativeRanges(ar, seqlengths(gr))
gr2

## Sanity check:
stopifnot(all(gr == gr2))

## ---------------------------------------------------------------------
## ON REAL DATA
## ---------------------------------------------------------------------

## With a "small" genome

library(TxDb.Dmelanogaster.UCSC.dm3.ensGene)
txdb <- TxDb.Dmelanogaster.UCSC.dm3.ensGene
ex <- exons(txdb)
ex

isSmallGenome(ex)

## Note that because isSmallGenome() can return NA (see Value section
## above), its result should always be wrapped inside isTRUE() when
## used in an if statement:
if (isTRUE(isSmallGenome(ex))) {
    ar <- absoluteRanges(ex)
    ar

    ex2 <- relativeRanges(ar, seqlengths(ex))
    ex2  # original strand is not restored

    ## Sanity check:
    strand(ex2) <- strand(ex)  # restore the strand
    stopifnot(all(ex == ex2))
}

## With a "big" genome (but we can reduce it)

library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
ex <- exons(txdb)
isSmallGenome(ex)
## Not run: 
    absoluteRanges(ex)  # error!

## End(Not run)

## However, if we are only interested in some chromosomes, we might
## still be able to use absoluteRanges():
seqlevels(ex, force=TRUE) <- paste0("chr", 1:10)
isSmallGenome(ex)  # TRUE!
ar <- absoluteRanges(ex)
ex2 <- relativeRanges(ar, seqlengths(ex))

## Sanity check:
strand(ex2) <- strand(ex) 
stopifnot(all(ex == ex2))

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(GenomicRanges)
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomeInfoDb
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/GenomicRanges/absoluteRanges.Rd_%03d_medium.png", width=480, height=480)
> ### Name: absoluteRanges
> ### Title: Transform genomic ranges into "absolute" ranges
> ### Aliases: absoluteRanges relativeRanges isSmallGenome
> ### Keywords: manip
> 
> ### ** Examples
> 
> ## ---------------------------------------------------------------------
> ## TOY EXAMPLE
> ## ---------------------------------------------------------------------
> 
> gr <- GRanges(Rle(c("chr2", "chr1", "chr3", "chr1"), 4:1),
+               IRanges(1:10, width=5),
+               seqinfo=Seqinfo(c("chr1", "chr2", "chr3"), c(100, 50, 20)))
> 
> ar <- absoluteRanges(gr)
> ar
IRanges object with 10 ranges and 0 metadata columns:
           start       end     width
       <integer> <integer> <integer>
   [1]       101       105         5
   [2]       102       106         5
   [3]       103       107         5
   [4]       104       108         5
   [5]         5         9         5
   [6]         6        10         5
   [7]         7        11         5
   [8]       158       162         5
   [9]       159       163         5
  [10]        10        14         5
> 
> gr2 <- relativeRanges(ar, seqlengths(gr))
> gr2
GRanges object with 10 ranges and 0 metadata columns:
       seqnames    ranges strand
          <Rle> <IRanges>  <Rle>
   [1]     chr2  [ 1,  5]      *
   [2]     chr2  [ 2,  6]      *
   [3]     chr2  [ 3,  7]      *
   [4]     chr2  [ 4,  8]      *
   [5]     chr1  [ 5,  9]      *
   [6]     chr1  [ 6, 10]      *
   [7]     chr1  [ 7, 11]      *
   [8]     chr3  [ 8, 12]      *
   [9]     chr3  [ 9, 13]      *
  [10]     chr1  [10, 14]      *
  -------
  seqinfo: 3 sequences from an unspecified genome
> 
> ## Sanity check:
> stopifnot(all(gr == gr2))
> 
> ## ---------------------------------------------------------------------
> ## ON REAL DATA
> ## ---------------------------------------------------------------------
> 
> ## With a "small" genome
> 
> library(TxDb.Dmelanogaster.UCSC.dm3.ensGene)
Loading required package: GenomicFeatures
Loading required package: AnnotationDbi
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

> txdb <- TxDb.Dmelanogaster.UCSC.dm3.ensGene
> ex <- exons(txdb)
> ex
GRanges object with 76920 ranges and 1 metadata column:
           seqnames           ranges strand |   exon_id
              <Rle>        <IRanges>  <Rle> | <integer>
      [1]     chr2L     [7529, 8116]      + |         1
      [2]     chr2L     [8193, 8589]      + |         2
      [3]     chr2L     [8193, 9484]      + |         3
      [4]     chr2L     [8229, 9484]      + |         4
      [5]     chr2L     [8668, 9484]      + |         5
      ...       ...              ...    ... .       ...
  [76916]   chrYHet [327915, 328204]      - |     76916
  [76917]   chrYHet [328270, 328489]      - |     76917
  [76918] chrUextra [523024, 523048]      - |     76918
  [76919] chrUextra [523024, 523086]      - |     76919
  [76920] chrUextra [523060, 523086]      - |     76920
  -------
  seqinfo: 15 sequences (1 circular) from dm3 genome
> 
> isSmallGenome(ex)
[1] TRUE
> 
> ## Note that because isSmallGenome() can return NA (see Value section
> ## above), its result should always be wrapped inside isTRUE() when
> ## used in an if statement:
> if (isTRUE(isSmallGenome(ex))) {
+     ar <- absoluteRanges(ex)
+     ar
+ 
+     ex2 <- relativeRanges(ar, seqlengths(ex))
+     ex2  # original strand is not restored
+ 
+     ## Sanity check:
+     strand(ex2) <- strand(ex)  # restore the strand
+     stopifnot(all(ex == ex2))
+ }
> 
> ## With a "big" genome (but we can reduce it)
> 
> library(TxDb.Hsapiens.UCSC.hg19.knownGene)
> txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
> ex <- exons(txdb)
> isSmallGenome(ex)
[1] FALSE
> ## Not run: 
> ##D     absoluteRanges(ex)  # error!
> ## End(Not run)
> 
> ## However, if we are only interested in some chromosomes, we might
> ## still be able to use absoluteRanges():
> seqlevels(ex, force=TRUE) <- paste0("chr", 1:10)
> isSmallGenome(ex)  # TRUE!
[1] TRUE
> ar <- absoluteRanges(ex)
> ex2 <- relativeRanges(ar, seqlengths(ex))
> 
> ## Sanity check:
> strand(ex2) <- strand(ex) 
> stopifnot(all(ex == ex2))
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>