R: Transform genomic ranges into "absolute" ranges
absoluteRanges
R Documentation
Transform genomic ranges into "absolute" ranges
Description
absoluteRanges transforms the genomic ranges in x into
absolute ranges i.e. into ranges counted from the beginning of
the virtual sequence obtained by concatenating all the sequences in the
underlying genome (in the order reported by seqlevels(x)).
relativeRanges performs the reverse transformation.
NOTE: These functions only work on small genomes. See Details
section below.
Usage
absoluteRanges(x)
relativeRanges(x, seqlengths)
## Related utility:
isSmallGenome(seqlengths)
Arguments
x
For absoluteRanges: a GenomicRanges object with ranges
defined on a small genome (see Details section below).
For relativeRanges: a Ranges object.
seqlengths
An object holding sequence lengths. This can be a named integer
(or numeric) vector with no duplicated names as returned by
seqlengths(), or any object from
which sequence lengths can be extracted with
seqlengths().
For relativeRanges, seqlengths must describe a small
genome (see Details section below).
Details
Because absoluteRanges returns the absolute ranges in an
IRanges object, and because an IRanges
object cannot hold ranges with an end > .Machine$integer.max
(i.e. >= 2^31 on most platforms), absoluteRanges cannot be used
if the size of the underlying genome (i.e. the total length of the
sequences in it) is > .Machine$integer.max. Utility function
isSmallGenome is provided as a mean for the user to check
upfront whether the genome is small (i.e. its size is <=
.Machine$integer.max) or not, and thus compatible with
absoluteRanges or not.
relativeRanges applies the same restriction by looking at the
seqlengths argument.
Value
An IRanges object for absoluteRanges.
A GRanges object for relativeRanges.
isSmallGenome returns TRUE if the total length of the underlying
sequences is <= .Machine$integer.max (e.g. Fly genome),
FALSE if not (e.g. Human genome), or NA if it cannot be computed (because
some sequence lengths are NA).
Author(s)
H. Pag<c3><83><c2><a8>s
See Also
GRanges objects.
IRanges objects in the IRanges package.
Seqinfo objects and the seqlengths getter in
the GenomeInfoDb package.
genomicvars for manipulating genomic variables.
The tileGenome function for putting tiles on a
genome.
Examples
## ---------------------------------------------------------------------
## TOY EXAMPLE
## ---------------------------------------------------------------------
gr <- GRanges(Rle(c("chr2", "chr1", "chr3", "chr1"), 4:1),
IRanges(1:10, width=5),
seqinfo=Seqinfo(c("chr1", "chr2", "chr3"), c(100, 50, 20)))
ar <- absoluteRanges(gr)
ar
gr2 <- relativeRanges(ar, seqlengths(gr))
gr2
## Sanity check:
stopifnot(all(gr == gr2))
## ---------------------------------------------------------------------
## ON REAL DATA
## ---------------------------------------------------------------------
## With a "small" genome
library(TxDb.Dmelanogaster.UCSC.dm3.ensGene)
txdb <- TxDb.Dmelanogaster.UCSC.dm3.ensGene
ex <- exons(txdb)
ex
isSmallGenome(ex)
## Note that because isSmallGenome() can return NA (see Value section
## above), its result should always be wrapped inside isTRUE() when
## used in an if statement:
if (isTRUE(isSmallGenome(ex))) {
ar <- absoluteRanges(ex)
ar
ex2 <- relativeRanges(ar, seqlengths(ex))
ex2 # original strand is not restored
## Sanity check:
strand(ex2) <- strand(ex) # restore the strand
stopifnot(all(ex == ex2))
}
## With a "big" genome (but we can reduce it)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
ex <- exons(txdb)
isSmallGenome(ex)
## Not run:
absoluteRanges(ex) # error!
## End(Not run)
## However, if we are only interested in some chromosomes, we might
## still be able to use absoluteRanges():
seqlevels(ex, force=TRUE) <- paste0("chr", 1:10)
isSmallGenome(ex) # TRUE!
ar <- absoluteRanges(ex)
ex2 <- relativeRanges(ar, seqlengths(ex))
## Sanity check:
strand(ex2) <- strand(ex)
stopifnot(all(ex == ex2))
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(GenomicRanges)
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
rbind, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit
Loading required package: S4Vectors
Loading required package: stats4
Attaching package: 'S4Vectors'
The following objects are masked from 'package:base':
colMeans, colSums, expand.grid, rowMeans, rowSums
Loading required package: IRanges
Loading required package: GenomeInfoDb
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/GenomicRanges/absoluteRanges.Rd_%03d_medium.png", width=480, height=480)
> ### Name: absoluteRanges
> ### Title: Transform genomic ranges into "absolute" ranges
> ### Aliases: absoluteRanges relativeRanges isSmallGenome
> ### Keywords: manip
>
> ### ** Examples
>
> ## ---------------------------------------------------------------------
> ## TOY EXAMPLE
> ## ---------------------------------------------------------------------
>
> gr <- GRanges(Rle(c("chr2", "chr1", "chr3", "chr1"), 4:1),
+ IRanges(1:10, width=5),
+ seqinfo=Seqinfo(c("chr1", "chr2", "chr3"), c(100, 50, 20)))
>
> ar <- absoluteRanges(gr)
> ar
IRanges object with 10 ranges and 0 metadata columns:
start end width
<integer> <integer> <integer>
[1] 101 105 5
[2] 102 106 5
[3] 103 107 5
[4] 104 108 5
[5] 5 9 5
[6] 6 10 5
[7] 7 11 5
[8] 158 162 5
[9] 159 163 5
[10] 10 14 5
>
> gr2 <- relativeRanges(ar, seqlengths(gr))
> gr2
GRanges object with 10 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chr2 [ 1, 5] *
[2] chr2 [ 2, 6] *
[3] chr2 [ 3, 7] *
[4] chr2 [ 4, 8] *
[5] chr1 [ 5, 9] *
[6] chr1 [ 6, 10] *
[7] chr1 [ 7, 11] *
[8] chr3 [ 8, 12] *
[9] chr3 [ 9, 13] *
[10] chr1 [10, 14] *
-------
seqinfo: 3 sequences from an unspecified genome
>
> ## Sanity check:
> stopifnot(all(gr == gr2))
>
> ## ---------------------------------------------------------------------
> ## ON REAL DATA
> ## ---------------------------------------------------------------------
>
> ## With a "small" genome
>
> library(TxDb.Dmelanogaster.UCSC.dm3.ensGene)
Loading required package: GenomicFeatures
Loading required package: AnnotationDbi
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
> txdb <- TxDb.Dmelanogaster.UCSC.dm3.ensGene
> ex <- exons(txdb)
> ex
GRanges object with 76920 ranges and 1 metadata column:
seqnames ranges strand | exon_id
<Rle> <IRanges> <Rle> | <integer>
[1] chr2L [7529, 8116] + | 1
[2] chr2L [8193, 8589] + | 2
[3] chr2L [8193, 9484] + | 3
[4] chr2L [8229, 9484] + | 4
[5] chr2L [8668, 9484] + | 5
... ... ... ... . ...
[76916] chrYHet [327915, 328204] - | 76916
[76917] chrYHet [328270, 328489] - | 76917
[76918] chrUextra [523024, 523048] - | 76918
[76919] chrUextra [523024, 523086] - | 76919
[76920] chrUextra [523060, 523086] - | 76920
-------
seqinfo: 15 sequences (1 circular) from dm3 genome
>
> isSmallGenome(ex)
[1] TRUE
>
> ## Note that because isSmallGenome() can return NA (see Value section
> ## above), its result should always be wrapped inside isTRUE() when
> ## used in an if statement:
> if (isTRUE(isSmallGenome(ex))) {
+ ar <- absoluteRanges(ex)
+ ar
+
+ ex2 <- relativeRanges(ar, seqlengths(ex))
+ ex2 # original strand is not restored
+
+ ## Sanity check:
+ strand(ex2) <- strand(ex) # restore the strand
+ stopifnot(all(ex == ex2))
+ }
>
> ## With a "big" genome (but we can reduce it)
>
> library(TxDb.Hsapiens.UCSC.hg19.knownGene)
> txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
> ex <- exons(txdb)
> isSmallGenome(ex)
[1] FALSE
> ## Not run:
> ##D absoluteRanges(ex) # error!
> ## End(Not run)
>
> ## However, if we are only interested in some chromosomes, we might
> ## still be able to use absoluteRanges():
> seqlevels(ex, force=TRUE) <- paste0("chr", 1:10)
> isSmallGenome(ex) # TRUE!
[1] TRUE
> ar <- absoluteRanges(ex)
> ex2 <- relativeRanges(ar, seqlengths(ex))
>
> ## Sanity check:
> strand(ex2) <- strand(ex)
> stopifnot(all(ex == ex2))
>
>
>
>
>
> dev.off()
null device
1
>