Last data update: 2014.03.03
R: Comparing and ordering genomic ranges
GenomicRanges-comparison R Documentation
Comparing and ordering genomic ranges
Description
Methods for comparing and ordering the elements in one or more
GenomicRanges objects.
Usage
## duplicated()
## ------------
## S4 method for signature 'GenomicRanges'
duplicated(x, incomparables=FALSE, fromLast=FALSE,
method=c("auto", "quick", "hash"))
## match() & selfmatch()
## ---------------------
## S4 method for signature 'GenomicRanges,GenomicRanges'
match(x, table, nomatch=NA_integer_, incomparables=NULL,
method=c("auto", "quick", "hash"), ignore.strand=FALSE)
## S4 method for signature 'GenomicRanges'
selfmatch(x, method=c("auto", "quick", "hash"), ignore.strand=FALSE)
## order() and related methods
## ----------------------------
## S4 method for signature 'GenomicRanges'
is.unsorted(x, na.rm=FALSE, strictly=FALSE, ignore.strand=FALSE)
## S4 method for signature 'GenomicRanges'
order(..., na.last=TRUE, decreasing=FALSE, method=c("shell", "radix"))
## S4 method for signature 'GenomicRanges'
sort(x, decreasing=FALSE, ignore.strand=FALSE, by)
## S4 method for signature 'GenomicRanges'
rank(x, na.last=TRUE,
ties.method=c("average", "first", "random", "max", "min"))
## Generalized parallel comparison of 2 GenomicRanges objects
## ----------------------------------------------------------
## S4 method for signature 'GenomicRanges,GenomicRanges'
pcompare(x, y)
Arguments
x, table, y
GenomicRanges objects.
incomparables
Not supported.
fromLast, method, nomatch
See ?`Ranges-comparison`
in the IRanges
package for a description of these arguments.
ignore.strand
Whether or not the strand should be ignored when comparing 2 genomic
ranges.
na.rm
Ignored.
strictly
Logical indicating if the check should be for strictly increasing
values.
...
One or more GenomicRanges objects. The GenomicRanges objects
after the first one are used to break ties.
na.last
Ignored.
decreasing
TRUE
or FALSE
.
ties.method
A character string specifying how ties are treated. Only "first"
is supported for now.
by
An optional formula that is resolved against as.env(x)
;
the resulting variables are passed to order
to generate the
ordering permutation.
Details
Two elements of a GenomicRanges object (i.e. two genomic ranges) are
considered equal iff they are on the same underlying sequence and strand,
and have the same start and width. duplicated()
and unique()
on a GenomicRanges object are conforming to this.
The "natural order" for the elements of a GenomicRanges object is to
order them (a) first by sequence level, (b) then by strand, (c) then by
start, (d) and finally by width.
This way, the space of genomic ranges is totally ordered.
Note that the reduce
method for GenomicRanges uses this
"natural order" implicitly. Also, note that, because we already do (c)
and (d) for regular ranges (see ?`Ranges-comparison`
),
genomic ranges that belong to the same underlying sequence and strand are
ordered like regular ranges.
is.unsorted()
, order()
, sort()
, and rank()
on a
GenomicRanges object behave accordingly to this "natural order".
==
, !=
, <=
, >=
, <
and >
on GenomicRanges objects also behave accordingly to this
"natural order".
Author(s)
H. Pag<c3><83><c2><a8>s, is.unsorted
contributed by Pete Hickey
See Also
The GenomicRanges class.
Ranges-comparison in the IRanges
package for comparing and ordering genomic ranges.
findOverlaps-methods for finding
overlapping genomic ranges.
intra-range-methods and
inter-range-methods for intra range and
inter range transformations of a GRanges object.
setops-methods for set operations on
GenomicRanges objects.
Examples
gr0 <- GRanges(
Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
IRanges(c(1:9,7L), end=10),
strand=Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2)),
seqlengths=c(chr1=11, chr2=12, chr3=13)
)
gr <- c(gr0, gr0[7:3])
names(gr) <- LETTERS[seq_along(gr)]
## ---------------------------------------------------------------------
## A. ELEMENT-WISE (AKA "PARALLEL") COMPARISON OF 2 GenomicRanges OBJECTS
## ---------------------------------------------------------------------
gr[2] == gr[2] # TRUE
gr[2] == gr[5] # FALSE
gr == gr[4]
gr >= gr[3]
## ---------------------------------------------------------------------
## B. duplicated(), unique()
## ---------------------------------------------------------------------
duplicated(gr)
unique(gr)
## ---------------------------------------------------------------------
## C. match(), %in%
## ---------------------------------------------------------------------
table <- gr[1:7]
match(gr, table)
match(gr, table, ignore.strand=TRUE)
gr %in% table
## ---------------------------------------------------------------------
## D. findMatches(), countMatches()
## ---------------------------------------------------------------------
findMatches(gr, table)
countMatches(gr, table)
findMatches(gr, table, ignore.strand=TRUE)
countMatches(gr, table, ignore.strand=TRUE)
gr_levels <- unique(gr)
countMatches(gr_levels, gr)
## ---------------------------------------------------------------------
## E. order() AND RELATED METHODS
## ---------------------------------------------------------------------
is.unsorted(gr)
order(gr)
sort(gr)
is.unsorted(sort(gr))
is.unsorted(gr, ignore.strand=TRUE)
gr2 <- sort(gr, ignore.strand=TRUE)
is.unsorted(gr2) # TRUE
is.unsorted(gr2, ignore.strand=TRUE) # FALSE
## TODO: Broken. Please fix!
#sort(gr, by = ~ seqnames + start + end) # equivalent to (but slower than) above
score(gr) <- rev(seq_len(length(gr)))
## TODO: Broken. Please fix!
#sort(gr, by = ~ score)
rank(gr)
## ---------------------------------------------------------------------
## F. GENERALIZED ELEMENT-WISE COMPARISON OF 2 GenomicRanges OBJECTS
## ---------------------------------------------------------------------
gr3 <- GRanges(c(rep("chr1", 12), "chr2"), IRanges(c(1:11, 6:7), width=3))
strand(gr3)[12] <- "+"
gr4 <- GRanges("chr1", IRanges(5, 9))
pcompare(gr3, gr4)
rangeComparisonCodeToLetter(pcompare(gr3, gr4))
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(GenomicRanges)
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
rbind, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit
Loading required package: S4Vectors
Loading required package: stats4
Attaching package: 'S4Vectors'
The following objects are masked from 'package:base':
colMeans, colSums, expand.grid, rowMeans, rowSums
Loading required package: IRanges
Loading required package: GenomeInfoDb
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/GenomicRanges/GenomicRanges-comparison.Rd_%03d_medium.png", width=480, height=480)
> ### Name: GenomicRanges-comparison
> ### Title: Comparing and ordering genomic ranges
> ### Aliases: GenomicRanges-comparison pcompare
> ### pcompare,GenomicRanges,GenomicRanges-method
> ### duplicated,GenomicRanges-method duplicated.GenomicRanges
> ### match,GenomicRanges,GenomicRanges-method
> ### selfmatch,GenomicRanges-method is.unsorted,GenomicRanges-method
> ### order,GenomicRanges-method sort,GenomicRanges-method
> ### sort.GenomicRanges rank,GenomicRanges-method
> ### Keywords: methods
>
> ### ** Examples
>
> gr0 <- GRanges(
+ Rle(c("chr1", "chr2", "chr1", "chr3"), c(1, 3, 2, 4)),
+ IRanges(c(1:9,7L), end=10),
+ strand=Rle(strand(c("-", "+", "*", "+", "-")), c(1, 2, 2, 3, 2)),
+ seqlengths=c(chr1=11, chr2=12, chr3=13)
+ )
> gr <- c(gr0, gr0[7:3])
> names(gr) <- LETTERS[seq_along(gr)]
>
> ## ---------------------------------------------------------------------
> ## A. ELEMENT-WISE (AKA "PARALLEL") COMPARISON OF 2 GenomicRanges OBJECTS
> ## ---------------------------------------------------------------------
> gr[2] == gr[2] # TRUE
[1] TRUE
> gr[2] == gr[5] # FALSE
[1] FALSE
> gr == gr[4]
[1] FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[13] FALSE TRUE FALSE
> gr >= gr[3]
[1] FALSE FALSE TRUE TRUE FALSE FALSE TRUE TRUE TRUE TRUE TRUE FALSE
[13] FALSE TRUE TRUE
>
> ## ---------------------------------------------------------------------
> ## B. duplicated(), unique()
> ## ---------------------------------------------------------------------
> duplicated(gr)
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE
[13] TRUE TRUE TRUE
> unique(gr)
GRanges object with 10 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
A chr1 [1, 10] -
B chr2 [2, 10] +
C chr2 [3, 10] +
D chr2 [4, 10] *
E chr1 [5, 10] *
F chr1 [6, 10] +
G chr3 [7, 10] +
H chr3 [8, 10] +
I chr3 [9, 10] -
J chr3 [7, 10] -
-------
seqinfo: 3 sequences from an unspecified genome
>
> ## ---------------------------------------------------------------------
> ## C. match(), %in%
> ## ---------------------------------------------------------------------
> table <- gr[1:7]
> match(gr, table)
[1] 1 2 3 4 5 6 7 NA NA NA 7 6 5 4 3
> match(gr, table, ignore.strand=TRUE)
[1] 1 2 3 4 5 6 7 NA NA 7 7 6 5 4 3
>
> gr %in% table
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE FALSE FALSE TRUE TRUE
[13] TRUE TRUE TRUE
>
> ## ---------------------------------------------------------------------
> ## D. findMatches(), countMatches()
> ## ---------------------------------------------------------------------
> findMatches(gr, table)
Hits object with 12 hits and 0 metadata columns:
queryHits subjectHits
<integer> <integer>
[1] 1 1
[2] 2 2
[3] 3 3
[4] 4 4
[5] 5 5
... ... ...
[8] 11 7
[9] 12 6
[10] 13 5
[11] 14 4
[12] 15 3
-------
queryLength: 15 / subjectLength: 7
> countMatches(gr, table)
[1] 1 1 1 1 1 1 1 0 0 0 1 1 1 1 1
>
> findMatches(gr, table, ignore.strand=TRUE)
Hits object with 13 hits and 0 metadata columns:
queryHits subjectHits
<integer> <integer>
[1] 1 1
[2] 2 2
[3] 3 3
[4] 4 4
[5] 5 5
... ... ...
[9] 11 7
[10] 12 6
[11] 13 5
[12] 14 4
[13] 15 3
-------
queryLength: 15 / subjectLength: 7
> countMatches(gr, table, ignore.strand=TRUE)
[1] 1 1 1 1 1 1 1 0 0 1 1 1 1 1 1
>
> gr_levels <- unique(gr)
> countMatches(gr_levels, gr)
[1] 1 1 2 2 2 2 2 1 1 1
>
> ## ---------------------------------------------------------------------
> ## E. order() AND RELATED METHODS
> ## ---------------------------------------------------------------------
> is.unsorted(gr)
[1] TRUE
> order(gr)
[1] 6 12 1 5 13 2 3 15 4 14 7 11 8 10 9
> sort(gr)
GRanges object with 15 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
F chr1 [6, 10] +
L chr1 [6, 10] +
A chr1 [1, 10] -
E chr1 [5, 10] *
M chr1 [5, 10] *
. ... ... ...
G chr3 [7, 10] +
K chr3 [7, 10] +
H chr3 [8, 10] +
J chr3 [7, 10] -
I chr3 [9, 10] -
-------
seqinfo: 3 sequences from an unspecified genome
> is.unsorted(sort(gr))
[1] FALSE
>
> is.unsorted(gr, ignore.strand=TRUE)
[1] TRUE
> gr2 <- sort(gr, ignore.strand=TRUE)
> is.unsorted(gr2) # TRUE
[1] TRUE
> is.unsorted(gr2, ignore.strand=TRUE) # FALSE
[1] FALSE
>
> ## TODO: Broken. Please fix!
> #sort(gr, by = ~ seqnames + start + end) # equivalent to (but slower than) above
>
> score(gr) <- rev(seq_len(length(gr)))
>
> ## TODO: Broken. Please fix!
> #sort(gr, by = ~ score)
>
> rank(gr)
[1] 3 6 7 9 4 1 11 13 15 14 12 2 5 10 8
>
> ## ---------------------------------------------------------------------
> ## F. GENERALIZED ELEMENT-WISE COMPARISON OF 2 GenomicRanges OBJECTS
> ## ---------------------------------------------------------------------
> gr3 <- GRanges(c(rep("chr1", 12), "chr2"), IRanges(c(1:11, 6:7), width=3))
> strand(gr3)[12] <- "+"
> gr4 <- GRanges("chr1", IRanges(5, 9))
>
> pcompare(gr3, gr4)
[1] -6 -5 -4 -4 -1 2 3 4 4 5 6 -11 42
> rangeComparisonCodeToLetter(pcompare(gr3, gr4))
[1] a b c c f i j k k l m X X
Levels: a b c d e f g h i j k l m X
>
>
>
>
>
> dev.off()
null device
1
>