nearest-methods
R Documentation
Finding the nearest genomic tuple/range neighbor
Description
The nearest, precede, follow, distance
and distanceToNearest methods for GTuples
objects and subclasses.
NOTE: These methods treat the tuples as if they were ranges, with
ranges given by [pos_{1}, pos_{m}], where m is the size of the tuples.
This is done via inheritance: a GTuples object is treated as a
GRanges object and the corresponding GRanges method is dispatched.
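As a rough illustration of the range view described in the note above, the following base R sketch reduces a matrix of 3-tuples to the start and end of the range each tuple is treated as. The matrix layout (one row per tuple, one column per position) and the helper name tuples_as_range are assumptions made for this example; they are not part of the GenomicTuples API.

```r
# Hypothetical helper: collapse an m-tuple matrix to [pos_1, pos_m].
# Assumes one row per tuple, with positions in increasing column order.
tuples_as_range <- function(tuples) {
  m <- ncol(tuples)  # the size of the tuples
  data.frame(start = tuples[, 1], end = tuples[, m])
}

# Two 3-tuples: (10, 15, 20) and (30, 31, 40).
tuples <- matrix(c(10L, 15L, 20L,
                   30L, 31L, 40L),
                 ncol = 3, byrow = TRUE)
ranges <- tuples_as_range(tuples)
```

Under this view the internal positions (15 and 31 here) play no role in finding neighbors; only the first and last positions matter.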
Arguments
x
The query GTuples instance.
subject
The subject GTuples instance
within which the nearest neighbors are found. Can be missing,
in which case x is also the subject.
y
For the distance method, a GTuples or GRanges
instance. Cannot be missing. If x and y are not the same
length, the shorter will be recycled to match the length of the longer.
select
Logic for handling ties. By default, all methods
select a single tuple/range (arbitrary for nearest,
the first by order in subject for precede, and the
last for follow).
When select = "all", a Hits object is returned with
all matches for x. If an element of x does not have a match in
subject, that element is not included in the Hits
object.
ignore.strand
A logical indicating if the strand of the input tuples/ranges
should be ignored. When TRUE, strand is set to '+'.
...
Additional arguments for methods.
Details
nearest:
Performs conventional nearest neighbor finding.
Returns an integer vector containing the index of the nearest neighbor
tuple/range in subject for each tuple/range in x. If there is no
nearest neighbor, NA is returned. For details of the algorithm,
see the man page in IRanges, ?nearest.
precede:
For each tuple/range in x, precede returns
the index of the tuple/range in subject that is directly
preceded by the tuple/range in x. Overlapping tuples/ranges are
excluded. NA is returned when there are no qualifying
tuples/ranges in subject.
follow:
The opposite of precede, follow returns
the index of the tuple/range in subject that is directly followed
by the tuple/range in x. Overlapping tuples/ranges are excluded.
NA is returned when there are no qualifying tuples/ranges in
subject.
Orientation and Strand:
The relevant orientation for precede and follow
is 5' to 3', consistent with the direction of transcription.
Because positional numbering along a chromosome is from left to
right and transcription takes place from 5' to 3', precede and
follow can appear to have ‘opposite’ behavior on the +
and - strands. Using positions 5 and 6 as an example, 5 precedes
6 on the + strand but follows 6 on the - strand.
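The positions 5 and 6 example can be checked with a small base R sketch. The helper precedes_5to3 is hypothetical and is not a GenomicTuples or GenomicRanges function; it only encodes the assumption that "precedes" means "lies strictly upstream when read 5' to 3' on the given strand".

```r
# Hypothetical helper: does range x come strictly before range y
# when both are read 5' to 3' on the given strand?
precedes_5to3 <- function(x_start, x_end, y_start, y_end, strand = "+") {
  if (strand == "-") {
    # On the - strand, 5' to 3' runs from high to low coordinates.
    x_start > y_end
  } else {
    x_end < y_start
  }
}

plus  <- precedes_5to3(5, 5, 6, 6, strand = "+")  # 5 vs 6 on the + strand
minus <- precedes_5to3(5, 5, 6, 6, strand = "-")  # 5 vs 6 on the - strand
```

Here plus is TRUE and minus is FALSE: on the - strand it is position 6 that precedes position 5.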
A tuple/range with strand * can be compared to tuples/ranges on
either the + or - strand. Below we outline the priority when
tuples/ranges on multiple strands are compared. When
ignore.strand=TRUE all tuples/ranges are treated as if on the
+ strand.
x on + strand can match to tuples/ranges on both + and
* strands. In the case of a tie the first tuple/range by order
is chosen.
x on - strand can match to tuples/ranges on both - and
* strands. In the case of a tie the first tuple/range by order
is chosen.
x on * strand can match to tuples/ranges on any of +,
- or * strands. In the case of a tie the first tuple/range by
order is chosen.
distanceToNearest:
Returns the distance for each tuple/range in
x to its nearest neighbor in the subject.
distance:
Returns the distance for each tuple/range in x to the corresponding
tuple/range in y. The behavior of distance changed in Bioconductor
2.12. See the man page ?distance in IRanges for details.
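The base R sketch below shows one way to compute a gap-based distance between paired ranges. It assumes that distance means the number of positions separating two ranges, so that overlapping and adjacent ranges both have distance 0; ?distance in IRanges remains the authority on the actual definition. The helper gap_distance is hypothetical, and the sketch ignores sequence names and strand.

```r
# Hypothetical helper: number of positions strictly between two ranges.
# Overlapping or adjacent ranges give 0 under this assumed definition.
gap_distance <- function(x_start, x_end, y_start, y_end) {
  gap <- pmax(x_start, y_start) - pmin(x_end, y_end) - 1L
  pmax(gap, 0L)
}

# x is [1, 5] in each pair; y overlaps it, abuts it, then lies far away.
d <- gap_distance(x_start = c(1L, 1L, 1L), x_end = c(5L, 5L, 5L),
                  y_start = c(3L, 6L, 20L), y_end = c(8L, 9L, 25L))
```

With these inputs d is 0, 0 and 14: the last pair is separated by positions 6 through 19.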
Value
For nearest, precede and follow, an integer
vector of indices in subject, or a Hits if
select = "all".
For distanceToNearest, a Hits object with a
column for the query index (from), subject index
(to) and the distance between the pair.
For distance, an integer vector of distances between the tuples/ranges
in x and y.
Author(s)
Peter Hickey for methods involving GTuples. P. Aboyoun
and V. Obenchain <vobencha@fhcrc.org> for all the real work underlying the
powerful nearest methods.
See Also
The GTuples class.
The GenomicRanges and
GRanges classes in the GenomicRanges package.
The Ranges class in the IRanges package.
The Hits class in the S4Vectors package.
The nearest-methods man page in the
GenomicRanges package.
findOverlaps-methods for finding just the
overlapping ranges.