The GTuples class is a container for the genomic tuples and their
associated annotations.
Details
GTuples extends GRanges as a container
for genomic tuples rather than genomic ranges. GTuples is a vector of
genomic locations and associated annotations. Each element in the vector is
comprised of a sequence name, a tuple, a strand,
and optional metadata columns (e.g. score, GC content, etc.). This
information is stored in four components:
seqnames
a 'factor' Rle object
containing the sequence names.
tuples
externally, a matrix-link object containing the
tuples. Internally, an IRanges object storing the
first and last position of each tuple and, if required, a matrix storing
the "internal" positions of each tuple (see description of
internalPos below).
strand
a Rle Rle object containing
the strand information.
mcols
a DataFrame object
containing the metadata columns. Columns cannot be named "seqnames"
"ranges", "tuples", "internalPos", "size",
"strand", "seqlevels", "seqlengths",
"isCircular", "start", "end", "width", or
"element".
seqinfo
a DataFrame object
containing information about the set of genomic sequences present in the
GTuples object.
Slots
Since the GTuples class extends the
GRanges class it contains the seqnames,
ranges, strand, elementMetadata, seqinfo and
metadata. The GTuples class also contains two additional slots,
size and internalPos.
size
An integer. The size of the genomic tuples
stored in the GTuples object.
internalPos
If the size of the genomic tuples is
greater than 2, internalPos is an integer
matrix storing the "internal" positions of each genomic tuple.
Otherwise internalPos is NULL.
Rle object, character
vector, or factor containing the sequence names.
tuples
matrix object containing the positions of the
tuples. The first column should refer to pos1, the second to pos2,
etc.
strand
Rle object, character
vector, or factor containing the strand information.
...
Optional metadata columns. These columns cannot be
named "start", "end", "width", or
"element". A named integer vector "seqlength" can be
used instead of seqinfo.
seqlengths
an integer vector named with the sequence
names and containing the lengths (or NA) for each
level(seqnames).
seqinfo
a DataFrame object
containing allowed sequence names and lengths (or NA) for each
level(seqnames).
Coercion
In the code snippets below, x is a GTuples object.
as.data.frame(x, row.names = NULL, optional = FALSE, ...):
Creates a data.frame with columns seqnames (factor),
tuples (integer), strand (factor), as well as
the additional metadata columns stored in mcols(x). Pass an
explicit stringsAsFactors=TRUE/FALSE argument via ... to
override the default conversions for the metadata columns in
mcols(x).
as.character(x, ignore.strand=FALSE):
Turn GTuples object x into a character vector where each
tuples in x is represented by a string in format
chr1:100,109,115:+. If ignore.strand is TRUE or if
all the ranges in x are unstranded (i.e. their strand
is set to *), then all the strings in the output are in
format chr1:100,109,115.
The names on x are propagated to the returned character vector.
Its metadata (metadata(x)) and metadata columns (mcols(x))
are ignored.
as(x, "GRanges"), granges(x): Creates a
GRanges object from a
GTuples object. WARNING: This is generally a
destructive operation because all "internal" positions will be
dropped.
Accessors
In the following code snippets, x is a GTuples object.
size(x): Get the size of the genomic tuples stored in
x.
length(x): Get the number of elements.
seqnames(x), seqnames(x) <- value: Get or set the
sequence names. value can be an Rle
object, a character vector, or a factor.
tuples(x), tuples(x) <- value: Get the positions of the
tuples, which are returned as an integer matrix. value can be an
integer matrix.
ranges(x, use.mcols = FALSE), ranges(x) <- value: Get or set
the ranges in the form of a CompressedIRangesList. value can be a
RangesList object.
WARNING: The use of ranges with GTuples objects is
strongly discouraged. It will only get/set pos_{1} and
pos_{m} of the tuples, where m is the size of the
tuples, as these are what are stored in the "ranges" slot of a
GTuples object.
names(x), names(x) <- value: Get or set the names
of the elements.
strand(x), strand(x) <- value: Get or set the
strand. value can be an Rle object,
character vector, or factor.
mcols(x, use.names=FALSE), mcols(x) <- value: Get
or set the metadata columns. If use.names=TRUE and the metadata
columns are not NULL, then the names of x are propagated as
the row names of the returned DataFrame
object. When setting the metadata columns, the supplied value must be
NULL or a data.frame-like object (i.e.
DataTable or data.frame)
object holding element-wise metadata.
elementMetadata(x), elementMetadata(x) <- value,
values(x), values(x) <- value: Alternatives to mcols
functions. Their use is discouraged.
seqinfo(x), seqinfo(x) <- value: Get or set the
information about the underlying sequences. value must be a
DataFrame object.
seqlevels(x), seqlevels(x, force=FALSE) <- value:
Get or set the sequence levels. seqlevels(x) is equivalent to
seqlevels(seqinfo(x)) or to levels(seqnames(x)), those 2
expressions being guaranteed to return identical character vectors on a
GTuples object. value must be a character vector with no
NAs. See ?seqlevels for more
information.
seqlengths(x), seqlengths(x) <- value: Get or set
the sequence lengths. seqlengths(x) is equivalent to
seqlengths(seqinfo(x)). value can be a named non-negative
integer or numeric vector eventually with NAs.
isCircular(x), isCircular(x) <- value: Get or set
the circularity flags. isCircular(x) is equivalent to
isCircular(seqinfo(x)). value must be a named logical
vector eventually with NAs.
genome(x), genome(x) <- value: Get or set the
genome identifier or assembly name for each sequence. genome(x) is
equivalent to genome(seqinfo(x)). value must be a named
character vector eventually with NAs.
seqlevelsStyle(x), seqlevelsStyle(x) <- value: Get
or set the seqname style for x. See the
seqlevelsStyle generic getter and setter in
the GenomeInfoDb package for more information.
score(x), score(x) <- value: Get or set the "score"
column from the element metadata.
Tuples methods
In the following code snippets, x is a GTuples object.
WARNING: The preferred setter is tuples(x) <- value and the
use of start(x) <- value, end(x) <- value and
width(x) <- value is discouraged.
start(x), start(x) <- value:
Get or set pos_{1} of the tuples. WARNING: The use of
width(x) <- value is discouraged; instead, construct the tuples as
the appropriate integer matrix, mvalue, and use
tuples(x) <- mvalue.
end(x), end(x) <- value:
Get or set pos_{m} of the tuples, where m is the size
of the tuples. WARNING: The use of end(x) <- value is
discouraged; instead, construct the tuples as the appropriate integer
matrix, mvalue, and use tuples(x) <- mvalue.
IPD(x): Get the intra-pair distances (IPD). IPD is
only defined for tuples with size > 1. The IPD of a tuple
with size= m is the vector of intra-pair distances,
(pos_{2} - pos_{1}, …, pos_{m} - pos_{m - 1}).
width(x), width(x) <- value:
Get or set pos_{m} - pos_{1} of the tuples, where m is the
size of the tuples. If using width(x) <- value,
pos_{1} is held fixed and pos_{m} is altered.
WARNING: The use of width(x) <- value is discouraged;
instead, construct the tuples as the appropriate integer matrix,
mvalue, and use tuples(x) <- mvalue.
Splitting and Combining
In the following code snippets, x is a GTuples object.
append(x, values, after = length(x)): Inserts the values
into x at the position given by after, where x and
values are of the same class.
c(x, ...): Combines x and the GTuples
objects in ... together. Any object in ... must belong to
the same class as x, or to one of its subclasses, or must be
NULL. The result is an object of the same class as x.
c(x, ..., ignore.mcols=FALSE) If the GTuples
objects have metadata columns (represented as one
DataFrame per object), each such
DataFrame must have the same columns in
order to combine successfully. In order to circumvent this restraint, you
can pass in an ignore.mcols=TRUE argument which will combine all
the objects into one and drop all of their metadata columns.
split(x, f, drop=FALSE): Splits x according to
f to create a GTuplesList object. If f is a
list-like object then drop is ignored and f is treated as
if it was rep(seq_len(length(f)), sapply(f, length)), so the
returned object has the same shape as f (it also receives the
names of f). Otherwise, if f is not a list-like object,
empty list elements are removed from the returned object if drop
is TRUE.
Subsetting
In the following code snippets, x is a GTuples object.
x[i, j], x[i, j] <- value: Get or set elements
i with optional metadata columns mcols(x)[,j], where
i can be missing; an NA-free logical, numeric, or character
vector; or a 'logical' Rle object.
x[i,j] <- value: Replaces elements i and optional
metadata columns j with value.
head(x, n = 6L): If n is non-negative, returns the
first n elements of the GTuples object. If n is
negative, returns all but the last abs(n) elements of the
GTuples object.
rep(x, times, length.out, each): Repeats the values in
x through one of the following conventions:
times
Vector giving the number of times to repeat each
element if of length length(x), or to repeat the whole vector
if of length 1.
length.out
Non-negative integer. The desired length of
the output vector.
each
Non-negative integer. Each element of x is
repeated each times.
subset(x, subset): Returns a new object of the same class
as x made of the subset using logical vector subset, where missing
values are taken as FALSE.
tail(x, n = 6L): If n is non-negative, returns the
last n elements of the GTuples object. If n is
negative, returns all but the first abs(n) elements of the
GTuples object.
window(x, start = NA, end = NA, width = NA, frequency = NULL, delta = NULL, ...):
Extracts the subsequence window from the GTuples object using:
start, end, width
The start, end, or width
of the window. Two of the three are required.
frequency, delta
Optional arguments that specify
the sampling frequency and increment within the window.
In general, this is more efficient than using "[" operator.
window(x, start = NA, end = NA, width = NA, keepLength = TRUE) <- value:
Replaces the subsequence window specified on the left (i.e. the subsequence
in x specified by start, end and width)
by value.
value must either be of class class(x), belong to a subclass
of class(x), be coercible to class(x), or be NULL.
If keepLength is TRUE, the elements of value are
repeated to create a GTuples object with the same number of elements
as the width of the subsequence window it is replacing.
If keepLength is FALSE, this replacement method can modify
the length of x, depending on how the length of the left
subsequence window compares to the length of value.
x$name, x$name <- value: Shortcuts for
mcols(x)$name and mcols(x)$name <- value, respectively.
Provided as a convenience, from GRanges as
the result of strong popular demand. Note that those methods are not
consistent with the other $ and $<- methods in the
IRanges/
GenomicRanges infrastructure, and might
confuse some users by making them believe that a
GRanges object can be manipulated as a
data.frame-like object. Therefore we recommend using them only
interactively, and we discourage their use in scripts or packages. For
the latter, use mcols(x)$name and mcols(x)$name <- value,
instead of x$name and x$name <- value, respectively.
Other methods
item
show(x): By default the show method displays 5 head and 5
tail elements. This can be changed by setting the global options
showHeadLines and showTailLines. If the object length is
less than (or equal to) the sum of these 2 options plus 1, then the
full object is displayed. Note that these options also affect the
display of GRanges,
GAlignments and
GAlignmentPairs objects (defined in
the GenomicAlignments package), as well as other objects
defined in the IRanges and Biostrings packages (e.g.
IRanges and DNAStringSet
objects).