Last data update: 2014.03.03

OverlapEncodings-class    R Documentation

OverlapEncodings objects

Description

The OverlapEncodings class is a container for storing the "overlap encodings" returned by the encodeOverlaps function.

Usage

## OverlapEncodings accessors:

## S4 method for signature 'OverlapEncodings'
length(x)
## S4 method for signature 'OverlapEncodings'
Loffset(x)
## S4 method for signature 'OverlapEncodings'
Roffset(x)
## S4 method for signature 'OverlapEncodings'
encoding(x)
## S4 method for signature 'OverlapEncodings'
levels(x)
## S4 method for signature 'OverlapEncodings'
flippedQuery(x)

## S4 method for signature 'OverlapEncodings'
Lencoding(x)
## S4 method for signature 'OverlapEncodings'
Rencoding(x)

## S4 method for signature 'OverlapEncodings'
njunc(x)
## S4 method for signature 'OverlapEncodings'
Lnjunc(x)
## S4 method for signature 'OverlapEncodings'
Rnjunc(x)

## Coercing an OverlapEncodings object:

## S4 method for signature 'OverlapEncodings'
as.data.frame(x, row.names=NULL, optional=FALSE, ...)

## Low-level related utilities:

## S4 method for signature 'character'
Lencoding(x)
## S4 method for signature 'character'
Rencoding(x)
## S4 method for signature 'character'
njunc(x)
## S4 method for signature 'character'
Lnjunc(x)
## S4 method for signature 'character'
Rnjunc(x)

## S4 method for signature 'factor'
Lencoding(x)
## S4 method for signature 'factor'
Rencoding(x)
## S4 method for signature 'factor'
njunc(x)
## S4 method for signature 'factor'
Lnjunc(x)
## S4 method for signature 'factor'
Rnjunc(x)

Arguments

x

An OverlapEncodings object. For the low-level utilities, x can also be a character vector or factor containing encodings.

row.names

NULL or a character vector.

optional, ...

Ignored.

Details

Given a query and a subject of the same length, both list-like objects whose top-level elements typically contain multiple ranges (e.g. RangesList objects), the "overlap encoding" of the i-th element in query and the i-th element in subject is a character string describing how the ranges in query[[i]] are qualitatively positioned relative to the ranges in subject[[i]].

The encodeOverlaps function computes those overlap encodings and returns them in an OverlapEncodings object of the same length as query and subject.

The topic of working with overlap encodings is covered in detail in the "OverlapEncodings" vignette located in this package (GenomicAlignments) and accessible with vignette("OverlapEncodings").

OverlapEncodings accessors

In the following code snippets, x is an OverlapEncodings object typically obtained by a call to encodeOverlaps(query, subject).

length(x): Get the number of elements (i.e. encodings) in x. This is equal to length(query) and length(subject).

Loffset(x), Roffset(x): Get the "left offsets" and "right offsets" of the encodings, respectively. Both are integer vectors of the same length as x.

Let's denote Qi = query[[i]], Si = subject[[i]], and [q1,q2] the range covered by Qi, i.e. q1 = min(start(Qi)) and q2 = max(end(Qi)). Then Loffset(x)[i] is the number L of ranges at the head of Si that are strictly to the left of all the ranges in Qi, i.e. L is the greatest value such that end(Si)[k] < q1 - 1 for all k in seq_len(L). Similarly, Roffset(x)[i] is the number R of ranges at the tail of Si that are strictly to the right of all the ranges in Qi, i.e. R is the greatest value such that start(Si)[length(Si) + 1 - k] > q2 + 1 for all k in seq_len(R).
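
The definition above can be sketched in base R, working directly on start and end vectors. This is only an illustration of the definition, not the package's implementation: count_leading_true() and offsets_sketch() are hypothetical helpers, and the ranges in Si are assumed to be sorted from left to right. The input values are those of read1, read2 and tx from the Examples.

```r
# Hypothetical helper (not part of GenomicAlignments): number of TRUE
# values at the head of a logical vector, before the first FALSE.
count_leading_true <- function(flags) {
  first_false <- match(FALSE, flags)
  if (is.na(first_false)) length(flags) else first_false - 1L
}

# Sketch of the Loffset/Roffset definition for a single (Qi, Si) pair.
# Assumption: the ranges in Si are sorted from left to right.
offsets_sketch <- function(q_start, q_end, s_start, s_end) {
  q1 <- min(q_start)
  q2 <- max(q_end)
  c(Loffset = count_leading_true(s_end < q1 - 1),
    Roffset = count_leading_true(rev(s_start) > q2 + 1))
}

# Transcript 'tx' and reads 'read1', 'read2' from the Examples section.
tx_start <- c(1, 4, 15, 22, 38)
tx_end   <- c(2, 9, 19, 25, 47)
offsets_sketch(c(7, 15, 22), c(9, 19, 23), tx_start, tx_end)  # read1
offsets_sketch(c(5, 15), c(9, 17), tx_start, tx_end)          # read2
```

For read1 and read2 the sketch gives (1, 1) and (1, 2), which agrees with the Loffset and Roffset values printed for ovenc in the Results.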

encoding(x): Factor of the same length as x where the i-th element is the encoding obtained by comparing each range in Qi with all the ranges in tSi = Si[(1+L):(length(Si)-R)] (tSi stands for "trimmed Si"). More precisely, here is how this encoding is obtained:

  1. All the ranges in Qi are compared with tSi[1], then with tSi[2], etc... At each step (one step per range in tSi), comparing all the ranges in Qi with tSi[k] is done with rangeComparisonCodeToLetter(compare(Qi, tSi[k])). So at each step, we end up with a vector of M single letters (where M is length(Qi)).

  2. Each vector obtained previously (1 vector per range in tSi, all of them of length M) is turned into a single string (called "encoding block") by pasting its individual letters together.

  3. All the encoding blocks (1 per range in tSi) are pasted together into a single long string and separated by colons (":"). An additional colon is prepended to the long string and another one appended to it.

  4. Finally, a special block containing the value of M is prepended to the long string. The final string is the encoding.
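
Steps 2 to 4 can be sketched in base R as follows. The sketch assumes that step 1 has already produced one vector of M letters per range in tSi, and that tSi contains at least one range; assemble_encoding() is a hypothetical helper, not a function of this package. The letters used as input are simply read back from the encoding of read1 shown in the Results.

```r
# Hypothetical helper (not part of GenomicAlignments): build an encoding
# string from the per-range letter vectors produced by step 1.
# 'letters_per_range' is a list with one character vector per range in
# tSi, each of length M.
assemble_encoding <- function(letters_per_range) {
  M <- length(letters_per_range[[1]])
  # Step 2: paste the letters of each vector into one encoding block.
  blocks <- vapply(letters_per_range, paste, character(1), collapse = "")
  # Step 3: join the blocks with colons, then add a leading and a
  # trailing colon.
  long_string <- paste0(":", paste(blocks, collapse = ":"), ":")
  # Step 4: prepend the special block containing M.
  paste0(M, long_string)
}

# Letters taken from the encoding of read1 ("3:jmm:agm:aaf:").
assemble_encoding(list(c("j", "m", "m"), c("a", "g", "m"), c("a", "a", "f")))
```

With these inputs the result is "3:jmm:agm:aaf:", the encoding reported for read1.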

levels(x): Equivalent to levels(encoding(x)).

flippedQuery(x): Whether or not the top-level element in query used for computing the encoding was "flipped" before the encoding was computed. Note that this flipping generally affects the "left offset" and the "right offset", in addition to the encoding itself.

Lencoding(x), Rencoding(x): Extract the "left encodings" and "right encodings" of paired-end encodings.

Paired-end encodings are obtained by encoding paired-end overlaps i.e. overlaps between paired-end reads and transcripts (typically). The difference between a single-end encoding and a paired-end encoding is that all the blocks in the latter contain a "--" separator to mark the separation between the "left encoding" and the "right encoding".

See the "OverlapEncodings" vignette located in this package for examples of paired-end encodings.
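
As a rough illustration of the "--" separator, the base R sketch below splits a paired-end encoding into its two halves with regular expressions. Everything in it is an assumption made for illustration: the encoding string is invented, left_sketch() and right_sketch() are hypothetical helpers, and the exact form returned by Lencoding() and Rencoding() should be checked against the vignette.

```r
# Invented paired-end encoding, for illustration only: every block
# (including the special block) contains a "--" separator.
paired <- "1--2:i--jm:a--af:"

# Hypothetical helpers (not part of GenomicAlignments).
# Left half: drop "--" and whatever follows it inside each block.
left_sketch <- function(x) gsub("--[^:]*", "", x)
# Right half: drop "--" and whatever precedes it inside each block.
right_sketch <- function(x) gsub("[^:]*--", "", x)

left_sketch(paired)
right_sketch(paired)
```

On this invented input the two calls give "1:i:a:" and "2:jm:af:", i.e. two strings with the layout of single-end encodings.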

njunc(x), Lnjunc(x), Rnjunc(x): Extract the number of junctions in each encoding by looking at their first block (aka special block). If an element xi in x is a paired-end encoding, then Lnjunc(xi), Rnjunc(xi), and njunc(xi) return njunc(Lencoding(xi)), njunc(Rencoding(xi)), and Lnjunc(xi) + Rnjunc(xi), respectively.
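
For single-end encodings, reading the special block can be sketched in base R as below. The sketch assumes that the number of junctions is M - 1, where M is the value stored in the special block; this is inferred from the Examples (a read with 3 ranges has 2 junctions) rather than stated above. njunc_sketch() is a hypothetical helper and does not handle paired-end encodings.

```r
# Hypothetical helper (not part of GenomicAlignments): number of
# junctions of single-end encodings, assumed to be M - 1 where M is the
# value stored in the special (first) block.
njunc_sketch <- function(x) {
  # Keep only what precedes the first colon, i.e. the special block.
  special_block <- sub(":.*$", "", as.character(x))
  as.integer(special_block) - 1L
}

# The three levels of 'ovenc' from the Examples section.
njunc_sketch(c("2:jm:af:", "2:jm:ai:", "3:jmm:agm:aaf:"))
```

The result, 1 1 2, agrees with the output of njunc(levels(ovenc)) in the Results. Because of the as.character() call, the helper also accepts a factor.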

Coercing an OverlapEncodings object

In the following code snippets, x is an OverlapEncodings object.

as.data.frame(x): Return x as a data frame with columns "Loffset", "Roffset", "encoding" and "flippedQuery".

Author(s)

Hervé Pagès

See Also

  • The "OverlapEncodings" vignette in this package.

  • The encodeOverlaps function for computing "overlap encodings".

  • The compare function in the IRanges package for the interpretation of the strings returned by encoding.

  • The GRangesList class defined and documented in the GenomicRanges package.

Examples

example(encodeOverlaps)  # to generate the 'ovenc' object

length(ovenc)
Loffset(ovenc)
Roffset(ovenc)
encoding(ovenc)
levels(ovenc)
nlevels(ovenc)
flippedQuery(ovenc)
njunc(ovenc)

as.data.frame(ovenc)
njunc(levels(ovenc))

Results


> library(GenomicAlignments)
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
Loading required package: SummarizedExperiment
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: Biostrings
Loading required package: XVector
Loading required package: Rsamtools
> ### Name: OverlapEncodings-class
> ### Title: OverlapEncodings objects
> ### Aliases: class:OverlapEncodings OverlapEncodings-class OverlapEncodings
> ###   length,OverlapEncodings-method Loffset
> ###   Loffset,OverlapEncodings-method Roffset
> ###   Roffset,OverlapEncodings-method encoding,OverlapEncodings-method
> ###   levels,OverlapEncodings-method levels.OverlapEncodings flippedQuery
> ###   flippedQuery,OverlapEncodings-method Lencoding
> ###   Lencoding,character-method Lencoding,factor-method
> ###   Lencoding,OverlapEncodings-method Rencoding
> ###   Rencoding,character-method Rencoding,factor-method
> ###   Rencoding,OverlapEncodings-method njunc,character-method
> ###   njunc,factor-method njunc,OverlapEncodings-method Lnjunc
> ###   Lnjunc,character-method Lnjunc,factor-method
> ###   Lnjunc,OverlapEncodings-method Rnjunc Rnjunc,character-method
> ###   Rnjunc,factor-method Rnjunc,OverlapEncodings-method
> ###   as.data.frame.OverlapEncodings as.data.frame,OverlapEncodings-method
> ###   show,OverlapEncodings-method
> ### Keywords: methods classes
> 
> ### ** Examples
> 
> example(encodeOverlaps)  # to generate the 'ovenc' object

encdOv> ## ---------------------------------------------------------------------
encdOv> ## A. BETWEEN 2 RangesList OBJECTS
encdOv> ## ---------------------------------------------------------------------
encdOv> ## In the context of an RNA-seq experiment, encoding the overlaps
encdOv> ## between 2 GRangesList objects, one containing the reads (the query),
encdOv> ## and one containing the transcripts (the subject), can be used for
encdOv> ## detecting hits between reads and transcripts that are "compatible"
encdOv> ## with the splicing of the transcript. Here we illustrate this with 2
encdOv> ## RangesList objects, in order to keep things simple:
encdOv> 
encdOv> ## 4 aligned reads in the query:
encdOv> read1 <- IRanges(c(7, 15, 22), c(9, 19, 23))  # 2 junctions

encdOv> read2 <- IRanges(c(5, 15), c(9, 17))  # 1 junction

encdOv> read3 <- IRanges(c(16, 22), c(19, 24))  # 1 junction

encdOv> read4 <- IRanges(c(16, 23), c(19, 24))  # 1 junction

encdOv> query <- IRangesList(read1, read2, read3, read4)

encdOv> ## 1 transcript in the subject:
encdOv> tx <- IRanges(c(1, 4, 15, 22, 38), c(2, 9, 19, 25, 47))  # 5 exons

encdOv> subject <- IRangesList(tx)

encdOv> ## Encode the overlaps:
encdOv> ovenc <- encodeOverlaps(query, subject)

encdOv> ovenc
OverlapEncodings object of length 4
    Loffset Roffset       encoding flippedQuery
[1]       1       1 3:jmm:agm:aaf:        FALSE
[2]       1       2       2:jm:af:        FALSE
[3]       2       1       2:jm:af:        FALSE
[4]       2       1       2:jm:ai:        FALSE

encdOv> encoding(ovenc)
[1] 3:jmm:agm:aaf: 2:jm:af:       2:jm:af:       2:jm:ai:      
Levels: 2:jm:af: 2:jm:ai: 3:jmm:agm:aaf:

encdOv> ## Reads that are "compatible" with the transcript can be detected with
encdOv> ## a regular expression (the regular expression below assumes that
encdOv> ## reads have at most 2 junctions):
encdOv> regex0 <- "(:[fgij]:|:[jg].:.[gf]:|:[jg]..:.g.:..[gf]:)"

encdOv> grepl(regex0, encoding(ovenc))  # read4 is NOT "compatible"
[1]  TRUE  TRUE  TRUE FALSE

encdOv> ## This was for illustration purposes only. In practice you don't need
encdOv> ## to (and should not) use this regular expression; use the
encdOv> ## isCompatibleWithSplicing() utility function instead:
encdOv> isCompatibleWithSplicing(ovenc)
[1]  TRUE  TRUE  TRUE FALSE

encdOv> ## ---------------------------------------------------------------------
encdOv> ## B. BETWEEN 2 GRangesList OBJECTS
encdOv> ## ---------------------------------------------------------------------
encdOv> ## With real RNA-seq data, the reads and transcripts will typically be
encdOv> ## stored in GRangesList objects. Please refer to the "OverlapEncodings"
encdOv> ## vignette in this package for realistic examples.
encdOv> 
encdOv> 
encdOv> 
> 
> length(ovenc)
[1] 4
> Loffset(ovenc)
[1] 1 1 2 2
> Roffset(ovenc)
[1] 1 2 1 1
> encoding(ovenc)
[1] 3:jmm:agm:aaf: 2:jm:af:       2:jm:af:       2:jm:ai:      
Levels: 2:jm:af: 2:jm:ai: 3:jmm:agm:aaf:
> levels(ovenc)
[1] "2:jm:af:"       "2:jm:ai:"       "3:jmm:agm:aaf:"
> nlevels(ovenc)
[1] 3
> flippedQuery(ovenc)
[1] FALSE
> njunc(ovenc)
[1] 2 1 1 1
> 
> as.data.frame(ovenc)
  Loffset Roffset       encoding flippedQuery
1       1       1 3:jmm:agm:aaf:        FALSE
2       1       2       2:jm:af:        FALSE
3       2       1       2:jm:af:        FALSE
4       2       1       2:jm:ai:        FALSE
> njunc(levels(ovenc))
[1] 1 1 2
>