R: Encode the overlaps between RNA-seq reads and the transcripts...
encodeOverlaps-methods
R Documentation
Encode the overlaps between RNA-seq reads and the transcripts of
a gene model
Description
In the context of an RNA-seq experiment, encoding the overlaps between
the aligned reads and the transcripts of a given gene model can be used
for detecting those overlaps that are compatible with the splicing
of the transcript.
The central tool for this is the encodeOverlaps method for
GRangesList objects, which computes the "overlap
encodings" between a query and a subject, both list-like
objects with list elements containing multiple ranges.
Other related utilities are also documented in this man page.
Typically GRangesList objects representing the
the aligned reads and the transcripts of a given gene model, respectively.
If the 2 objects don't have the same length, and if the hits
argument is not supplied, then the shortest is recycled to the length
of the longest (the standard recycling rules apply).
More generally speaking, query and subject must be
list-like objects with list elements containing multiple ranges e.g.
RangesList or GRangesList objects.
hits
An optional Hits object typically obtained from a
previous call to findOverlaps(query, subject).
Strictly speaking, hits only needs to be compatible with
query and subject, that is,
queryLength(hits) and
subjectLength(hits) must be equal to
length(query) and length(subject), respectively.
Supplying hits is a convenient way to do
encodeOverlaps(query[queryHits(hits)], subject[subjectHits(hits)]),
that is, calling encodeOverlaps(query, subject, hits) is equivalent
to the above, but is much more efficient, especially when query
and/or subject are big. Of course, when hits is supplied,
query and subject are not expected to have the same length
anymore.
...
Additional arguments for methods.
flip.query.if.wrong.strand
See the "OverlapEncodings" vignette located in this package
(GenomicAlignments).
x
For flipQuery: a GRangesList object.
For isCompatibleWithSplicing, isCompatibleWithSkippedExons,
extractSteppedExonRanks, extractSpannedExonRanks, and
extractSkippedExonRanks: an OverlapEncodings object,
a factor, or a character vector.
i
Subscript specifying the elements in x to flip. If missing, all
the elements are flipped.
ovencA, ovencB, ovenc
OverlapEncodings objects.
query.strand, subject.strand
Vector-like objects containing the strand of the query and subject,
respectively.
max.skipped.exons
Not supported yet. If NA (the default), the number of skipped
exons must be 1 or more (there is no max).
for.query.right.end
If TRUE, then the information reported in the output is for
the right ends of the paired-end reads.
Using for.query.right.end=TRUE with single-end reads is an error.
Details
See ?OverlapEncodings for a short introduction to "overlap encodings".
The topic of working with overlap encodings is covered in details
in the "OverlapEncodings" vignette located this package
(GenomicAlignments) and accessible with
vignette("OverlapEncodings").
Value
For encodeOverlaps: An OverlapEncodings object.
If hits is not supplied, this object is parallel to the
longest of query and subject, that is, it has the length
of the longest and the i-th encoding in it corresponds to the i-th element
in the longest.
If hits is supplied, then the returned object is parallel
to it, that is, it has one encoding per hit.
For flipQuery: TODO
For selectEncodingWithCompatibleStrand: TODO
For isCompatibleWithSplicing and isCompatibleWithSkippedExons:
A logical vector parallel to x.
For extractSteppedExonRanks, extractSpannedExonRanks, and
extractSkippedExonRanks: TODO
For extractQueryStartInTranscript: TODO
Author(s)
Herv<c3><83><c2><a9> Pag<c3><83><c2><a8>s
See Also
The OverlapEncodings class for a brief introduction to
"overlap encodings".
The Hits class defined and documented in the
S4Vectors package.
The "OverlapEncodings" vignette in this package.
findCompatibleOverlaps for a specialized version
of findOverlaps that uses
encodeOverlaps internally to keep only the hits where
the junctions in the aligned read are compatible with
the splicing of the annotated transcript.
The GRangesList class defined and documented
in the GenomicRanges package.
The findOverlaps generic function defined
in the IRanges package.
Examples
## ---------------------------------------------------------------------
## A. BETWEEN 2 RangesList OBJECTS
## ---------------------------------------------------------------------
## In the context of an RNA-seq experiment, encoding the overlaps
## between 2 GRangesList objects, one containing the reads (the query),
## and one containing the transcripts (the subject), can be used for
## detecting hits between reads and transcripts that are "compatible"
## with the splicing of the transcript. Here we illustrate this with 2
## RangesList objects, in order to keep things simple:
## 4 aligned reads in the query:
read1 <- IRanges(c(7, 15, 22), c(9, 19, 23)) # 2 junctions
read2 <- IRanges(c(5, 15), c(9, 17)) # 1 junction
read3 <- IRanges(c(16, 22), c(19, 24)) # 1 junction
read4 <- IRanges(c(16, 23), c(19, 24)) # 1 junction
query <- IRangesList(read1, read2, read3, read4)
## 1 transcript in the subject:
tx <- IRanges(c(1, 4, 15, 22, 38), c(2, 9, 19, 25, 47)) # 5 exons
subject <- IRangesList(tx)
## Encode the overlaps:
ovenc <- encodeOverlaps(query, subject)
ovenc
encoding(ovenc)
## Reads that are "compatible" with the transcript can be detected with
## a regular expression (the regular expression below assumes that
## reads have at most 2 junctions):
regex0 <- "(:[fgij]:|:[jg].:.[gf]:|:[jg]..:.g.:..[gf]:)"
grepl(regex0, encoding(ovenc)) # read4 is NOT "compatible"
## This was for illustration purpose only. In practise you don't need
## (and should not) use this regular expression, but use instead the
## isCompatibleWithSplicing() utility function:
isCompatibleWithSplicing(ovenc)
## ---------------------------------------------------------------------
## B. BETWEEN 2 GRangesList OBJECTS
## ---------------------------------------------------------------------
## With real RNA-seq data, the reads and transcripts will typically be
## stored in GRangesList objects. Please refer to the "OverlapEncodings"
## vignette in this package for realistic examples.
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(GenomicAlignments)
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
rbind, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit
Loading required package: S4Vectors
Loading required package: stats4
Attaching package: 'S4Vectors'
The following objects are masked from 'package:base':
colMeans, colSums, expand.grid, rowMeans, rowSums
Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
Loading required package: SummarizedExperiment
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
Loading required package: Biostrings
Loading required package: XVector
Loading required package: Rsamtools
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/GenomicAlignments/encodeOverlaps-methods.Rd_%03d_medium.png", width=480, height=480)
> ### Name: encodeOverlaps-methods
> ### Title: Encode the overlaps between RNA-seq reads and the transcripts of
> ### a gene model
> ### Aliases: encodeOverlaps-methods encodeOverlaps
> ### encodeOverlaps,RangesList,RangesList-method
> ### encodeOverlaps,RangesList,Ranges-method
> ### encodeOverlaps,Ranges,RangesList-method encodeOverlaps1 flipQuery
> ### encodeOverlaps,GRangesList,GRangesList-method
> ### selectEncodingWithCompatibleStrand isCompatibleWithSplicing
> ### isCompatibleWithSplicing,character-method
> ### isCompatibleWithSplicing,factor-method
> ### isCompatibleWithSplicing,OverlapEncodings-method
> ### isCompatibleWithSkippedExons
> ### isCompatibleWithSkippedExons,character-method
> ### isCompatibleWithSkippedExons,factor-method
> ### isCompatibleWithSkippedExons,OverlapEncodings-method
> ### extractSteppedExonRanks extractSteppedExonRanks,character-method
> ### extractSteppedExonRanks,factor-method
> ### extractSteppedExonRanks,OverlapEncodings-method
> ### extractSpannedExonRanks extractSpannedExonRanks,character-method
> ### extractSpannedExonRanks,factor-method
> ### extractSpannedExonRanks,OverlapEncodings-method
> ### extractSkippedExonRanks extractSkippedExonRanks,character-method
> ### extractSkippedExonRanks,factor-method
> ### extractSkippedExonRanks,OverlapEncodings-method
> ### extractQueryStartInTranscript
> ### Keywords: methods utilities
>
> ### ** Examples
>
> ## ---------------------------------------------------------------------
> ## A. BETWEEN 2 RangesList OBJECTS
> ## ---------------------------------------------------------------------
> ## In the context of an RNA-seq experiment, encoding the overlaps
> ## between 2 GRangesList objects, one containing the reads (the query),
> ## and one containing the transcripts (the subject), can be used for
> ## detecting hits between reads and transcripts that are "compatible"
> ## with the splicing of the transcript. Here we illustrate this with 2
> ## RangesList objects, in order to keep things simple:
>
> ## 4 aligned reads in the query:
> read1 <- IRanges(c(7, 15, 22), c(9, 19, 23)) # 2 junctions
> read2 <- IRanges(c(5, 15), c(9, 17)) # 1 junction
> read3 <- IRanges(c(16, 22), c(19, 24)) # 1 junction
> read4 <- IRanges(c(16, 23), c(19, 24)) # 1 junction
> query <- IRangesList(read1, read2, read3, read4)
>
> ## 1 transcript in the subject:
> tx <- IRanges(c(1, 4, 15, 22, 38), c(2, 9, 19, 25, 47)) # 5 exons
> subject <- IRangesList(tx)
>
> ## Encode the overlaps:
> ovenc <- encodeOverlaps(query, subject)
> ovenc
OverlapEncodings object of length 4
Loffset Roffset encoding flippedQuery
[1] 1 1 3:jmm:agm:aaf: FALSE
[2] 1 2 2:jm:af: FALSE
[3] 2 1 2:jm:af: FALSE
[4] 2 1 2:jm:ai: FALSE
> encoding(ovenc)
[1] 3:jmm:agm:aaf: 2:jm:af: 2:jm:af: 2:jm:ai:
Levels: 2:jm:af: 2:jm:ai: 3:jmm:agm:aaf:
>
> ## Reads that are "compatible" with the transcript can be detected with
> ## a regular expression (the regular expression below assumes that
> ## reads have at most 2 junctions):
> regex0 <- "(:[fgij]:|:[jg].:.[gf]:|:[jg]..:.g.:..[gf]:)"
> grepl(regex0, encoding(ovenc)) # read4 is NOT "compatible"
[1] TRUE TRUE TRUE FALSE
>
> ## This was for illustration purpose only. In practise you don't need
> ## (and should not) use this regular expression, but use instead the
> ## isCompatibleWithSplicing() utility function:
> isCompatibleWithSplicing(ovenc)
[1] TRUE TRUE TRUE FALSE
>
> ## ---------------------------------------------------------------------
> ## B. BETWEEN 2 GRangesList OBJECTS
> ## ---------------------------------------------------------------------
> ## With real RNA-seq data, the reads and transcripts will typically be
> ## stored in GRangesList objects. Please refer to the "OverlapEncodings"
> ## vignette in this package for realistic examples.
>
>
>
>
>
> dev.off()
null device
1
>