findCompatibleOverlaps-methods
R Documentation
Finding hits between reads and transcripts that are compatible
with the splicing of the transcript
Description
In the context of an RNA-seq experiment, findCompatibleOverlaps
(or countCompatibleOverlaps) can be used for finding (or counting)
hits between reads and transcripts that are compatible
with the splicing of the transcript.
Arguments
query
A GAlignments or GAlignmentPairs object representing
the aligned reads.
subject
A GRangesList object representing the transcripts.
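As a minimal sketch of what the two arguments can look like, the following builds a toy query and subject by hand. The transcript tx1, the read names A, B and C, and all coordinates are made up for illustration and do not come from any real data set.

```r
library(GenomicAlignments)

## Hypothetical transcript 'tx1' with two exons on chr1: 1-100 and 201-300.
subject <- GRangesList(
    tx1=GRanges("chr1", IRanges(start=c(1, 201), end=c(100, 300)),
                strand="+")
)

## Three hypothetical single-end reads on the "+" strand of chr1:
##   A: 20 bases falling entirely within exon 1
##   B: spliced read (91-100 and 201-210) whose junction matches the
##      intron of tx1
##   C: unspliced read (91-110) starting in exon 1 and running into
##      the intron
query <- GAlignments(
    seqnames=Rle(factor("chr1"), 3),
    pos=c(10L, 91L, 91L),
    cigar=c("20M", "10M100N10M", "20M"),
    strand=Rle(strand("+"), 3),
    names=c("A", "B", "C")
)

hits <- findCompatibleOverlaps(query, subject)     # a Hits object
counts <- countCompatibleOverlaps(query, subject)  # one count per read
```

With this toy input, reads A and B are expected to be reported as compatible with tx1, whereas C should not be, because it extends past the end of exon 1 into the intron.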
Details
findCompatibleOverlaps is a specialized version of
findOverlaps that uses
encodeOverlaps internally to keep only
the hits where the junctions in the aligned read are compatible
with the splicing of the annotated transcript.
Working with overlap encodings is covered in detail
in the "OverlapEncodings" vignette located in this package
(GenomicAlignments) and accessible with
vignette("OverlapEncodings").
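The filtering described above can be sketched by hand with findOverlaps, encodeOverlaps and isCompatibleWithSplicing. This is only a rough illustration of the idea on made-up data (the transcript tx1 and the reads A, B and C are hypothetical), not necessarily how the package implements findCompatibleOverlaps internally. It also assumes all reads are on the "+" strand, so the order of the ranges within each read is not an issue.

```r
library(GenomicAlignments)

## Hypothetical transcript with two exons: 1-100 and 201-300.
subject <- GRangesList(
    tx1=GRanges("chr1", IRanges(start=c(1, 201), end=c(100, 300)),
                strand="+")
)

## Hypothetical reads: A within exon 1, B spliced across the intron,
## C unspliced and running into the intron.
query <- GAlignments(
    seqnames=Rle(factor("chr1"), 3),
    pos=c(10L, 91L, 91L),
    cigar=c("20M", "10M100N10M", "20M"),
    strand=Rle(strand("+"), 3),
    names=c("A", "B", "C")
)

## Step 1: represent each read as the genomic ranges it covers.
grl <- grglist(query)

## Step 2: find all ordinary overlaps between reads and transcripts.
ov <- findOverlaps(grl, subject, ignore.strand=TRUE)

## Step 3: encode each overlap (read ranges relative to exon ranges).
ovenc <- encodeOverlaps(grl, subject, hits=ov,
                        flip.query.if.wrong.strand=TRUE)

## Step 4: keep only the hits whose encoding is compatible with the
## splicing of the transcript.
manual <- ov[isCompatibleWithSplicing(ovenc)]
```

On this toy input, all three reads overlap tx1 in the ordinary sense, but only the hits for A and B should survive step 4.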
Value
A Hits object for findCompatibleOverlaps.
An integer vector parallel to (i.e. same length as) query
for countCompatibleOverlaps.
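The two return values are closely related: tallying the query side of the Hits object should give the same numbers as countCompatibleOverlaps. The sketch below checks this on hypothetical toy data (tx1 and the reads A, B and C are invented for illustration) and also shows one way to list the compatible transcripts per read.

```r
library(GenomicAlignments)

## Hypothetical transcript with two exons: 1-100 and 201-300.
subject <- GRangesList(
    tx1=GRanges("chr1", IRanges(start=c(1, 201), end=c(100, 300)),
                strand="+")
)

## Hypothetical reads: A within exon 1, B spliced across the intron,
## C unspliced and running into the intron.
query <- GAlignments(
    seqnames=Rle(factor("chr1"), 3),
    pos=c(10L, 91L, 91L),
    cigar=c("20M", "10M100N10M", "20M"),
    strand=Rle(strand("+"), 3),
    names=c("A", "B", "C")
)

hits <- findCompatibleOverlaps(query, subject)
counts <- countCompatibleOverlaps(query, subject)

## Count the hits per read directly from the Hits object.
counts_from_hits <- tabulate(queryHits(hits), nbins=length(query))
same_counts <- all(counts_from_hits == unname(counts))

## Names of the compatible transcripts, grouped by read name.
tx_by_read <- split(names(subject)[subjectHits(hits)],
                    names(query)[queryHits(hits)])
```

Reads with no compatible transcript (C in this toy input) get a count of 0 and do not appear in tx_by_read.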
Author(s)
Hervé Pagès
See Also
The findOverlaps generic function defined
in the IRanges package.
The encodeOverlaps generic function and
OverlapEncodings class.
The "OverlapEncodings" vignette in this package.
GAlignments and GAlignmentPairs objects.
GRangesList objects in the
GenomicRanges package.
Examples
## Here we only show a simple example illustrating the use of
## countCompatibleOverlaps() on a very small data set. Please
## refer to the "OverlapEncodings" vignette in the GenomicAlignments
## package for a comprehensive presentation of "overlap
## encodings" and related tools/concepts (e.g. "compatible"
## overlaps, "almost compatible" overlaps etc...), and for more
## examples.
library(GenomicAlignments)
## sm_treated1.bam contains a small subset of treated1.bam, a BAM
## file containing single-end reads from the "Pasilla" experiment
## (RNA-seq, Fly, see the pasilla data package for the details)
## and aligned to reference genome BDGP Release 5 (aka dm3 genome on
## the UCSC Genome Browser):
sm_treated1 <- system.file("extdata", "sm_treated1.bam",
                           package="GenomicAlignments", mustWork=TRUE)
## Load the alignments:
flag0 <- scanBamFlag(isDuplicate=FALSE, isNotPassingQualityControls=FALSE)
param0 <- ScanBamParam(flag=flag0)
gal <- readGAlignments(sm_treated1, use.names=TRUE, param=param0)
## Load the transcripts (IMPORTANT: As always, the reference genome
## of the transcripts must be *exactly* the same as the reference
## genome used to align the reads):
library(TxDb.Dmelanogaster.UCSC.dm3.ensGene)
txdb <- TxDb.Dmelanogaster.UCSC.dm3.ensGene
exbytx <- exonsBy(txdb, by="tx", use.names=TRUE)
## Number of "compatible" transcripts per alignment in 'gal':
gal_ncomptx <- countCompatibleOverlaps(gal, exbytx)
mcols(gal)$ncomptx <- gal_ncomptx
table(gal_ncomptx)
mean(gal_ncomptx >= 1)
## --> 33% of the alignments in 'gal' are "compatible" with at least
## 1 transcript in 'exbytx'.
## Keep only alignments compatible with at least 1 transcript in
## 'exbytx':
compgal <- gal[gal_ncomptx >= 1]
head(compgal)
Results
> library(GenomicAlignments)
> ### Name: findCompatibleOverlaps-methods
> ### Title: Finding hits between reads and transcripts that are _compatible_
> ### with the splicing of the transcript
> ### Aliases: findCompatibleOverlaps-methods findCompatibleOverlaps
> ### findCompatibleOverlaps,GAlignments,GRangesList-method
> ### findCompatibleOverlaps,GAlignmentPairs,GRangesList-method
> ### countCompatibleOverlaps
> ### Keywords: methods utilities
>
> ### ** Examples
>
> ## Here we only show a simple example illustrating the use of
> ## countCompatibleOverlaps() on a very small data set. Please
> ## refer to the "OverlapEncodings" vignette in the GenomicAlignments
> ## package for a comprehensive presentation of "overlap
> ## encodings" and related tools/concepts (e.g. "compatible"
> ## overlaps, "almost compatible" overlaps etc...), and for more
> ## examples.
>
> ## sm_treated1.bam contains a small subset of treated1.bam, a BAM
> ## file containing single-end reads from the "Pasilla" experiment
> ## (RNA-seq, Fly, see the pasilla data package for the details)
> ## and aligned to reference genome BDGP Release 5 (aka dm3 genome on
> ## the UCSC Genome Browser):
> sm_treated1 <- system.file("extdata", "sm_treated1.bam",
+                            package="GenomicAlignments", mustWork=TRUE)
>
> ## Load the alignments:
> flag0 <- scanBamFlag(isDuplicate=FALSE, isNotPassingQualityControls=FALSE)
> param0 <- ScanBamParam(flag=flag0)
> gal <- readGAlignments(sm_treated1, use.names=TRUE, param=param0)
>
> ## Load the transcripts (IMPORTANT: Like always, the reference genome
> ## of the transcripts must be *exactly* the same as the reference
> ## genome used to align the reads):
> library(TxDb.Dmelanogaster.UCSC.dm3.ensGene)
Loading required package: GenomicFeatures
Loading required package: AnnotationDbi
> txdb <- TxDb.Dmelanogaster.UCSC.dm3.ensGene
> exbytx <- exonsBy(txdb, by="tx", use.names=TRUE)
>
> ## Number of "compatible" transcripts per alignment in 'gal':
> gal_ncomptx <- countCompatibleOverlaps(gal, exbytx)
> mcols(gal)$ncomptx <- gal_ncomptx
> table(gal_ncomptx)
gal_ncomptx
   0    2    3   11
1204    2    8  586
> mean(gal_ncomptx >= 1)
[1] 0.3311111
> ## --> 33% of the alignments in 'gal' are "compatible" with at least
> ## 1 transcript in 'exbytx'.
>
> ## Keep only alignments compatible with at least 1 transcript in
> ## 'exbytx':
> compgal <- gal[gal_ncomptx >= 1]
> head(compgal)
GAlignments object with 6 alignments and 1 metadata column:
                    seqnames strand       cigar    qwidth     start       end
                       <Rle>  <Rle> <character> <integer> <integer> <integer>
  SRR031722.3117908    chr2L      +         45M        45      7541      7585
  SRR031721.2103139    chr2L      +         45M        45      7908      7952
  SRR031723.2621241    chr2L      -         40M        40      8219      8258
  SRR031719.2892342    chr2L      +         44M        44      8263      8306
  SRR031721.2200896    chr2L      -         45M        45      8267      8311
  SRR031722.5251362    chr2L      -         45M        45      8644      8688
                        width     njunc |   ncomptx
                    <integer> <integer> | <integer>
  SRR031722.3117908        45         0 |         3
  SRR031721.2103139        45         0 |         3
  SRR031723.2621241        40         0 |         2
  SRR031719.2892342        44         0 |         3
  SRR031721.2200896        45         0 |         3
  SRR031722.5251362        45         0 |         2
  -------
  seqinfo: 3 sequences from an unspecified genome