Last data update: 2014.03.03

findCompatibleOverlaps-methods    R Documentation

Finding hits between reads and transcripts that are compatible with the splicing of the transcript

Description

In the context of an RNA-seq experiment, findCompatibleOverlaps (or countCompatibleOverlaps) can be used for finding (or counting) hits between reads and transcripts that are compatible with the splicing of the transcript.

Usage

findCompatibleOverlaps(query, subject)
countCompatibleOverlaps(query, subject)

Arguments

query

A GAlignments or GAlignmentPairs object representing the aligned reads.

subject

A GRangesList object representing the transcripts.

Details

findCompatibleOverlaps is a specialized version of findOverlaps that uses encodeOverlaps internally to keep only the hits where the junctions in the aligned read are compatible with the splicing of the annotated transcript.
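
The filtering described above can be reproduced by hand on a toy data set. The following is a sketch only, not necessarily the package's exact internal code. It assumes that plain hits are first computed with findOverlaps on the ranges returned by grglist, then encoded with encodeOverlaps, and finally filtered with isCompatibleWithSplicing (also from GenomicAlignments). The transcript tx1 and the reads r1, r2 and r3 are made up for the illustration.

```r
library(GenomicAlignments)

## Hypothetical transcript with 2 exons (101-200 and 301-400) on chr1:
tx <- GRangesList(tx1=GRanges("chr1", IRanges(c(101, 301), c(200, 400)),
                              strand="+"))

## Three made-up reads:
##   r1 falls entirely within exon 1,
##   r2 is spliced exactly at the exon 1 / exon 2 junction,
##   r3 has no junction and runs from exon 1 into the intron.
gal <- GAlignments(seqnames=Rle(factor(rep("chr1", 3))),
                   pos=c(150L, 191L, 191L),
                   cigar=c("20M", "10M100N10M", "20M"),
                   strand=Rle(strand(rep("+", 3))),
                   names=c("r1", "r2", "r3"))

## Step 1: plain overlaps between the ranges of the reads and the exons
## of the transcript (all 3 reads overlap tx1):
grl <- grglist(gal, order.as.in.query=TRUE)
ov <- findOverlaps(grl, tx, ignore.strand=TRUE)

## Step 2: encode each overlap, then keep only the hits whose encoding
## is compatible with the splicing of the transcript:
ovenc <- encodeOverlaps(grl, tx, hits=ov, flip.query.if.wrong.strand=TRUE)
compat_ov <- ov[isCompatibleWithSplicing(ovenc)]

## Compare with the hits returned by findCompatibleOverlaps():
compat_ov
findCompatibleOverlaps(gal, tx)
```

In this toy case r3 is dropped at step 2 because it extends past the end of exon 1 without a junction, whereas findOverlaps alone would have reported it as a hit.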

Working with overlap encodings is covered in detail in the "OverlapEncodings" vignette located in this package (GenomicAlignments) and accessible with vignette("OverlapEncodings").

Value

A Hits object for findCompatibleOverlaps.

An integer vector parallel to (i.e. same length as) query for countCompatibleOverlaps.
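
To make the relationship between the two return values concrete, here is a small sketch on made-up data (the transcripts txA and txB and the reads r1, r2 and r3 are hypothetical). It assumes only that the Hits object can be queried with queryHits and subjectHits, and shows that tallying the query hits gives the same numbers as countCompatibleOverlaps.

```r
library(GenomicAlignments)

## Two hypothetical transcripts on chr1: txA has 2 exons, txB has a
## single exon spanning the same region.
tx <- GRangesList(
    txA=GRanges("chr1", IRanges(c(101, 301), c(200, 400)), strand="+"),
    txB=GRanges("chr1", IRanges(101, 400), strand="+")
)

## Three made-up reads (r2 contains one junction, r1 and r3 none):
gal <- GAlignments(seqnames=Rle(factor(rep("chr1", 3))),
                   pos=c(150L, 191L, 191L),
                   cigar=c("20M", "10M100N10M", "20M"),
                   strand=Rle(strand(rep("+", 3))),
                   names=c("r1", "r2", "r3"))

## A Hits object: one row per compatible (read, transcript) pair.
hits <- findCompatibleOverlaps(gal, tx)
data.frame(read=names(gal)[queryHits(hits)],
           transcript=names(tx)[subjectHits(hits)])

## An integer vector with one element per read:
counts <- countCompatibleOverlaps(gal, tx)
counts

## Tallying the query hits by read gives the same counts:
tally <- tabulate(queryHits(hits), nbins=length(gal))
all(unname(counts) == tally)
```

Use findCompatibleOverlaps when you need to know which transcripts each read is compatible with, and countCompatibleOverlaps when only the number per read matters.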

Author(s)

Hervé Pagès

See Also

  • The findOverlaps generic function defined in the IRanges package.

  • The encodeOverlaps generic function and OverlapEncodings class.

  • The "OverlapEncodings" vignette in this package.

  • GAlignments and GAlignmentPairs objects.

  • GRangesList objects in the GenomicRanges package.

Examples

## Here we only show a simple example illustrating the use of
## countCompatibleOverlaps() on a very small data set. Please
## refer to the "OverlapEncodings" vignette in the GenomicAlignments
## package for a comprehensive presentation of "overlap
## encodings" and related tools/concepts (e.g. "compatible"
## overlaps, "almost compatible" overlaps etc...), and for more
## examples.

## sm_treated1.bam contains a small subset of treated1.bam, a BAM
## file containing single-end reads from the "Pasilla" experiment
## (RNA-seq, Fly, see the pasilla data package for the details)
## and aligned to reference genome BDGP Release 5 (aka dm3 genome on
## the UCSC Genome Browser):
sm_treated1 <- system.file("extdata", "sm_treated1.bam",
                           package="GenomicAlignments", mustWork=TRUE)

## Load the alignments:
flag0 <- scanBamFlag(isDuplicate=FALSE, isNotPassingQualityControls=FALSE)
param0 <- ScanBamParam(flag=flag0)
gal <- readGAlignments(sm_treated1, use.names=TRUE, param=param0)

## Load the transcripts (IMPORTANT: Like always, the reference genome
## of the transcripts must be *exactly* the same as the reference
## genome used to align the reads):
library(TxDb.Dmelanogaster.UCSC.dm3.ensGene)
txdb <- TxDb.Dmelanogaster.UCSC.dm3.ensGene
exbytx <- exonsBy(txdb, by="tx", use.names=TRUE)

## Number of "compatible" transcripts per alignment in 'gal':
gal_ncomptx <- countCompatibleOverlaps(gal, exbytx)
mcols(gal)$ncomptx <- gal_ncomptx
table(gal_ncomptx)
mean(gal_ncomptx >= 1)
## --> 33% of the alignments in 'gal' are "compatible" with at least
## 1 transcript in 'exbytx'.

## Keep only alignments compatible with at least 1 transcript in
## 'exbytx':
compgal <- gal[gal_ncomptx >= 1]
head(compgal)
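
The example above only counts compatible transcripts. As a possible follow-up, the sketch below uses findCompatibleOverlaps() on the same data to list the names of the compatible transcripts for each alignment. It reloads the data so that it can be run on its own; the object names 'hits', 'tx_by_read' and 'n_per_read' are introduced here for illustration and do not appear in the original example.

```r
library(GenomicAlignments)
library(TxDb.Dmelanogaster.UCSC.dm3.ensGene)

## Same alignments and transcripts as in the example above:
sm_treated1 <- system.file("extdata", "sm_treated1.bam",
                           package="GenomicAlignments", mustWork=TRUE)
flag0 <- scanBamFlag(isDuplicate=FALSE, isNotPassingQualityControls=FALSE)
gal <- readGAlignments(sm_treated1, use.names=TRUE,
                       param=ScanBamParam(flag=flag0))
exbytx <- exonsBy(TxDb.Dmelanogaster.UCSC.dm3.ensGene, by="tx",
                  use.names=TRUE)

## One hit per compatible (alignment, transcript) pair:
hits <- findCompatibleOverlaps(gal, exbytx)

## Names of the compatible transcripts, grouped by the index of the
## alignment in 'gal' (alignments with no compatible transcript are
## absent from the list):
tx_by_read <- split(names(exbytx)[subjectHits(hits)], queryHits(hits))
head(tx_by_read, n=3)

## Number of hits per alignment, for comparison with the output of
## countCompatibleOverlaps():
n_per_read <- tabulate(queryHits(hits), nbins=length(gal))
all(n_per_read == unname(countCompatibleOverlaps(gal, exbytx)))
```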

Results


> library(GenomicAlignments)
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
Loading required package: SummarizedExperiment
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: Biostrings
Loading required package: XVector
Loading required package: Rsamtools
> ### Name: findCompatibleOverlaps-methods
> ### Title: Finding hits between reads and transcripts that are _compatible_
> ###   with the splicing of the transcript
> ### Aliases: findCompatibleOverlaps-methods findCompatibleOverlaps
> ###   findCompatibleOverlaps,GAlignments,GRangesList-method
> ###   findCompatibleOverlaps,GAlignmentPairs,GRangesList-method
> ###   countCompatibleOverlaps
> ### Keywords: methods utilities
> 
> ### ** Examples
> 
> ## Here we only show a simple example illustrating the use of
> ## countCompatibleOverlaps() on a very small data set. Please
> ## refer to the "OverlapEncodings" vignette in the GenomicAlignments
> ## package for a comprehensive presentation of "overlap
> ## encodings" and related tools/concepts (e.g. "compatible"
> ## overlaps, "almost compatible" overlaps etc...), and for more
> ## examples.
> 
> ## sm_treated1.bam contains a small subset of treated1.bam, a BAM
> ## file containing single-end reads from the "Pasilla" experiment
> ## (RNA-seq, Fly, see the pasilla data package for the details)
> ## and aligned to reference genome BDGP Release 5 (aka dm3 genome on
> ## the UCSC Genome Browser):
> sm_treated1 <- system.file("extdata", "sm_treated1.bam",
+                            package="GenomicAlignments", mustWork=TRUE)
> 
> ## Load the alignments:
> flag0 <- scanBamFlag(isDuplicate=FALSE, isNotPassingQualityControls=FALSE)
> param0 <- ScanBamParam(flag=flag0)
> gal <- readGAlignments(sm_treated1, use.names=TRUE, param=param0)
> 
> ## Load the transcripts (IMPORTANT: Like always, the reference genome
> ## of the transcripts must be *exactly* the same as the reference
> ## genome used to align the reads):
> library(TxDb.Dmelanogaster.UCSC.dm3.ensGene)
Loading required package: GenomicFeatures
Loading required package: AnnotationDbi
> txdb <- TxDb.Dmelanogaster.UCSC.dm3.ensGene
> exbytx <- exonsBy(txdb, by="tx", use.names=TRUE)
> 
> ## Number of "compatible" transcripts per alignment in 'gal':
> gal_ncomptx <- countCompatibleOverlaps(gal, exbytx)
> mcols(gal)$ncomptx <- gal_ncomptx
> table(gal_ncomptx)
gal_ncomptx
   0    2    3   11 
1204    2    8  586 
> mean(gal_ncomptx >= 1)
[1] 0.3311111
> ## --> 33% of the alignments in 'gal' are "compatible" with at least
> ## 1 transcript in 'exbytx'.
> 
> ## Keep only alignments compatible with at least 1 transcript in
> ## 'exbytx':
> compgal <- gal[gal_ncomptx >= 1]
> head(compgal)
GAlignments object with 6 alignments and 1 metadata column:
                    seqnames strand       cigar    qwidth     start       end
                       <Rle>  <Rle> <character> <integer> <integer> <integer>
  SRR031722.3117908    chr2L      +         45M        45      7541      7585
  SRR031721.2103139    chr2L      +         45M        45      7908      7952
  SRR031723.2621241    chr2L      -         40M        40      8219      8258
  SRR031719.2892342    chr2L      +         44M        44      8263      8306
  SRR031721.2200896    chr2L      -         45M        45      8267      8311
  SRR031722.5251362    chr2L      -         45M        45      8644      8688
                        width     njunc |   ncomptx
                    <integer> <integer> | <integer>
  SRR031722.3117908        45         0 |         3
  SRR031721.2103139        45         0 |         3
  SRR031723.2621241        40         0 |         2
  SRR031719.2892342        44         0 |         3
  SRR031721.2200896        45         0 |         3
  SRR031722.5251362        45         0 |         2
  -------
  seqinfo: 3 sequences from an unspecified genome