R Graphical Manual

Browse All

Last data update: 2014.03.03

R: Compute coverage by transcript (or CDS) of a set of ranges

coverageByTranscript

R Documentation

Compute coverage by transcript (or CDS) of a set of ranges

Description

coverageByTranscript computes the transcript (or CDS) coverage of a set of ranges.

pcoverageByTranscript is a version of coverageByTranscript that operates element-wise.

Usage

coverageByTranscript(x, transcripts, ignore.strand=FALSE)

pcoverageByTranscript(x, transcripts, ignore.strand=FALSE, ...)

Arguments

`x`	An object representing a set of ranges (typically aligned reads). GRanges, GRangesList, GAlignments, GAlignmentPairs, and GAlignmentsList objects are supported. More generally, for `coverageByTranscript` `x` can be any object for which `seqinfo()` and `coverage()` are supported (e.g. a BamFile object). Note that, for such objects, `coverage()` is expected to return an RleList object whose names are `seqlevels(x)`). More generally, for `pcoverageByTranscript` `x` can be any object for which `grglist()` is supported. It should have the length of `transcripts` or length 1. If the latter, it is recycled to the length of `transcripts`.
`transcripts`	A GRangesList object representing the exons of each transcript for which to compute coverage. For each transcript, the exons must be ordered by ascending rank, that is, by their position in the transcript. This means that, for a transcript located on the minus strand, the exons should typically be ordered by descending position on the reference genome. If `transcripts` was obtained with `exonsBy`, then the exons are guaranteed to be ordered by ascending rank. See `?exonsBy` for more information. Alternatively `transcripts` can be any object for which `exonsBy` is implemented (e.g. a TxDb object), in which case it is replaced by the GRangesList object returned by `exonsBy(transcripts, by="tx", use.names=TRUE)`. For `pcoverageByTranscript`, `transcripts` should have the length of `x` or length 1. If the latter, it is recycled to the length of `x`).
`ignore.strand`	TRUE or FALSE. If FALSE (the default) then the strand of a range in `x` and exon in `transcripts` must be the same in order for the range to contribute coverage to the exon. If TRUE then the strand is ignored.
`...`	Additional arguments passed to the internal call to `grglist()`. More precisely, when `x` is not a GRanges or GRangesList object, `pcoverageByTranscript` replace it by the GRangesList object returned by `grglist(x, ...)`.

Value

An RleList object parallel to transcripts, that is, the i-th element in it is an integer-Rle representing the coverage of the i-th transcript in transcripts. Its elementNROWS() is guaranteed to be identical to sum(width(transcripts)). The names and metadata columns on transcripts are propagated to it.

Author(s)

Herv<c3><83><c2><a9> Pag<c3><83><c2><a8>s

Examples

## ---------------------------------------------------------------------
## 1. COMPUTE TRANSCRIPTOME COVERAGE OF A SET OF ALIGNED READS
## ---------------------------------------------------------------------

## Load the aligned reads:
library(pasillaBamSubset)
library(GenomicAlignments)
reads <- readGAlignments(untreated1_chr4())

## Load the transcripts:
library(TxDb.Dmelanogaster.UCSC.dm3.ensGene)
txdb <- TxDb.Dmelanogaster.UCSC.dm3.ensGene
transcripts <- exonsBy(txdb, by="tx", use.names=TRUE)

## Compute the transcript coverage with coverageByTranscript():
tx_cvg <- coverageByTranscript(reads, transcripts, ignore.strand=TRUE)
tx_cvg

## A sanity check:
stopifnot(identical(elementNROWS(tx_cvg), sum(width(transcripts))))

## We can also use pcoverageByTranscript() to compute 'tx_cvg'.
## For this we first create a GAlignmentsList object "parallel" to
## 'transcripts' where the i-th list element contains the aligned reads
## that overlap with the i-th transcript:
hits <- findOverlaps(reads, transcripts, ignore.strand=TRUE)
tx2reads <- setNames(as(t(hits), "List"), names(transcripts))
reads_by_tx <- extractList(reads, tx2reads)  # GAlignmentsList object
reads_by_tx

## Call pcoverageByTranscript():
tx_cvg2 <- pcoverageByTranscript(reads_by_tx, transcripts,
                                 ignore.strand=TRUE)
stopifnot(identical(tx_cvg, tx_cvg2))

## A more meaningful coverage is obtained by counting for each
## transcript only the reads that are *compatible* with its splicing:
compat_hits <- findCompatibleOverlaps(reads, transcripts)
tx2reads <- setNames(as(t(compat_hits), "List"), names(transcripts))
compat_reads_by_tx <- extractList(reads, tx2reads)

tx_compat_cvg <- pcoverageByTranscript(compat_reads_by_tx,
                                       transcripts,
                                       ignore.strand=TRUE)
## A sanity check:
stopifnot(all(all(tx_compat_cvg <= tx_cvg)))

## ---------------------------------------------------------------------
## 2. COMPUTE CDS COVERAGE OF A SET OF ALIGNED READS
## ---------------------------------------------------------------------

## coverageByTranscript() can also be used to compute CDS coverage:
cds <- cdsBy(txdb, by="tx", use.names=TRUE)
cds_cvg <- coverageByTranscript(reads, cds, ignore.strand=TRUE)
cds_cvg

## A sanity check:
stopifnot(identical(elementNROWS(cds_cvg), sum(width(cds))))

## ---------------------------------------------------------------------
## 3. ALTERNATIVELY, THE CDS COVERAGE CAN BE OBTAINED FROM THE
##    TRANSCRIPT COVERAGE BY TRIMMING THE 5' AND 3' UTRS
## ---------------------------------------------------------------------

tx_lens <- transcriptLengths(txdb, with.utr5_len=TRUE, with.utr3_len=TRUE)
stopifnot(identical(tx_lens$tx_name, names(tx_cvg)))  # sanity

## Keep the rows in 'tx_lens' that correspond to a list element in
## 'cds_cvg' and put them in the same order as in 'cds_cvg':
m <- match(names(cds_cvg), names(tx_cvg))
tx_lens <- tx_lens[m, ]
utr5_width <- tx_lens$utr5_len
utr3_width <- tx_lens$utr3_len

trimListElements <- function(x, ltrim=0, rtrim=0)
{
    x_eltNROWS <- elementNROWS(x)
    n1 <- pmax(x_eltNROWS - rtrim, 0)
    n2 <- pmax(n1 - ltrim, 0)
    ptail(phead(x, n=n1), n=n2)
}

cds_cvg2 <- trimListElements(tx_cvg[m], utr5_width, utr3_width)

## A sanity check:
stopifnot(identical(cds_cvg2, cds_cvg))

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(GenomicFeatures)
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
Loading required package: AnnotationDbi
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/GenomicFeatures/coverageByTranscript.Rd_%03d_medium.png", width=480, height=480)
> ### Name: coverageByTranscript
> ### Title: Compute coverage by transcript (or CDS) of a set of ranges
> ### Aliases: coverageByTranscript pcoverageByTranscript
> ### Keywords: manip
> 
> ### ** Examples
> 
> ## ---------------------------------------------------------------------
> ## 1. COMPUTE TRANSCRIPTOME COVERAGE OF A SET OF ALIGNED READS
> ## ---------------------------------------------------------------------
> 
> ## Load the aligned reads:
> library(pasillaBamSubset)
> library(GenomicAlignments)
Loading required package: SummarizedExperiment
Loading required package: Biostrings
Loading required package: XVector
Loading required package: Rsamtools
> reads <- readGAlignments(untreated1_chr4())
> 
> ## Load the transcripts:
> library(TxDb.Dmelanogaster.UCSC.dm3.ensGene)
> txdb <- TxDb.Dmelanogaster.UCSC.dm3.ensGene
> transcripts <- exonsBy(txdb, by="tx", use.names=TRUE)
> 
> ## Compute the transcript coverage with coverageByTranscript():
> tx_cvg <- coverageByTranscript(reads, transcripts, ignore.strand=TRUE)
> tx_cvg
RleList of length 29173
$FBtr0300689
integer-Rle of length 1880 with 1 run
  Lengths: 1880
  Values :    0

$FBtr0300690
integer-Rle of length 1802 with 1 run
  Lengths: 1802
  Values :    0

$FBtr0330654
integer-Rle of length 1844 with 1 run
  Lengths: 1844
  Values :    0

$FBtr0309810
integer-Rle of length 2230 with 1 run
  Lengths: 2230
  Values :    0

$FBtr0306539
integer-Rle of length 4051 with 1 run
  Lengths: 4051
  Values :    0

...
<29168 more elements>
> 
> ## A sanity check:
> stopifnot(identical(elementNROWS(tx_cvg), sum(width(transcripts))))
> 
> ## We can also use pcoverageByTranscript() to compute 'tx_cvg'.
> ## For this we first create a GAlignmentsList object "parallel" to
> ## 'transcripts' where the i-th list element contains the aligned reads
> ## that overlap with the i-th transcript:
> hits <- findOverlaps(reads, transcripts, ignore.strand=TRUE)
> tx2reads <- setNames(as(t(hits), "List"), names(transcripts))
> reads_by_tx <- extractList(reads, tx2reads)  # GAlignmentsList object
> reads_by_tx
GAlignmentsList object of length 29173:
$FBtr0300689 
GAlignments object with 0 alignments and 0 metadata columns:
     seqnames strand cigar qwidth start end width njunc

$FBtr0300690 
GAlignments object with 0 alignments and 0 metadata columns:
     seqnames strand cigar qwidth start end width njunc

$FBtr0330654 
GAlignments object with 0 alignments and 0 metadata columns:
     seqnames strand cigar qwidth start end width njunc

...
<29170 more elements>
-------
seqinfo: 8 sequences from an unspecified genome
> 
> ## Call pcoverageByTranscript():
> tx_cvg2 <- pcoverageByTranscript(reads_by_tx, transcripts,
+                                  ignore.strand=TRUE)
> stopifnot(identical(tx_cvg, tx_cvg2))
> 
> ## A more meaningful coverage is obtained by counting for each
> ## transcript only the reads that are *compatible* with its splicing:
> compat_hits <- findCompatibleOverlaps(reads, transcripts)
> tx2reads <- setNames(as(t(compat_hits), "List"), names(transcripts))
> compat_reads_by_tx <- extractList(reads, tx2reads)
> 
> tx_compat_cvg <- pcoverageByTranscript(compat_reads_by_tx,
+                                        transcripts,
+                                        ignore.strand=TRUE)
> ## A sanity check:
> stopifnot(all(all(tx_compat_cvg <= tx_cvg)))
> 
> ## ---------------------------------------------------------------------
> ## 2. COMPUTE CDS COVERAGE OF A SET OF ALIGNED READS
> ## ---------------------------------------------------------------------
> 
> ## coverageByTranscript() can also be used to compute CDS coverage:
> cds <- cdsBy(txdb, by="tx", use.names=TRUE)
> cds_cvg <- coverageByTranscript(reads, cds, ignore.strand=TRUE)
> cds_cvg
RleList of length 26950
$FBtr0300689
integer-Rle of length 855 with 1 run
  Lengths: 855
  Values :   0

$FBtr0300690
integer-Rle of length 1443 with 1 run
  Lengths: 1443
  Values :    0

$FBtr0330654
integer-Rle of length 819 with 1 run
  Lengths: 819
  Values :   0

$FBtr0306539
integer-Rle of length 3024 with 1 run
  Lengths: 3024
  Values :    0

$FBtr0306536
integer-Rle of length 3024 with 1 run
  Lengths: 3024
  Values :    0

...
<26945 more elements>
> 
> ## A sanity check:
> stopifnot(identical(elementNROWS(cds_cvg), sum(width(cds))))
> 
> ## ---------------------------------------------------------------------
> ## 3. ALTERNATIVELY, THE CDS COVERAGE CAN BE OBTAINED FROM THE
> ##    TRANSCRIPT COVERAGE BY TRIMMING THE 5' AND 3' UTRS
> ## ---------------------------------------------------------------------
> 
> tx_lens <- transcriptLengths(txdb, with.utr5_len=TRUE, with.utr3_len=TRUE)
> stopifnot(identical(tx_lens$tx_name, names(tx_cvg)))  # sanity
> 
> ## Keep the rows in 'tx_lens' that correspond to a list element in
> ## 'cds_cvg' and put them in the same order as in 'cds_cvg':
> m <- match(names(cds_cvg), names(tx_cvg))
> tx_lens <- tx_lens[m, ]
> utr5_width <- tx_lens$utr5_len
> utr3_width <- tx_lens$utr3_len
> 
> trimListElements <- function(x, ltrim=0, rtrim=0)
+ {
+     x_eltNROWS <- elementNROWS(x)
+     n1 <- pmax(x_eltNROWS - rtrim, 0)
+     n2 <- pmax(n1 - ltrim, 0)
+     ptail(phead(x, n=n1), n=n2)
+ }
> 
> cds_cvg2 <- trimListElements(tx_cvg[m], utr5_width, utr3_width)
> 
> ## A sanity check:
> stopifnot(identical(cds_cvg2, cds_cvg))
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>