Last data update: 2014.03.03

R: Genome read coverage by transcript models
featureCoverageR Documentation

Genome read coverage by transcript models

Description

Computes read coverage along single and multi component features based on genomic alignments. The coverage segments of component features are spliced to continuous ranges, such as exons to transcripts or CDSs to ORFs. The results can be obtained with single nucleotide resolution (e.g. around start and stop codons) or as mean coverage of relative bin sizes, such as 100 bins for each feature. The latter allows comparisons of coverage trends among transcripts of variable length. The results can be obtained for single or many features (e.g. any number of transcritpts) at once. Visualization of the coverage results is facilitated by a downstream plotfeatureCoverage function.

Usage

featureCoverage(bfl, grl, resizereads = NULL, readlengthrange = NULL, Nbins = 20, 
                method = mean, fixedmatrix, resizefeatures, upstream, downstream, 
                outfile, overwrite = FALSE)

Arguments

bfl

Paths to BAM files provided as BamFileList object. The name slot of the BAM files will be used for naming samples in the results.

grl

Genomic ranges provided as GRangesList typically generated form txdb instances with operations like: cdsBy(txdb, "tx") or exonsBy(txdb, "tx"). Single component features will be processed the same way as multi component features.

resizereads

Positive integer defining the length alignments should be resized to prior to the coverage calculation. NULL will omit the resizing step.

readlengthrange

Positive integer of length 2 determining the read length range to use for the coverage calculation. Reads falling outside of the specified length range will be excluded from the coverage calculation. For instance, readlengthrange=c(30:40) will base the coverage calculation on reads between 30 to 40 bps. Assigning NULL will skip this filtering step.

Nbins

Single positive integer defining the number of segments the coverage of each feature should be binned into in order to obtain coverage summaries of constant length, e.g. for plotting purposes.

method

Defines the summary statistics to use for binning. The default is method=mean.

fixedmatrix

If set to TRUE, a coverage matrix with single nucleotide resolution will be returned for any number of transcripts centered around precise anchor points in a genome annotation, such a stop/start codons or transcription start sites. For instance, a matrix with coverage information 20bps upstream and downstream of the stop/start codons can be obtained with fixedmatrix=TRUE, upstream=20, downstream=20 along with a grl instance containing the CDS exon ranges required for this operation, e.g. generated with cdsBy(txdb, "tx").

resizefeatures

Needs to be set to TRUE when fixedmatrix=TRUE. Internally, this will use the systemPipeR::.resizeFeature function to extend single and multi component features at their most left and most right end coordinates. The corresponding extension values are specified under the upstream and downstream arguments.

upstream

Single positive integer specifying the upstream extension length relative to the orientation of each feature in the genome. More details are given above.

downstream

Single positive integer specifying the downstream extension length relative to the orientation of each feature in the genome. More details are given above.

outfile

Default NULL omits writing of the results to a file. If a file name is specified then the results are written to a tabular file. If bfl contains the paths to several BAM files then the results will be appended to the same file where the first column specifies the sample labels. Redirecting the results to file is particularly useful when processing large files of many sample where computation times can be significant.

overwrite

If set to TRUE any existing file assigned to outfile will be overwritten.

Value

The function allows to return the following four distinct outputs. The settings to return these instances are illustrated below in the example section.

(A)

data.frame containing binned coverage where rows are features and columns coverage bins. The first four columns contain (i) the sample names, (ii) the number of total aligned reads in the corresponding BAM files (useful for normalization), (iii) the feature IDs, (iv) strand of the coverage. All following columns are numeric and contain the actual coverage data for the sense and antisense strand of each feature.

(B)

data.frame containing coverage with single nucleotide resolution around anchor points such as start and stop codons. The two matrix components are appended column-wise. To clearly distinguish the two data components, they are separated by a specialty column containing pipe characters. The first four columns are the same as described under (A). The column title for the anchor point is 0. For instance, if the features are CDSs then the first 0 corresponds to the first nucleotide of the start codon and the second 0 to the last nucleotide of the stop codon. Upstream and downstream positions are indicated by negative and positive column numbers, respectively.

(C)

data.frame containing combined results of (A) and (B) where the first set of columns contains to the coverage around the start codons, the second one the binned coverage of the CDSs and the third one the coverage around the stop codons separated by the same pipe columns mentioned under (B).

(D)

Rle list containing the nucleotide level coverage of each feature

Author(s)

Thomas Girke

See Also

plotfeatureCoverage

Examples

## Construct SYSargs object from param and targets files 
param <- system.file("extdata", "tophat.param", package="systemPipeR")
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
args <- systemArgs(sysma=param, mytargets=targets)

## Not run: 
## Features from sample data of systemPipeRdata package
library(GenomicFeatures)
file <- system.file("extdata/annotation", "tair10.gff", package="systemPipeRdata")
txdb <- makeTxDbFromGFF(file=file, format="gff3", organism="Arabidopsis")

## (A) Generate binned coverage for two BAM files and 4 transcripts
grl <- cdsBy(txdb, "tx", use.names=TRUE)
fcov <- featureCoverage(bfl=BamFileList(outpaths(args)[1:2]), grl=grl[1:4], resizereads=NULL, 
                    readlengthrange=NULL, Nbins=20, method=mean, fixedmatrix=FALSE, 
                    resizefeatures=TRUE, upstream=20, downstream=20)
plotfeatureCoverage(covMA=fcov, method=mean, scales="fixed", scale_count_val=10^6)

## (B) Coverage matrix upstream and downstream of start/stop codons
fcov <- featureCoverage(bfl=BamFileList(outpaths(args)[1:2]), grl=grl[1:4], resizereads=NULL, 
                    readlengthrange=NULL, Nbins=NULL, method=mean, fixedmatrix=TRUE, 
                    resizefeatures=TRUE, upstream=20, downstream=20)
plotfeatureCoverage(covMA=fcov, method=mean, scales="fixed", scale_count_val=10^6)

## (C) Combined matrix for both binned and start/stop codon 
fcov <- featureCoverage(bfl=BamFileList(outpaths(args)[1:2]), grl=grl[1:4], resizereads=NULL, 
                    readlengthrange=NULL, Nbins=20, method=mean, fixedmatrix=TRUE, 
                    resizefeatures=TRUE, upstream=20, downstream=20)
plotfeatureCoverage(covMA=fcov, method=mean, scales="fixed", scale_count_val=10^6)

## (D) Rle coverage objects one for each query feature
fcov <- featureCoverage(bfl=BamFileList(outpaths(args)[1:2]), grl=grl[1:4], resizereads=NULL, 
                    readlengthrange=NULL, Nbins=NULL, method=mean, fixedmatrix=FALSE, 
                    resizefeatures=TRUE, upstream=20, downstream=20)


## End(Not run)

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(systemPipeR)
Loading required package: Rsamtools
Loading required package: GenomeInfoDb
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomicRanges
Loading required package: Biostrings
Loading required package: XVector
Loading required package: ShortRead
Loading required package: BiocParallel
Loading required package: GenomicAlignments
Loading required package: SummarizedExperiment
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.


> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/systemPipeR/featureCoverage.Rd_%03d_medium.png", width=480, height=480)
> ### Name: featureCoverage
> ### Title: Genome read coverage by transcript models
> ### Aliases: featureCoverage
> ### Keywords: utilities
> 
> ### ** Examples
> 
> ## Construct SYSargs object from param and targets files 
> param <- system.file("extdata", "tophat.param", package="systemPipeR")
> targets <- system.file("extdata", "targets.txt", package="systemPipeR")
> args <- systemArgs(sysma=param, mytargets=targets)
There were 18 warnings (use warnings() to see them)
> 
> ## Not run: 
> ##D ## Features from sample data of systemPipeRdata package
> ##D library(GenomicFeatures)
> ##D file <- system.file("extdata/annotation", "tair10.gff", package="systemPipeRdata")
> ##D txdb <- makeTxDbFromGFF(file=file, format="gff3", organism="Arabidopsis")
> ##D 
> ##D ## (A) Generate binned coverage for two BAM files and 4 transcripts
> ##D grl <- cdsBy(txdb, "tx", use.names=TRUE)
> ##D fcov <- featureCoverage(bfl=BamFileList(outpaths(args)[1:2]), grl=grl[1:4], resizereads=NULL, 
> ##D                     readlengthrange=NULL, Nbins=20, method=mean, fixedmatrix=FALSE, 
> ##D                     resizefeatures=TRUE, upstream=20, downstream=20)
> ##D plotfeatureCoverage(covMA=fcov, method=mean, scales="fixed", scale_count_val=10^6)
> ##D 
> ##D ## (B) Coverage matrix upstream and downstream of start/stop codons
> ##D fcov <- featureCoverage(bfl=BamFileList(outpaths(args)[1:2]), grl=grl[1:4], resizereads=NULL, 
> ##D                     readlengthrange=NULL, Nbins=NULL, method=mean, fixedmatrix=TRUE, 
> ##D                     resizefeatures=TRUE, upstream=20, downstream=20)
> ##D plotfeatureCoverage(covMA=fcov, method=mean, scales="fixed", scale_count_val=10^6)
> ##D 
> ##D ## (C) Combined matrix for both binned and start/stop codon 
> ##D fcov <- featureCoverage(bfl=BamFileList(outpaths(args)[1:2]), grl=grl[1:4], resizereads=NULL, 
> ##D                     readlengthrange=NULL, Nbins=20, method=mean, fixedmatrix=TRUE, 
> ##D                     resizefeatures=TRUE, upstream=20, downstream=20)
> ##D plotfeatureCoverage(covMA=fcov, method=mean, scales="fixed", scale_count_val=10^6)
> ##D 
> ##D ## (D) Rle coverage objects one for each query feature
> ##D fcov <- featureCoverage(bfl=BamFileList(outpaths(args)[1:2]), grl=grl[1:4], resizereads=NULL, 
> ##D                     readlengthrange=NULL, Nbins=NULL, method=mean, fixedmatrix=FALSE, 
> ##D                     resizefeatures=TRUE, upstream=20, downstream=20)
> ##D 
> ## End(Not run)
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>