Computes read coverage along single and multi component features based on
genomic alignments. The coverage segments of component features are spliced to
continuous ranges, such as exons to transcripts or CDSs to ORFs. The results
can be obtained with single nucleotide resolution (e.g. around start and stop
codons) or as mean coverage of relative bin sizes, such as 100 bins for each
feature. The latter allows comparisons of coverage trends among transcripts of
variable length. The results can be obtained for single or many features (e.g.
any number of transcritpts) at once. Visualization of the coverage results is
facilitated by a downstream plotfeatureCoverage function.
Paths to BAM files provided as BamFileList object. The name slot of the
BAM files will be used for naming samples in the results.
grl
Genomic ranges provided as GRangesList typically generated form
txdb instances with operations like: cdsBy(txdb, "tx") or
exonsBy(txdb, "tx"). Single component features will be processed the
same way as multi component features.
resizereads
Positive integer defining the length alignments should be resized to prior to the
coverage calculation. NULL will omit the resizing step.
readlengthrange
Positive integer of length 2 determining the read length range to use for
the coverage calculation. Reads falling outside of the specified length range
will be excluded from the coverage calculation. For instance,
readlengthrange=c(30:40) will base the coverage calculation on reads
between 30 to 40 bps. Assigning NULL will skip this filtering step.
Nbins
Single positive integer defining the number of segments the coverage of each
feature should be binned into in order to obtain coverage summaries of constant
length, e.g. for plotting purposes.
method
Defines the summary statistics to use for binning. The default is method=mean.
fixedmatrix
If set to TRUE, a coverage matrix with single nucleotide resolution will be
returned for any number of transcripts centered around precise anchor points in
a genome annotation, such a stop/start codons or transcription start sites. For instance, a
matrix with coverage information 20bps upstream and downstream of the stop/start codons can be
obtained with fixedmatrix=TRUE, upstream=20, downstream=20 along with a grl
instance containing the CDS exon ranges required for this operation, e.g. generated
with cdsBy(txdb, "tx").
resizefeatures
Needs to be set to TRUE when fixedmatrix=TRUE. Internally, this will use
the systemPipeR::.resizeFeature function to extend single and multi component
features at their most left and most right end coordinates. The corresponding extension
values are specified under the upstream and downstream arguments.
upstream
Single positive integer specifying the upstream extension length relative to the orientation
of each feature in the genome. More details are given above.
downstream
Single positive integer specifying the downstream extension length relative to the orientation
of each feature in the genome. More details are given above.
outfile
Default NULL omits writing of the results to a file. If a file name is specified then
the results are written to a tabular file. If bfl contains the paths to several BAM files
then the results will be appended to the same file where the first column specifies the
sample labels. Redirecting the results to file is particularly useful when processing large
files of many sample where computation times can be significant.
overwrite
If set to TRUE any existing file assigned to outfile will be overwritten.
Value
The function allows to return the following four distinct outputs. The settings to return
these instances are illustrated below in the example section.
(A)
data.frame containing binned coverage where rows are features and columns
coverage bins. The first four columns contain (i) the sample names, (ii) the number of
total aligned reads in the corresponding BAM files (useful for normalization), (iii) the
feature IDs, (iv) strand of the coverage. All following columns are numeric and contain
the actual coverage data for the sense and antisense strand of each feature.
(B)
data.frame containing coverage with single nucleotide
resolution around anchor points such as start and stop codons. The two matrix
components are appended column-wise. To clearly distinguish the two data components,
they are separated by a specialty column containing pipe characters. The first four
columns are the same as described under (A). The column title for the
anchor point is 0. For instance, if the features are CDSs then the first 0 corresponds to the
first nucleotide of the start codon and the second 0 to the last nucleotide of
the stop codon. Upstream and downstream positions are indicated by negative and
positive column numbers, respectively.
(C)
data.frame containing combined results of (A) and (B) where the
first set of columns contains to the coverage around the start codons, the second one
the binned coverage of the CDSs and the third one the coverage around the stop codons
separated by the same pipe columns mentioned under (B).
(D)
Rle list containing the nucleotide level coverage of each feature
Author(s)
Thomas Girke
See Also
plotfeatureCoverage
Examples
## Construct SYSargs object from param and targets files
param <- system.file("extdata", "tophat.param", package="systemPipeR")
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
args <- systemArgs(sysma=param, mytargets=targets)
## Not run:
## Features from sample data of systemPipeRdata package
library(GenomicFeatures)
file <- system.file("extdata/annotation", "tair10.gff", package="systemPipeRdata")
txdb <- makeTxDbFromGFF(file=file, format="gff3", organism="Arabidopsis")
## (A) Generate binned coverage for two BAM files and 4 transcripts
grl <- cdsBy(txdb, "tx", use.names=TRUE)
fcov <- featureCoverage(bfl=BamFileList(outpaths(args)[1:2]), grl=grl[1:4], resizereads=NULL,
readlengthrange=NULL, Nbins=20, method=mean, fixedmatrix=FALSE,
resizefeatures=TRUE, upstream=20, downstream=20)
plotfeatureCoverage(covMA=fcov, method=mean, scales="fixed", scale_count_val=10^6)
## (B) Coverage matrix upstream and downstream of start/stop codons
fcov <- featureCoverage(bfl=BamFileList(outpaths(args)[1:2]), grl=grl[1:4], resizereads=NULL,
readlengthrange=NULL, Nbins=NULL, method=mean, fixedmatrix=TRUE,
resizefeatures=TRUE, upstream=20, downstream=20)
plotfeatureCoverage(covMA=fcov, method=mean, scales="fixed", scale_count_val=10^6)
## (C) Combined matrix for both binned and start/stop codon
fcov <- featureCoverage(bfl=BamFileList(outpaths(args)[1:2]), grl=grl[1:4], resizereads=NULL,
readlengthrange=NULL, Nbins=20, method=mean, fixedmatrix=TRUE,
resizefeatures=TRUE, upstream=20, downstream=20)
plotfeatureCoverage(covMA=fcov, method=mean, scales="fixed", scale_count_val=10^6)
## (D) Rle coverage objects one for each query feature
fcov <- featureCoverage(bfl=BamFileList(outpaths(args)[1:2]), grl=grl[1:4], resizereads=NULL,
readlengthrange=NULL, Nbins=NULL, method=mean, fixedmatrix=FALSE,
resizefeatures=TRUE, upstream=20, downstream=20)
## End(Not run)
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(systemPipeR)
Loading required package: Rsamtools
Loading required package: GenomeInfoDb
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
rbind, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit
Loading required package: S4Vectors
Attaching package: 'S4Vectors'
The following objects are masked from 'package:base':
colMeans, colSums, expand.grid, rowMeans, rowSums
Loading required package: IRanges
Loading required package: GenomicRanges
Loading required package: Biostrings
Loading required package: XVector
Loading required package: ShortRead
Loading required package: BiocParallel
Loading required package: GenomicAlignments
Loading required package: SummarizedExperiment
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/systemPipeR/featureCoverage.Rd_%03d_medium.png", width=480, height=480)
> ### Name: featureCoverage
> ### Title: Genome read coverage by transcript models
> ### Aliases: featureCoverage
> ### Keywords: utilities
>
> ### ** Examples
>
> ## Construct SYSargs object from param and targets files
> param <- system.file("extdata", "tophat.param", package="systemPipeR")
> targets <- system.file("extdata", "targets.txt", package="systemPipeR")
> args <- systemArgs(sysma=param, mytargets=targets)
There were 18 warnings (use warnings() to see them)
>
> ## Not run:
> ##D ## Features from sample data of systemPipeRdata package
> ##D library(GenomicFeatures)
> ##D file <- system.file("extdata/annotation", "tair10.gff", package="systemPipeRdata")
> ##D txdb <- makeTxDbFromGFF(file=file, format="gff3", organism="Arabidopsis")
> ##D
> ##D ## (A) Generate binned coverage for two BAM files and 4 transcripts
> ##D grl <- cdsBy(txdb, "tx", use.names=TRUE)
> ##D fcov <- featureCoverage(bfl=BamFileList(outpaths(args)[1:2]), grl=grl[1:4], resizereads=NULL,
> ##D readlengthrange=NULL, Nbins=20, method=mean, fixedmatrix=FALSE,
> ##D resizefeatures=TRUE, upstream=20, downstream=20)
> ##D plotfeatureCoverage(covMA=fcov, method=mean, scales="fixed", scale_count_val=10^6)
> ##D
> ##D ## (B) Coverage matrix upstream and downstream of start/stop codons
> ##D fcov <- featureCoverage(bfl=BamFileList(outpaths(args)[1:2]), grl=grl[1:4], resizereads=NULL,
> ##D readlengthrange=NULL, Nbins=NULL, method=mean, fixedmatrix=TRUE,
> ##D resizefeatures=TRUE, upstream=20, downstream=20)
> ##D plotfeatureCoverage(covMA=fcov, method=mean, scales="fixed", scale_count_val=10^6)
> ##D
> ##D ## (C) Combined matrix for both binned and start/stop codon
> ##D fcov <- featureCoverage(bfl=BamFileList(outpaths(args)[1:2]), grl=grl[1:4], resizereads=NULL,
> ##D readlengthrange=NULL, Nbins=20, method=mean, fixedmatrix=TRUE,
> ##D resizefeatures=TRUE, upstream=20, downstream=20)
> ##D plotfeatureCoverage(covMA=fcov, method=mean, scales="fixed", scale_count_val=10^6)
> ##D
> ##D ## (D) Rle coverage objects one for each query feature
> ##D fcov <- featureCoverage(bfl=BamFileList(outpaths(args)[1:2]), grl=grl[1:4], resizereads=NULL,
> ##D readlengthrange=NULL, Nbins=NULL, method=mean, fixedmatrix=FALSE,
> ##D resizefeatures=TRUE, upstream=20, downstream=20)
> ##D
> ## End(Not run)
>
>
>
>
>
> dev.off()
null device
1
>