Last data update: 2014.03.03

R: Compute a coverage matrix
cov.matrixR Documentation

Compute a coverage matrix

Description

This method generates a coverage matrix for which each column is a different genomic feature (i.e. the coordinate of the transcription start site) or genomic interval (defined by start/end coordinates) and each row can be either a position or a bin related to this feature or interval. Each element in the matrix is the result of computing the coverage value for this position/bin that is at a certain distance upstream or dowstream the genomic feature (i.e. the TSS) or it is in the genomic interval

Usage

## S4 method for signature 'ANY,missing'
cov.matrix(tr,coordfile,normalization,bin_width,extend,no_windows,offset,num_cores)
## S4 method for signature 'ANY,ANY'
cov.matrix(tr,ctl,coordfile,normalization,bin_width,extend,no_windows,offset,do,num_cores)

Arguments

tr

An instance of a CoverageBamFile or CoverageBigWigFile object

ctl

An instance of a CoverageBamFile or CoverageBigWigFile object. This argument is optional and if provided then this object will be used in the arithmetic operations with the coverages specified with the do argument. Default=NULL

coordfile

File in the BED or GFF format containing the coordinates used to generate the coverage matrix

normalization

Normalize the coverage values in the matrix. Possible normalization types are rpm (Read per million). Default=NULL

bin_width

Calculate the average coverage for bins of length defined by this argument. Default is 1 nucleotide

extend

Controls the number of nucleotides to extend relative to the anchor coordinates (i.e. TSS) as specified in the coordfile argument

no_windows

Number of windows used to divide the interval defined by the start and end coordinates as specified in the coordfile argument

offset

If the no_windows argument is set, this controls the number of windows to extend the interval defined by the start/end coordinates. Default=0

do

Used to set the arithmetic operation to perform in a certain position or bin. Possible values are: log2ratio, ratio, substraction. This argument only makes sense when either a CoverageBamFile or CoverageBigWigFile object is provided with the ctl argument. Default= log2ratio

num_cores

Number of cores to use. Default= The detectCores() function from the library parallel will be used to check the number of cores available in the system and this number will be used as the default value

Details

The cov.matrix function receives one or two CoverageBamFile or CoverageBigWigFile objects and returns a coverage matrix in which every column can be either a single genomic feature (i.e. the coordinates of the transcription start site for a gene) or a certain genomic interval defined by the start and end coordinates specified in the file used with the coordfile argument, and each row represents a different position/bin relative to the feature or a certain window within a genomic interval. This behaviour is controlled by the extend and no_windows parameters. For example, if one set the extend argument, then each row represents a single genomic position (or bin with a a size defined by the bin_width argument) relative to the anchor coordinate. Each matrix element will be the result of calculating the depth of coverage for that particular position/bin relative to the anchor coordinate defined in coordfile. If the bin_width parameter is >1 nt, then the average coverate for each bin is used for all downstream calculations. On the other hand, if the no_windows parameter is set, then each row in the matrix will represent a certain genomic window within an interval defined by the start and end coordinates as specified in the coordfile file. In this case, each element in the matrix will be the average coverage value for that particular genomic window. The number of windows in which the genomic interval is divided is controlled by the no_windows argument. Finally, if only one CoverageBamFile or CoverageBigWigFile is provided using the tr argument, then the matrix elements are the result of computing the coverage value or the average coverage value for a certain position or a certain genomic bin/window respectively. If, on the other hand, two files are provided using the tr and the ctl arguments, then the matrix elements will be the result of performing a certain arithmetic operation (specified with the do argument) with the coverages.

If the normalization parameter is set to 'rpm' then the number of reads aligned into the reference is required to perform the normalization. This number can be provided using the reads_mapped argument of the CoverageBamFile or CoverageBigWigFile objects, or it is automatically calculated if the function is used with a CoverageBamFile object. NOTE: If the reads_mapped argument is set then the automatic calculation will NOT be performed and the provided number will be taken as correct for all downstream calculations

Value

Returns a matrix

Author(s)

Ernesto Lowy <ernestolowy@gmail.com>

References

BAM format specification: http://samtools.sourceforge.net/SAMv1.pdf

WIG format specification: https://genome.ucsc.edu/FAQ/FAQformat.html

BED format specification: https://genome.ucsc.edu/FAQ/FAQformat.html

See Also

draw.profile draw.heatmap

Examples

###########
##BAM files
###########
  
## Generating a coverage matrix with a single BAM file
  
#get a BAM test file
aBAMfile<-system.file("extdata","treat.bam",package="CoverageView")
  
#get a BED file with the TSS (transcription start site) coordinates of some genes
bedfile<-system.file("extdata","testTSS.bed",package="CoverageView")
  
#create the CoverageBamFile object
trm<-CoverageBamFile(aBAMfile,reads_mapped=28564510)
  
#generate the coverage matrix extending 100 nts on each side of the provided TSS
#in the bedfile
DF1=cov.matrix(tr=trm,coordfile=bedfile,extend=100,num_cores=2)
  
#generate the coverage matrix extending 100 nts on each side of the TSS provided
#in the bedfile and normalize the resulting coverages
DF2=cov.matrix(tr=trm,coordfile=bedfile,extend=100,normalization="rpm",num_cores=2)
  
#generate the coverage matrix extending 100 nts on each side of the TSS provided
#in the bedfile and normalize the resulting coverages. This time we calculate the average
#coverage in bins of 2 nucleotides
DF3=cov.matrix(tr=trm,coordfile=bedfile,extend=100,normalization="rpm",bin_width=2,num_cores=2)
  
## Generating a coverage matrix with 2 BAM files
  
#get 2 BAM test files
treatBAMfile<-system.file("extdata","treat.bam",package="CoverageView")
ctrlBAMfile<-system.file("extdata","ctrl.bam",package="CoverageView")
  
#get a BED file with the TSS (transcription start site) coordinates of some genes
bedfile<-system.file("extdata","testTSS.bed",package="CoverageView")
  
#create 2 CoverageBamFile objects
trm<-CoverageBamFile(treatBAMfile,reads_mapped=28564510)
ctl<-CoverageBamFile(ctrlBAMfile,reads_mapped=26713667)
  
#generate the coverage matrix extending 100 nts on each side of the TSS provided
#in the bedfile and normalize the resulting coverages.The matrix elements are obtained
#by computing the ratio of the coverages of the trm against the ctl objects and then
#calculating the log2 of the ratios
DF4=cov.matrix(tr=trm,ctl=ctl,coordfile=bedfile,extend=100,normalization="rpm",do="log2ratio",num_cores=2)
  
#####################
  
#get a treatment BAM test file
treatBAMfile<-system.file("extdata","treat.bam",package="CoverageView")
  
#get a GFF file with the chr,start and end coordinates of different genomic
#features (i.e. CDS)
gffile<-system.file("extdata","test.gff",package="CoverageView")
  
#create the 'treatment' CoverageBamFile object
trm<-CoverageBamFile(treatBAMfile,reads_mapped=28564510)
  
#generate the coverage matrix dividing each genomic interval defined by the start and
#end coordinates in the gff file into 10 windows and calculating the average coverage
#for each window
DF1=cov.matrix(trm,coordfile=gffile,no_windows=10,num_cores=2)
  
#generate the coverage matrix dividing each genomic interval defined by the start and
#end coordinates in the gff file into 10 windows and calculating the average coverage
#for each window, this time we extend the genomic interval by 1 window before the start
#coordinate and 1 window after the end coordinate (offset argument is set to 1)
DF1=cov.matrix(trm,coordfile=gffile,no_windows=10,offset=1)
  
###########
##BigWIG files
###########
  
## Generating a coverage matrix with a single WIG file
  
#get a bigWIG test file
abigWIGfile<-system.file("extdata","treat.bw",package="CoverageView")

#get a BED file with the TSS (transcription start site) coordinates of some genes
bedfile<-system.file("extdata","testTSS.bed",package="CoverageView")

#create the CoverageBigWigFile object
trm<-CoverageBigWigFile(abigWIGfile)
  
#generate the coverage matrix extending 100 nts on each side of the TSS provided
DF1=cov.matrix(trm,coordfile=bedfile,extend=100,bin_width=10,num_cores=2)
  
## Generating a coverage matrix with 2 BigWIG files
  
#get 2 BigWIG test files
treatBigWIGfile<-system.file("extdata","treat.bw",package="CoverageView")
ctrlBigWIGfile<-system.file("extdata","ctrl.bw",package="CoverageView")
  
#create 2 CoverageBigWigFile objects
trm<-CoverageBigWigFile(treatBigWIGfile)
ctl<-CoverageBigWigFile(ctrlBigWIGfile)
  
#generate the coverage matrix extending 100 nts on each side of the TSS provided
#in the bedfile .The matrix elements are obtained by computing the ratio of the 
#coverages of the trm against the ctl objects and then calculating the log2 of the ratios
DF2=cov.matrix(tr=trm,ctl=ctl,coordfile=bedfile,extend=100,do="log2ratio",num_cores=2)

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(CoverageView)
Loading required package: Rsamtools
Loading required package: GenomeInfoDb
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomicRanges
Loading required package: Biostrings
Loading required package: XVector
Loading required package: rtracklayer

Attaching package: 'CoverageView'

The following object is masked from 'package:rtracklayer':

    export.wig

> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/CoverageView/cov.matrix.Rd_%03d_medium.png", width=480, height=480)
> ### Name: cov.matrix
> ### Title: Compute a coverage matrix
> ### Aliases: cov.matrix cov.matrix,ANY,missing-method
> ###   cov.matrix,ANY,ANY-method
> 
> ### ** Examples
> 
> ###########
> ##BAM files
> ###########
>   
> ## Generating a coverage matrix with a single BAM file
>   
> #get a BAM test file
> aBAMfile<-system.file("extdata","treat.bam",package="CoverageView")
>   
> #get a BED file with the TSS (transcription start site) coordinates of some genes
> bedfile<-system.file("extdata","testTSS.bed",package="CoverageView")
>   
> #create the CoverageBamFile object
> trm<-CoverageBamFile(aBAMfile,reads_mapped=28564510)
>   
> #generate the coverage matrix extending 100 nts on each side of the provided TSS
> #in the bedfile
> DF1=cov.matrix(tr=trm,coordfile=bedfile,extend=100,num_cores=2)
[INFO] A coverage matrix composed of average coverages will be generated.
[INFO] processing coords:chrI 200 +
[INFO] processing coords:chrI 400 -
>   
> #generate the coverage matrix extending 100 nts on each side of the TSS provided
> #in the bedfile and normalize the resulting coverages
> DF2=cov.matrix(tr=trm,coordfile=bedfile,extend=100,normalization="rpm",num_cores=2)
[INFO] A coverage matrix composed of average coverages will be generated.
[INFO] processing coords:chrI 200 +
[INFO] processing coords:chrI 400 -
>   
> #generate the coverage matrix extending 100 nts on each side of the TSS provided
> #in the bedfile and normalize the resulting coverages. This time we calculate the average
> #coverage in bins of 2 nucleotides
> DF3=cov.matrix(tr=trm,coordfile=bedfile,extend=100,normalization="rpm",bin_width=2,num_cores=2)
[INFO] A coverage matrix composed of average coverages will be generated.
[INFO] processing coords:chrI 200 +
[INFO] processing coords:chrI 400 -
>   
> ## Generating a coverage matrix with 2 BAM files
>   
> #get 2 BAM test files
> treatBAMfile<-system.file("extdata","treat.bam",package="CoverageView")
> ctrlBAMfile<-system.file("extdata","ctrl.bam",package="CoverageView")
>   
> #get a BED file with the TSS (transcription start site) coordinates of some genes
> bedfile<-system.file("extdata","testTSS.bed",package="CoverageView")
>   
> #create 2 CoverageBamFile objects
> trm<-CoverageBamFile(treatBAMfile,reads_mapped=28564510)
> ctl<-CoverageBamFile(ctrlBAMfile,reads_mapped=26713667)
>   
> #generate the coverage matrix extending 100 nts on each side of the TSS provided
> #in the bedfile and normalize the resulting coverages.The matrix elements are obtained
> #by computing the ratio of the coverages of the trm against the ctl objects and then
> #calculating the log2 of the ratios
> DF4=cov.matrix(tr=trm,ctl=ctl,coordfile=bedfile,extend=100,normalization="rpm",do="log2ratio",num_cores=2)
[INFO] A coverage matrix will be generated by calculating the 'log2ratio' of the coverages.
[INFO] processing:/home/ddbj/local/lib64/R/library/CoverageView/extdata/treat.bam
[INFO] processing coords:chrI 200 +
[INFO] processing coords:chrI 400 -
[INFO] processing:/home/ddbj/local/lib64/R/library/CoverageView/extdata/ctrl.bam
[INFO] processing coords:chrI 200 +
[INFO] processing coords:chrI 400 -
>   
> #####################
>   
> #get a treatment BAM test file
> treatBAMfile<-system.file("extdata","treat.bam",package="CoverageView")
>   
> #get a GFF file with the chr,start and end coordinates of different genomic
> #features (i.e. CDS)
> gffile<-system.file("extdata","test.gff",package="CoverageView")
>   
> #create the 'treatment' CoverageBamFile object
> trm<-CoverageBamFile(treatBAMfile,reads_mapped=28564510)
>   
> #generate the coverage matrix dividing each genomic interval defined by the start and
> #end coordinates in the gff file into 10 windows and calculating the average coverage
> #for each window
> DF1=cov.matrix(trm,coordfile=gffile,no_windows=10,num_cores=2)
[INFO] A coverage matrix composed of average coverages will be generated.
Warning in readGFF(filepath, version = version, filter = filter) :
  connection is not positioned at the start of the file, rewinding it
[INFO] processing coords:chrI 300 400
[INFO] processing coords:chrI 100 200
>   
> #generate the coverage matrix dividing each genomic interval defined by the start and
> #end coordinates in the gff file into 10 windows and calculating the average coverage
> #for each window, this time we extend the genomic interval by 1 window before the start
> #coordinate and 1 window after the end coordinate (offset argument is set to 1)
> DF1=cov.matrix(trm,coordfile=gffile,no_windows=10,offset=1)
[INFO] A coverage matrix composed of average coverages will be generated.
Warning in readGFF(filepath, version = version, filter = filter) :
  connection is not positioned at the start of the file, rewinding it
[INFO] processing coords:chrI 100 200
[INFO] processing coords:chrI 300 400
>   
> ###########
> ##BigWIG files
> ###########
>   
> ## Generating a coverage matrix with a single WIG file
>   
> #get a bigWIG test file
> abigWIGfile<-system.file("extdata","treat.bw",package="CoverageView")
> 
> #get a BED file with the TSS (transcription start site) coordinates of some genes
> bedfile<-system.file("extdata","testTSS.bed",package="CoverageView")
> 
> #create the CoverageBigWigFile object
> trm<-CoverageBigWigFile(abigWIGfile)
>   
> #generate the coverage matrix extending 100 nts on each side of the TSS provided
> DF1=cov.matrix(trm,coordfile=bedfile,extend=100,bin_width=10,num_cores=2)
[INFO] A coverage matrix composed of average coverages will be generated.
[INFO] processing coords:chrI 200 +
[INFO] processing coords:chrI 400 -
>   
> ## Generating a coverage matrix with 2 BigWIG files
>   
> #get 2 BigWIG test files
> treatBigWIGfile<-system.file("extdata","treat.bw",package="CoverageView")
> ctrlBigWIGfile<-system.file("extdata","ctrl.bw",package="CoverageView")
>   
> #create 2 CoverageBigWigFile objects
> trm<-CoverageBigWigFile(treatBigWIGfile)
> ctl<-CoverageBigWigFile(ctrlBigWIGfile)
>   
> #generate the coverage matrix extending 100 nts on each side of the TSS provided
> #in the bedfile .The matrix elements are obtained by computing the ratio of the 
> #coverages of the trm against the ctl objects and then calculating the log2 of the ratios
> DF2=cov.matrix(tr=trm,ctl=ctl,coordfile=bedfile,extend=100,do="log2ratio",num_cores=2)
[INFO] A coverage matrix will be generated by calculating the 'log2ratio' of the coverages.
[INFO] processing:/home/ddbj/local/lib64/R/library/CoverageView/extdata/treat.bw
[INFO] processing coords:chrI 200 +
[INFO] processing coords:chrI 400 -
[INFO] processing:/home/ddbj/local/lib64/R/library/CoverageView/extdata/ctrl.bw
[INFO] processing coords:chrI 200 +
[INFO] processing coords:chrI 400 -
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>