Last data update: 2014.03.03

R: Check bimodality of regions
checkBimodalityR Documentation

Check bimodality of regions

Description

Compute the maximum bimodality score across all base pairs in each region.

Usage

checkBimodality(bam.files, regions, width=100, param=readParam(), 
    prior.count=2, invert=FALSE) 

Arguments

bam.files

a character vector containing paths to sorted and indexed BAM files

regions

a GRanges object specifying the regions over which bimodality is to be calculated

width

an integer scalar or vector indicating the span with which to compute bimodality

param

a readParam object containing read extraction parameters

prior.count

a numeric scalar specifying the prior count to compute bimodality scores

invert

a logical scalar indicating whether bimodality score should be inverted

Details

Consider a base position x. This function counts the number of forward- and reverse-strand reads within the interval [x-width+1, x]. It then calculates the forward:reverse ratio after adding prior.count to both counts. This is repeated for the interval [x, x+width-1], and the reverse:forward ratio is then computed. The smaller of these two ratios is used as the bimodality score.

Sites with high bimodality scores will be enriched for forward- and reverse-strand enrichment on the left and right of the site, respectively. Given a genomic region, this function will treat each base position as a site. The largest bimodality score across all positions will be reported for each region. The idea is to assist with the identification of transcription factor binding sites, which exhibit strong strand bimodality. The function will be less useful for broad targets like histone marks.

If multiple bam.files are specified, they are effectively pooled so that counting uses all reads in all files. A separate value of width can be specified for each library, to account for differences in fragmentation – see the ext argument for windowCounts for more details. In practice, this is usually unnecessary. Setting width to the average fragment length yields satisfactory results in most cases.

If invert is set, the bimodality score will be flipped around, i.e., it will be maximized when reverse-strand coverage dominates on the left, and forward-strand coverage dominates on the right. This is designed for use in CAGE analyses where this inverted bimodality is symptomatic of enhancer RNAs.

Value

A numeric vector containing the maximum bimodality score across all bases in each region.

Author(s)

Aaron Lun

Examples

bamFiles <- system.file("exdata", c("rep1.bam", "rep2.bam"), package="csaw")
incoming <- GRanges(c('chrA', 'chrA', 'chrB', 'chrC'), 
    IRanges(c(1, 500, 100, 1000), c(100, 580, 500, 1500)))

checkBimodality(bamFiles, incoming)
checkBimodality(bamFiles, incoming, width=c(100, 200))
checkBimodality(bamFiles, incoming, param=readParam(minq=20, dedup=TRUE))
checkBimodality(bamFiles, incoming, prior.count=5)

# Works on PE data; scores are computed from paired reads.
bamFile <- system.file("exdata", "pet.bam", package="csaw")
checkBimodality(bamFile, incoming[1:3], param=readParam(pe="both"))
checkBimodality(bamFile, incoming[1:3], param=readParam(pe="both", max.frag=100))

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(csaw)
Loading required package: GenomicRanges
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: SummarizedExperiment
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/csaw/checkBimodality.Rd_%03d_medium.png", width=480, height=480)
> ### Name: checkBimodality
> ### Title: Check bimodality of regions
> ### Aliases: checkBimodality
> ### Keywords: diagnostics
> 
> ### ** Examples
> 
> bamFiles <- system.file("exdata", c("rep1.bam", "rep2.bam"), package="csaw")
> incoming <- GRanges(c('chrA', 'chrA', 'chrB', 'chrC'), 
+     IRanges(c(1, 500, 100, 1000), c(100, 580, 500, 1500)))
> 
> checkBimodality(bamFiles, incoming)
[1] 0.8135593 1.0188679 1.1410256 1.3333333
> checkBimodality(bamFiles, incoming, width=c(100, 200))
[1] 0.8135593 0.8962264 1.1200000 1.1927711
> checkBimodality(bamFiles, incoming, param=readParam(minq=20, dedup=TRUE))
[1] 0.8571429 0.9512195 1.0508475 1.2045455
> checkBimodality(bamFiles, incoming, prior.count=5)
[1] 0.8225806 1.0178571 1.1358025 1.2222222
> 
> # Works on PE data; scores are computed from paired reads.
> bamFile <- system.file("exdata", "pet.bam", package="csaw")
> checkBimodality(bamFile, incoming[1:3], param=readParam(pe="both"))
[1] 2 1 1
> checkBimodality(bamFile, incoming[1:3], param=readParam(pe="both", max.frag=100))
[1] 1.666667 1.000000 1.000000
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>