Last data update: 2014.03.03

R: Profile binding sites
profileSitesR Documentation

Profile binding sites

Description

Get the coverage profile around potential binding sites.

Usage

profileSites(bam.files, regions, range=5000, ext=100, average=TRUE,
    weight=1, param=readParam(), strand=c("ignore", "use", "match"))

Arguments

bam.files

a character vector containing paths to one or more BAM files

regions

a GRanges object over which profiles are to be aggregated

range

an integer scalar specifying the range over which the profile will be collected

ext

an integer scalar specifying the average fragment length for single-end data

average

a logical scalar specifying whether the profiles should be averaged across regions

weight

a numeric vector indicating the relative weight to be assigned to the profile for each region

param

a readParam object containing read extraction parameters, or a list of such objects (one for each BAM file)

strand

a string indicating how stranded regions should be handled

Details

This function aggregates the coverage profile around the specified regions. The shape of this profile can guide an intelligent choice of the window size in windowCounts, or to determine if region expansion is necessary in regionCounts. For the former, restricting the regions to locally maximal windows with findMaxima is recommended prior to use of profileSites. The function can be also used to examine average coverage around known features of interest, like genes.

The profile records the number of fragments overlapping each base within range of the start of all regions. Single-end reads are directionally extended to ext to impute the fragment (see windowCounts for more details). For paired-end reads, the interval between each pair is used as the fragment. If multiple bam.files are specified, reads are pooled across files for counting into each profile.

Direct aggregation will favor high-abundance regions as these have higher counts. If this is undesirable, high-abundance regions can be downweighted using the weight argument. For example, this can be set to the inverse of the sum of counts across all libraries for each region in regions. This will ensure that each region contributes equally to the final profile.

Aggregation can be turned off by setting average=FALSE. In such cases, a separate profile will be returned for each region, instead of the profiles being averaged across all regions. This may be useful, e.g., for constructing heatmaps of enrichment across many regions. Note that weight has no effect when aggregation is turned off.

Value

If average=TRUE, a numeric vector of average coverages for each base position within range is returned, where the average is taken over all regions. The vector is named according to the relative position of each base to the start of the region. If weight is set as described above for each region, then the vector will represent average relative coverages, i.e., relative to the number of fragments counted in the region itself.

If average=FALSE, an integer matrix of coverage values is returned. Each row of the matrix corresponds to an entry in regions, while each column corresponds to a base position with range. Column names are set o the relative position of each base to the start of each region.

Comments on strand specificity

If strand="use", the function is called separately on the reverse-stranded regions. The profile for these regions is computed such that the left side of the profile corresponds to the upstream flank on the reverse strand (i.e., the profile is flipped). The center of the profile corresponds to the 5' end of the region on the reverse strand. This may be useful for features where strandedness is important, e.g., TSS's. Otherwise, if strand="ignore", no special treatment is given to reverse-stranded features.

By default, the strandedness of the region has no effect on read extraction. If strand="match", the profile for reverse-strand regions is made with reverse-strand reads only (this profile is also flipped, as described for strand="use"). Similarly, only forward-strand reads are used for forward- or unstranded regions. Note that param$forward must be set to NULL for this to work.

Author(s)

Aaron Lun

See Also

findMaxima, windowCounts, wwhm

Examples

bamFile <- system.file("exdata", "rep1.bam", package="csaw")
data <- windowCounts(bamFile, filter=1)
rwsms <- rowSums(assay(data))
maxed <- findMaxima(rowRanges(data), range=100, metric=rwsms)
	
x <- profileSites(bamFile, rowRanges(data)[maxed], range=200)
plot(as.integer(names(x)), x)

x <- profileSites(bamFile, rowRanges(data)[maxed], range=500)
plot(as.integer(names(x)), x)

x <- profileSites(bamFile, rowRanges(data)[maxed], range=500, weight=1/rwsms)
plot(as.integer(names(x)), x)

# Introducing some strandedness.
regs <- rowRanges(data)[maxed]
strand(regs) <- sample(c("-", "+", "*"), sum(maxed), replace=TRUE)
x <- profileSites(bamFile, regs, range=500)
plot(as.integer(names(x)), x)
x2 <- profileSites(bamFile, regs, range=500, strand="use")
points(as.integer(names(x2)), x2, col="red")
x3 <- profileSites(bamFile, regs, range=500, strand="match",
    param=readParam(forward=NULL))
points(as.integer(names(x3)), x3, col="blue")

# Returning separate profiles.
y <- profileSites(bamFile, rowRanges(data)[maxed], range=500, average=FALSE)
dim(y)

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(csaw)
Loading required package: GenomicRanges
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: SummarizedExperiment
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/csaw/profileSites.Rd_%03d_medium.png", width=480, height=480)
> ### Name: profileSites
> ### Title: Profile binding sites
> ### Aliases: profileSites
> ### Keywords: diagnostics
> 
> ### ** Examples
> 
> bamFile <- system.file("exdata", "rep1.bam", package="csaw")
> data <- windowCounts(bamFile, filter=1)
> rwsms <- rowSums(assay(data))
> maxed <- findMaxima(rowRanges(data), range=100, metric=rwsms)
> 	
> x <- profileSites(bamFile, rowRanges(data)[maxed], range=200)
> plot(as.integer(names(x)), x)
> 
> x <- profileSites(bamFile, rowRanges(data)[maxed], range=500)
> plot(as.integer(names(x)), x)
> 
> x <- profileSites(bamFile, rowRanges(data)[maxed], range=500, weight=1/rwsms)
> plot(as.integer(names(x)), x)
> 
> # Introducing some strandedness.
> regs <- rowRanges(data)[maxed]
> strand(regs) <- sample(c("-", "+", "*"), sum(maxed), replace=TRUE)
> x <- profileSites(bamFile, regs, range=500)
> plot(as.integer(names(x)), x)
> x2 <- profileSites(bamFile, regs, range=500, strand="use")
> points(as.integer(names(x2)), x2, col="red")
> x3 <- profileSites(bamFile, regs, range=500, strand="match",
+     param=readParam(forward=NULL))
> points(as.integer(names(x3)), x3, col="blue")
> 
> # Returning separate profiles.
> y <- profileSites(bamFile, rowRanges(data)[maxed], range=500, average=FALSE)
> dim(y)
[1]   11 1001
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>