a character vector containing paths to sorted and indexed BAM files
max.dist
integer scalar specifying the maximum delay distance over which correlation coefficients will be calculated
cross
a logical scalar specifying whether cross-correlations should be computed
param
a readParam object containing read extraction parameters, or a list of such objects (one for each BAM file)
Details
If cross=TRUE, reads are separated into those mapping on the forward and reverse strands.
Positions on the forward strand are shifted forward by a delay interval.
The chromosome-wide correlation coefficient between the shifted forward positions and the original reverse positions are computed.
This is repeated for all delay intervals less than max.dist.
A weighted mean for the cross-correlation is taken across all chromosomes, with weighting based on the number of reads.
Cross-correlation plots can be used to check the quality of immunoprecipitation for ChIP-Seq experiments involving transcription factors or punctate histone marks.
Strong immunoprecipitation should result in a peak at a delay corresponding to the fragment length.
A spike may also be observed at the delay corresponding to the read length.
This is probably an artefact of the mapping process where unique mapping occurs to the same sequence on each strand.
By default, marked duplicate reads are removed from each BAM file prior to calculation of coefficients.
This is strongly recommended, even if the rest of the analysis will be performed with duplicates retained.
Otherwise, the read length spike will dominate the plot, such that the fragment length peak will no longer be easily visible.
If cross=FALSE, auto-correlation coefficients are computed without use of strand information.
This is designed to guide estimation of the average width of enrichment for diffuse histone marks.
For example, the width can be defined as the delay distance at which the autocorrelations become negligble.
However, this tends to be ineffective in practice as diffuse marks tend to have very weak correlations to begin with.
If multiple BAM files are specified in bam.files, the reads from all libraries are pooled prior to calculation of the correlation coefficients.
This is convenient for determining the average correlation profile across an entire dataset.
Separate calculations for each file will require multiple calls to correlateReads.
Paired-end data is also supported, whereby correlations are computed using only those reads in proper pairs.
This may be less meaningful as the presence of proper pairs will inevitably result in a strong peak at the fragment length.
Instead, IP efficiency can be diagnosed by treating paired-end data as single-end, e.g., with pe="first" in readParam.
Value
A numeric vector of length max.dist+1 containing the correlation coefficients for each delay interval from 0 to max.dist.
Author(s)
Aaron Lun
References
Kharchenko PV, Tolstorukov MY and Park, PJ (2008). Design and analysis of
ChIP-seq experiments for DNA-binding proteins. Nat. Biotechnol. 26,
1351-1359.
See Also
ccf
Examples
n <- 20
bamFile <- system.file("exdata", "rep1.bam", package="csaw")
par(mfrow=c(2,2))
x <- correlateReads(bamFile, max.dist=n)
plot(0:n, x, xlab="delay (bp)", ylab="ccf")
x <- correlateReads(bamFile, max.dist=n, param=readParam(dedup=TRUE))
plot(0:n, x, xlab="delay (bp)", ylab="ccf")
x <- correlateReads(bamFile, max.dist=n, cross=FALSE)
plot(0:n, x, xlab="delay (bp)", ylab="acf")
# Also works on paired-end data.
bamFile <- system.file("exdata", "pet.bam", package="csaw")
x <- correlateReads(bamFile, param=readParam(pe="both"))
head(x)
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(csaw)
Loading required package: GenomicRanges
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
rbind, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit
Loading required package: S4Vectors
Loading required package: stats4
Attaching package: 'S4Vectors'
The following objects are masked from 'package:base':
colMeans, colSums, expand.grid, rowMeans, rowSums
Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: SummarizedExperiment
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/csaw/correlateReads.Rd_%03d_medium.png", width=480, height=480)
> ### Name: correlateReads
> ### Title: Compute correlation coefficients between reads
> ### Aliases: correlateReads
> ### Keywords: diagnostics
>
> ### ** Examples
>
> n <- 20
> bamFile <- system.file("exdata", "rep1.bam", package="csaw")
> par(mfrow=c(2,2))
>
> x <- correlateReads(bamFile, max.dist=n)
> plot(0:n, x, xlab="delay (bp)", ylab="ccf")
>
> x <- correlateReads(bamFile, max.dist=n, param=readParam(dedup=TRUE))
> plot(0:n, x, xlab="delay (bp)", ylab="ccf")
>
> x <- correlateReads(bamFile, max.dist=n, cross=FALSE)
> plot(0:n, x, xlab="delay (bp)", ylab="acf")
>
> # Also works on paired-end data.
> bamFile <- system.file("exdata", "pet.bam", package="csaw")
> x <- correlateReads(bamFile, param=readParam(pe="both"))
> head(x)
[1] -0.02256113 -0.02274439 0.12439047 -0.02057973 -0.02074556 -0.02091439
>
>
>
>
>
> dev.off()
null device
1
>