Last data update: 2014.03.03
R: Find the overlapped peaks among two or more set of peaks.
findOverlapsOfPeaks R Documentation
Find the overlapped peaks among two or more set of peaks.
Description
Find the overlapping peaks for two or more (less than five) set of peak ranges.
Usage
findOverlapsOfPeaks(..., maxgap=0L, minoverlap=1L,
ignore.strand=TRUE, connectedPeaks=c("min", "merge", "keepAll"))
Arguments
...
Objects of GRanges: See example below.
maxgap
Non-negative integer. Peak intervals with a separation of maxgap or less are considered to be overlapped.
minoverlap
Non-negative integer. Peak intervals with an overlapping of minoverlap or more are considered to be overlapped.
ignore.strand
When set to TRUE, the strand information is ignored in the overlap calculations.
connectedPeaks
If multiple peaks involved in overlapping in several groups, set it to "merge" will count it as 1, while set it to "min" will count it as the minimal involved peaks in any group of connected/overlapped peaks.
Details
Efficiently perform overlap queries with an interval tree implemented with GRanges.
Value
return value is An object of overlappingPeaks.
venn_cnt
an object of VennCounts
peaklist
a list consists of all overlapping peaks or unique peaks
overlappingPeaks
a list of data frame consists of the annotation of all the overlapped peaks
all.peaks
a list of GRanges object which contain the input peaks with formated rownames.
Author(s)
Jianhong Ou
References
1.Interval tree algorithm from: Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford. Introduction to Algorithms, second edition, MIT Press and McGraw-Hill. ISBN 0-262-53196-8
2.Zhu L.J. et al. (2010) ChIPpeakAnno: a Bioconductor package to annotate ChIP-seq and ChIP-chip data. BMC Bioinformatics 2010, 11:237doi:10.1186/1471-2105-11-237
3. Zhu L (2013). "Integrative analysis of ChIP-chip and ChIP-seq dataset." In Lee T and Luk ACS (eds.), Tilling Arrays, volume 1067, chapter 4, pp. -19. Humana Press. http://dx.doi.org/10.1007/978-1-62703-607-8_8, http://link.springer.com/protocol/10.1007%2F978-1-62703-607-8_8
See Also
annotatePeakInBatch, makeVennDiagram, getVennCounts, findOverlappingPeaks
Examples
peaks1 <- GRanges(seqnames=c(6,6,6,6,5),
IRanges(start=c(1543200,1557200,1563000,1569800,167889600),
end=c(1555199,1560599,1565199,1573799,167893599),
names=c("p1","p2","p3","p4","p5")),
strand="+")
peaks2 <- GRanges(seqnames=c(6,6,6,6,5),
IRanges(start=c(1549800,1554400,1565000,1569400,167888600),
end=c(1550599,1560799,1565399,1571199,167888999),
names=c("f1","f2","f3","f4","f5")),
strand="+")
t1 <- findOverlapsOfPeaks(peaks1, peaks2, maxgap=1000)
makeVennDiagram(t1)
t1$venn_cnt
t1$peaklist
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(ChIPpeakAnno)
Loading required package: grid
Loading required package: IRanges
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
rbind, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit
Loading required package: S4Vectors
Loading required package: stats4
Attaching package: 'S4Vectors'
The following objects are masked from 'package:base':
colMeans, colSums, expand.grid, rowMeans, rowSums
Loading required package: Biostrings
Loading required package: XVector
Loading required package: GenomicRanges
Loading required package: GenomeInfoDb
Loading required package: VennDiagram
Loading required package: futile.logger
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/ChIPpeakAnno/findOverlapsOfPeaks.Rd_%03d_medium.png", width=480, height=480)
> ### Name: findOverlapsOfPeaks
> ### Title: Find the overlapped peaks among two or more set of peaks.
> ### Aliases: findOverlapsOfPeaks overlappingPeaks overlappingPeaks-class
> ### Keywords: misc
>
> ### ** Examples
>
> peaks1 <- GRanges(seqnames=c(6,6,6,6,5),
+ IRanges(start=c(1543200,1557200,1563000,1569800,167889600),
+ end=c(1555199,1560599,1565199,1573799,167893599),
+ names=c("p1","p2","p3","p4","p5")),
+ strand="+")
> peaks2 <- GRanges(seqnames=c(6,6,6,6,5),
+ IRanges(start=c(1549800,1554400,1565000,1569400,167888600),
+ end=c(1550599,1560799,1565399,1571199,167888999),
+ names=c("f1","f2","f3","f4","f5")),
+ strand="+")
> t1 <- findOverlapsOfPeaks(peaks1, peaks2, maxgap=1000)
> makeVennDiagram(t1)
Missing totalTest! totalTest is required for HyperG test.
If totalTest is missing, pvalue will be calculated by estimating
the total binding sites of encoding region of human.
totalTest = humanGenomeSize * (2%(codingDNA) +
1%(regulationRegion)) / ( 2 * averagePeakWidth )
= 3.3e+9 * 0.03 / ( 2 * averagePeakWidth)
= 5e+7 /averagePeakWidth
$p.value
peaks1 peaks2 pval
[1,] 1 1 8.816366e-19
$vennCounts
peaks1 peaks2 Counts
[1,] 0 0 0
[2,] 0 1 0
[3,] 1 0 0
[4,] 1 1 5
attr(,"class")
[1] "VennCounts"
> t1$venn_cnt
peaks1 peaks2 Counts
[1,] 0 0 0
[2,] 0 1 0
[3,] 1 0 0
[4,] 1 1 5
attr(,"class")
[1] "VennCounts"
> t1$peaklist
$`peaks1///peaks2`
GRanges object with 4 ranges and 1 metadata column:
seqnames ranges strand |
<Rle> <IRanges> <Rle> |
[1] 6 [ 1543200, 1560799] + |
[2] 6 [ 1563000, 1565399] + |
[3] 6 [ 1569400, 1573799] + |
[4] 5 [167888600, 167893599] + |
peakNames
<CharacterList>
[1] peaks1__p1,peaks2__f1,peaks2__f2,...
[2] peaks1__p3,peaks2__f3
[3] peaks2__f4,peaks1__p4
[4] peaks2__f5,peaks1__p5
-------
seqinfo: 2 sequences from an unspecified genome; no seqlengths
>
>
>
>
>
> dev.off()
null device
1
>