Last data update: 2014.03.03

R: Compute binding site overlaps (occupancy analysis)
dba.overlapR Documentation

Compute binding site overlaps (occupancy analysis)

Description

Computes binding overlaps and co-occupancy statistics

Usage

dba.overlap(DBA, mask, mode=DBA_OLAP_PEAKS, 
            contrast, method=DBA$config$AnalysisMethod, th=DBA$config$th, 
            bUsePval=DBA$config$bUsePval, 
            report, byAttribute, bCorOnly=TRUE, CorMethod="pearson", 
            DataType=DBA$config$DataType)

Arguments

DBA

DBA object

mask

mask or vector of peakset numbers indicating a subset of peaksets to use (see dba.mask). When generating overlapping/unique peaksets, either two, three, or four peaksets may be specified. If the mode type is DBA_OLAP_ALL, and a contrast is specified, a value of TRUE (mask=TRUE) indicates that all samples should be included (otherwise only those present in one of the contrast groups will be included).

mode

indicates which results should be returned (see MODES below). One of:

  • DBA_OLAP_PEAKS

  • DBA_OLAP_ALL

  • DBA_OLAP_RATE

contrast

contrast number to use. Only specified if contrast data is to be used when mode=DBA_OLAP_ALL. See dba.show(DBA, bContrast=T) to get contrast numbers.

method

if contrast is specified and mode=DBA_OLAP_ALL, use data from method used for analysis:

  • DBA_DESEQ2

  • DBA_DESEQ2_BLOCK

  • DBA_EDGER

  • DBA_EDGER_BLOCK

th

if contrast is specified and mode=DBA_OLAP_ALL, significance threshold; all sites with FDR (or p-values, see bUsePval) less than or equal to this value will be included. A value of 1 will include all binding sites, but only the samples included in the contrast.

bUsePval

if contrast is specified and mode=DBA_OLAP_ALL, logical indicating whether to use FDR (FALSE) or p-value (TRUE) for thresholding.

report

if contrast is specified and mode=DBA_OLAP_ALL, a report (obtained from dba.report) specifying the data to be used. If counts are included in the report (and a contrast is specified), the count data from the report will be used to compute correlations, rather than the scores in the global binding affinity matrix. If report is present, the method, th, and bUsePval parameters are ignored.

byAttribute

when computing co-occupancy statistics (DBA_OLAP_ALL), limit comparisons to peaksets with the same value for a specific attribute, one of:

  • DBA_ID

  • DBA_TISSUE

  • DBA_FACTOR

  • DBA_CONDITION

  • DBA_TREATMENT

  • DBA_REPLICATE

  • DBA_CONSENSUS

  • DBA_CALLER

bCorOnly

when computing co-occupancy statistics (DBA_OLAP_ALL), logical indicating that only correlations, and not overlaps, should be computed. This is much faster if only correlations are desired (e.g. to plot the correlations using dba.plotHeatmap).

CorMethod

when computing co-occupancy statistics (DBA_OLAP_ALL), method to use when computing correlations.

DataType

if mode==DBA_OLAP_PEAKS, the class of object that peaksets should be returned as:

  • DBA_DATA_GRANGES

  • DBA_DATA_RANGEDDATA

  • DBA_DATA_FRAME

Can be set as default behavior by setting DBA$config$DataType.

Details

MODE: Generate overlapping/unique peaksets:

dba.overlap(DBA, mask, mode=DBA_OLAP_PEAKS, minVal)

MODE: Compute correlation and co-occupancy statistics (e.g. for dba.plotHeatmap):

dba.overlap(DBA, mask, mode=DBA_OLAP_ALL, byAttribute, minVal, attributes, bCorOnly, CorMethod)

MODE: Compute correlation and co-occupancy statistics using significantly differentially bound sites (e.g. for dba.plotHeatmap):

dba.overlap(DBA, mask, mode=DBA_OLAP_ALL, byAttribute, minVal, contrast, method, th=, bUsePval, attributes, bCorOnly, CorMethod)

Note that the scores from the global binding affinity matrix will be used for correlations unless a report containing count data is specified.

MODE: Compute overlap rates at different stringency thresholds:

dba.overlap(DBA, mask, mode=DBA_OLAP_RATE, minVal)

Value

Value depends on the mode specified in the mode parameter.

If mode = DBA_OLAP_PEAKS, Value is an overlap record: a list of three peaksets for an A-B overlap, seven peaksets for a A-B-C overlap, and fifteen peaksets for a A-B-C-D overlap:

inAll

peaks in all peaksets

onlyA

peaks unique to peakset A

onlyB

peaks unique to peakset B

onlyC

peaks unique to peakset C

onlyD

peaks unique to peakset D

notA

peaks in all peaksets except peakset A

notB

peaks in all peaksets except peakset B

notC

peaks in all peaksets except peakset C

notD

peaks in all peaksets except peakset D

AandB

peaks in peaksets A and B but not in peaksets C or D

AandC

peaks in peaksets A and C but not in peaksets B or D

AandD

peaks in peaksets A and D but not in peaksets B or C

BandC

peaks in peaksets B and C but not in peaksets A or D

BandD

peaks in peaksets B and D but not in peaksets A or C

CandD

peaks in peaksets C and D but not in peaksets A or B

If mode = DBA_OLAP_ALL, Value is a correlation record: a matrix with a row for each pair of peaksets and the following columns:

A

peakset number of first peakset in overlap

B

peakset number of second peakset in overlap

onlyA

number of sites unique to peakset A

onlyB

number of sites unique to peakset B

inAll

number of peaks in both peakset A and B (merged)

R2

correlation value A vs B

Overlap

percentage overlap (number of overlapping sites divided by number of peaks unique to smaller peakset

If mode = DBA_OLAP_RATE, Value is a vector whose length is the number of peaksets, containing the number of overlapping peaks at the corresponding minOverlaps threshold (i.e., Value[1] is the total number of unique sites, Value[2] is the number of unique sites appearing in at least two peaksets, Value[3] the number of sites overlapping in at least three peaksets, etc.).

Author(s)

Rory Stark

See Also

dba.plotVenn, dba.plotHeatmap

Examples

data(tamoxifen_peaks)
# default mode: DBA_OLAP_PEAKS -- get overlapping/non overlapping peaksets
mcf7 <- dba.overlap(tamoxifen,tamoxifen$masks$MCF7&tamoxifen$masks$Responsive)
names(mcf7)
mcf7$inAll

# mode:  DBA_OLAP_ALL -- get correlation record
mcf7 <- dba(tamoxifen,tamoxifen$masks$MCF7)
mcf7.corRec <- dba.overlap(mcf7,mode=DBA_OLAP_ALL,bCorOnly=FALSE)
mcf7.corRec

# mode: DBA_OLAP_RATE -- get overlap rate vector
data(tamoxifen_peaks)
rate <- dba.overlap(tamoxifen, mode=DBA_OLAP_RATE)
rate
plot(rate,type='b',xlab="# peaksets",ylab="# common peaks",
     main="Tamoxifen dataset overlap rate")

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(DiffBind)
Loading required package: GenomicRanges
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: SummarizedExperiment
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.


> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/DiffBind/dba.overlap.Rd_%03d_medium.png", width=480, height=480)
> ### Name: dba.overlap
> ### Title: Compute binding site overlaps (occupancy analysis)
> ### Aliases: dba.overlap
> 
> ### ** Examples
> 
> data(tamoxifen_peaks)
> # default mode: DBA_OLAP_PEAKS -- get overlapping/non overlapping peaksets
> mcf7 <- dba.overlap(tamoxifen,tamoxifen$masks$MCF7&tamoxifen$masks$Responsive)
> names(mcf7)
[1] "onlyA" "onlyB" "onlyC" "notA"  "notB"  "notC"  "inAll"
> mcf7$inAll
GRanges object with 885 ranges and 3 metadata columns:
       seqnames               ranges strand |             scoreA
          <Rle>            <IRanges>  <Rle> |          <numeric>
     1    chr18     [111564, 112005]      * | 0.0573241900280181
     6    chr18     [346464, 347342]      * |  0.108360551694246
    10    chr18     [399098, 399919]      * |  0.288184393430442
    12    chr18     [499098, 500193]      * |  0.959737465364313
    13    chr18     [503660, 504550]      * |  0.236083032770391
   ...      ...                  ...    ... .                ...
  1767    chr18 [77438923, 77440019]      * |  0.607061810188697
  1769    chr18 [77541035, 77541645]      * | 0.0949923375799136
  1774    chr18 [77694196, 77695138]      * |  0.506886890295816
  1779    chr18 [77968125, 77968822]      * |  0.155703472082476
  1780    chr18 [77987486, 77988208]      * |  0.197907153140044
                   scoreB             scoreC
                <numeric>          <numeric>
     1 0.0268569057628165 0.0554273660247567
     6 0.0279247492015297 0.0405869362152264
    10  0.155123557848488   0.22176513890865
    12  0.496817376257209  0.802688474695799
    13  0.101589864493241   0.20406054944851
   ...                ...                ...
  1767  0.064588446062649   0.28330126876705
  1769 0.0292466879163222 0.0593926496879118
  1774  0.107488734573361  0.201369600316728
  1779 0.0853438488036615  0.102488045381156
  1780 0.0667144837523761 0.0820399250245897
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
> 
> # mode:  DBA_OLAP_ALL -- get correlation record
> mcf7 <- dba(tamoxifen,tamoxifen$masks$MCF7)
> mcf7.corRec <- dba.overlap(mcf7,mode=DBA_OLAP_ALL,bCorOnly=FALSE)
> mcf7.corRec
      A B onlyA onlyB inAll       Cor   Overlap
 [1,] 1 3   397   214  1107 0.9520833 0.6443539
 [2,] 2 3    90   384   937 0.9445168 0.6640680
 [3,] 4 5   538    43   876 0.9189077 0.6012354
 [4,] 1 2   580   103   924 0.9032314 0.5749844
 [5,] 2 5   436   328   591 0.7470060 0.4361624
 [6,] 3 5   640   238   681 0.7269301 0.4368185
 [7,] 3 4   446   539   875 0.7066950 0.4704301
 [8,] 1 5   807   222   697 0.7023637 0.4038239
 [9,] 1 4   601   511   903 0.6971568 0.4481390
[10,] 2 4   297   684   730 0.6864931 0.4266511
> 
> # mode: DBA_OLAP_RATE -- get overlap rate vector
> data(tamoxifen_peaks)
> rate <- dba.overlap(tamoxifen, mode=DBA_OLAP_RATE)
> rate
 [1] 3795 2845 1773 1388 1074  817  653  484  384  202  129
> plot(rate,type='b',xlab="# peaksets",ylab="# common peaks",
+      main="Tamoxifen dataset overlap rate")
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>