Last data update: 2014.03.03

R: Consolidate window sizes
consolidateSizesR Documentation

Consolidate window sizes

Description

Consolidate DB results from multiple window sizes.

Usage

consolidateSizes(data.list, result.list, equiweight=TRUE, merge.args=list(),
    combine.args=list(), region=NULL, overlap.args=list()) 

Arguments

data.list

a list of RangedSummarizedExperiment and/or GRanges objects

result.list

a list of data frames containing the DB test results for each entry of data.list

equiweight

a logical scalar indicating whether equal weighting from each window size should be enforced

merge.args

a named list of parameters to pass to mergeWindows

combine.args

a named list of parameters to pass to combineTests

region

a GRanges object specifying regions of interest for overlapping with windows

overlap.args

a named list of parameters to pass to findOverlaps

Details

This function consolidates DB results from multiple window sizes, to provide comprehensive detection of DB at a range of spatial resolutions. RangedSummarizedExperiment objects can be generated by running windowCounts at a range of window sizes. Windows of all sizes are clustered together through mergeWindows, and the p-values from all windows in each cluster are combined using combineTests. If merge.args$tol is not specified, it is automatically set to 100 bp and a warning is raised.

Some effort is required to equalize the contribution of each window size to the combined p-value of each cluster. This is done by setting equiweight=TRUE, where the weight of each window is inversely proportional to the number of windows of that size. Otherwise, the combined p-value would be determined by numerous small windows in each cluster.

If region is specified, each entry of region is defined as a cluster. Windows in each cluster are identified using findOverlaps, and consolidation is performed across multiple window sizes like before. Note that the returned id will be a list of Hits objects rather than integer vectors, as one window (subject) may overlap more than one region (query).

Value

A named list is returned containing:

id

a list of integer vectors, where each vector corresponds to an object in data.list; the entries of the vector specify the cluster to which each row of that object is assigned

region

a GRanges object containing the genomic coordinates of the clusters of merged windows (or other regions, if region is specified)

table

a data frame containing the combined DB results for each region

Author(s)

Aaron Lun

See Also

windowCounts, mergeWindows, findOverlaps, combineTests

Examples

bamFiles <- system.file("exdata", c("rep1.bam", "rep2.bam"), package="csaw")
low <- windowCounts(bamFiles, width=1, filter=1)
med <- windowCounts(bamFiles, width=100, filter=1)
high <- windowCounts(bamFiles, width=500, filter=1)

# Making up some DB results.
dbl <- data.frame(logFC=rnorm(nrow(low)), PValue=runif(nrow(low)), logCPM=0)
dbm <- data.frame(logFC=rnorm(nrow(med)), PValue=runif(nrow(med)), logCPM=0)
dbh <- data.frame(logFC=rnorm(nrow(high)), PValue=runif(nrow(high)), logCPM=0)

# Consolidating.
cons <- consolidateSizes(list(low, med, high), list(dbl, dbm, dbh),
	merge.args=list(tol=100, max.width=300))
cons$region
cons$id
cons$table

# Without weights.
cons <- consolidateSizes(list(low, med, high), list(dbl, dbm, dbh),
	merge.args=list(tol=100, max.width=300), equiweight=FALSE)
cons$table

# Trying with a custom region.
of.interest <- GRanges(c('chrA', 'chrA', 'chrB', 'chrC'), 
    IRanges(c(1, 500, 100, 1000), c(200, 1000, 700, 1500)))
cons <- consolidateSizes(list(low, med, high), list(dbl, dbm, dbh),
    region=of.interest)
cons$table
cons$id

# Trying with limited numbers of overlaps; empty regions are ignored.
cons <- consolidateSizes(list(low[1,], med[1,], high[1,]), 
    list(dbl[1,], dbm[1,], dbh[1,]), region=of.interest)
cons$region
cons$table

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(csaw)
Loading required package: GenomicRanges
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: SummarizedExperiment
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/csaw/consolidateSizes.Rd_%03d_medium.png", width=480, height=480)
> ### Name: consolidateSizes
> ### Title: Consolidate window sizes
> ### Aliases: consolidateSizes
> ### Keywords: clustering
> 
> ### ** Examples
> 
> bamFiles <- system.file("exdata", c("rep1.bam", "rep2.bam"), package="csaw")
> low <- windowCounts(bamFiles, width=1, filter=1)
> med <- windowCounts(bamFiles, width=100, filter=1)
> high <- windowCounts(bamFiles, width=500, filter=1)
> 
> # Making up some DB results.
> dbl <- data.frame(logFC=rnorm(nrow(low)), PValue=runif(nrow(low)), logCPM=0)
> dbm <- data.frame(logFC=rnorm(nrow(med)), PValue=runif(nrow(med)), logCPM=0)
> dbh <- data.frame(logFC=rnorm(nrow(high)), PValue=runif(nrow(high)), logCPM=0)
> 
> # Consolidating.
> cons <- consolidateSizes(list(low, med, high), list(dbl, dbm, dbh),
+ 	merge.args=list(tol=100, max.width=300))
> cons$region
GRanges object with 13 ranges and 0 metadata columns:
       seqnames      ranges strand
          <Rle>   <IRanges>  <Rle>
   [1]     chrA [  1,  500]      *
   [2]     chrA [ 51,  750]      *
   [3]     chrA [301, 1000]      *
   [4]     chrA [551, 1250]      *
   [5]     chrA [801, 1298]      *
   ...      ...         ...    ...
   [9]     chrC [  1,  500]      *
  [10]     chrC [ 51,  750]      *
  [11]     chrC [301, 1050]      *
  [12]     chrC [601, 1300]      *
  [13]     chrC [851, 1345]      *
  -------
  seqinfo: 3 sequences from an unspecified genome
> cons$id
[[1]]
 [1]  1  1  1  1  1  1  2  2  2  2  2  3  3  3  3  3  4  4  4  4  4  5  5  5  5
[26]  5  6  6  6  6  6  6  7  7  7  7  7  7  8  8  8  8  8  8  9  9  9  9  9  9
[51] 10 10 10 10 10 11 11 11 11 11 11 12 12 12 12 12 13 13 13 13 13

[[2]]
 [1]  1  1  1  1  1  2  2  2  2  2  3  3  3  3  3  4  4  4  4  4  5  5  5  5  5
[26]  5  6  6  6  6  6  7  7  7  7  7  7  8  8  8  8  8  8  8  9  9  9  9  9 10
[51] 10 10 10 10 11 11 11 11 11 11 12 12 12 12 12 13 13 13 13 13 13

[[3]]
 [1]  1  2  2  2  2  2  3  3  3  3  3  4  4  4  4  4  5  5  5  5  5  5  5  5  5
[26]  5  6  7  7  7  7  7  7  8  8  8  8  8  8  8  8  8  8  8  9 10 10 10 10 10
[51] 11 11 11 11 11 11 12 12 12 12 12 13 13 13 13 13 13 13 13 13 13

> cons$table
   nWindows logFC.up logFC.down     PValue       FDR
1        12        6          3 0.77111366 0.9916068
2        15        2          5 0.76424131 0.9916068
3        15        5          6 0.06403410 0.3756885
4        15        5          1 0.31067398 0.7484310
5        21        5          4 0.84386358 0.9916068
6        12        5          2 0.93312088 0.9916068
7        18        6          7 0.23667017 0.7484310
8        24        7          6 0.99160676 0.9916068
9        12        2          6 0.34542971 0.7484310
10       15        5          4 0.40689564 0.7556633
11       18        4          8 0.07745643 0.3756885
12       15        5          3 0.08669734 0.3756885
13       21        6          3 0.55709241 0.9052752
> 
> # Without weights.
> cons <- consolidateSizes(list(low, med, high), list(dbl, dbm, dbh),
+ 	merge.args=list(tol=100, max.width=300), equiweight=FALSE)
> cons$table
   nWindows logFC.up logFC.down     PValue       FDR
1        12        6          3 0.83923902 0.9091756
2        15        2          5 0.76424131 0.9031943
3        15        5          6 0.06403410 0.3756885
4        15        5          1 0.31067398 0.8077523
5        21        5          4 0.72197217 0.9031943
6        12        5          2 0.66786971 0.9031943
7        18        6          7 0.23667017 0.7691781
8        24        7          6 0.99160676 0.9916068
9        12        2          6 0.48820732 0.8651670
10       15        5          4 0.40689564 0.8651670
11       18        4          8 0.07745643 0.3756885
12       15        5          3 0.08669734 0.3756885
13       21        6          3 0.53241046 0.8651670
> 
> # Trying with a custom region.
> of.interest <- GRanges(c('chrA', 'chrA', 'chrB', 'chrC'), 
+     IRanges(c(1, 500, 100, 1000), c(200, 1000, 700, 1500)))
> cons <- consolidateSizes(list(low, med, high), list(dbl, dbm, dbh),
+     region=of.interest)
> cons$table
  nWindows logFC.up logFC.down    PValue       FDR
1       12        5          3 0.8392390 0.8392390
2       42       12         12 0.1280682 0.3121104
3       40       17         10 0.5522304 0.7363072
4       33       11          6 0.1560552 0.3121104
> cons$id
[[1]]
Hits object with 33 hits and 0 metadata columns:
       queryHits subjectHits
       <integer>   <integer>
   [1]         1           1
   [2]         1           2
   [3]         1           3
   [4]         1           4
   [5]         2          11
   ...       ...         ...
  [29]         4          67
  [30]         4          68
  [31]         4          69
  [32]         4          70
  [33]         4          71
  -------
  queryLength: 4 / subjectLength: 71

[[2]]
Hits object with 39 hits and 0 metadata columns:
       queryHits subjectHits
       <integer>   <integer>
   [1]         1           1
   [2]         1           2
   [3]         1           3
   [4]         1           4
   [5]         2           9
   ...       ...         ...
  [35]         4          67
  [36]         4          68
  [37]         4          69
  [38]         4          70
  [39]         4          71
  -------
  queryLength: 4 / subjectLength: 71

[[3]]
Hits object with 55 hits and 0 metadata columns:
       queryHits subjectHits
       <integer>   <integer>
   [1]         1           1
   [2]         1           2
   [3]         1           3
   [4]         1           4
   [5]         2           1
   ...       ...         ...
  [51]         4          67
  [52]         4          68
  [53]         4          69
  [54]         4          70
  [55]         4          71
  -------
  queryLength: 4 / subjectLength: 71

> 
> # Trying with limited numbers of overlaps; empty regions are ignored.
> cons <- consolidateSizes(list(low[1,], med[1,], high[1,]), 
+     list(dbl[1,], dbm[1,], dbh[1,]), region=of.interest)
> cons$region
GRanges object with 4 ranges and 0 metadata columns:
      seqnames       ranges strand
         <Rle>    <IRanges>  <Rle>
  [1]     chrA [   1,  200]      *
  [2]     chrA [ 500, 1000]      *
  [3]     chrB [ 100,  700]      *
  [4]     chrC [1000, 1500]      *
  -------
  seqinfo: 3 sequences from an unspecified genome; no seqlengths
> cons$table
  nWindows logFC.up logFC.down    PValue       FDR
1        3        2          1 0.6554466 0.6554466
2        1        1          0 0.4369644 0.6554466
3       NA       NA         NA        NA        NA
4       NA       NA         NA        NA        NA
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>