Last data update: 2014.03.03
R: Consolidate window sizes
consolidateSizes R Documentation
Consolidate window sizes
Description
Consolidate DB results from multiple window sizes.
Usage
consolidateSizes(data.list, result.list, equiweight=TRUE, merge.args=list(),
combine.args=list(), region=NULL, overlap.args=list())
Arguments
data.list
a list of RangedSummarizedExperiment and/or GRanges objects
result.list
a list of data frames containing the DB test results for each entry of data.list
equiweight
a logical scalar indicating whether equal weighting from each window size should be enforced
merge.args
a named list of parameters to pass to mergeWindows
combine.args
a named list of parameters to pass to combineTests
region
a GRanges
object specifying regions of interest for overlapping with windows
overlap.args
a named list of parameters to pass to findOverlaps
Details
This function consolidates DB results from multiple window sizes, to provide comprehensive detection of DB at a range of spatial resolutions.
RangedSummarizedExperiment objects can be generated by running windowCounts
at a range of window sizes.
Windows of all sizes are clustered together through mergeWindows
, and the p-values from all windows in each cluster are combined using combineTests
.
If merge.args$tol
is not specified, it is automatically set to 100 bp and a warning is raised.
Some effort is required to equalize the contribution of each window size to the combined p-value of each cluster.
This is done by setting equiweight=TRUE
, where the weight of each window is inversely proportional to the number of windows of that size.
Otherwise, the combined p-value would be determined by numerous small windows in each cluster.
If region
is specified, each entry of region
is defined as a cluster.
Windows in each cluster are identified using findOverlaps
, and consolidation is performed across multiple window sizes like before.
Note that the returned id
will be a list of Hits
objects rather than integer vectors, as one window (subject) may overlap more than one region (query).
Value
A named list is returned containing:
id
a list of integer vectors, where each vector corresponds to an object in data.list
; the entries of the vector specify the cluster to which each row of that object is assigned
region
a GRanges
object containing the genomic coordinates of the clusters of merged windows (or other regions, if region
is specified)
table
a data frame containing the combined DB results for each region
Author(s)
Aaron Lun
See Also
windowCounts
,
mergeWindows
,
findOverlaps
,
combineTests
Examples
bamFiles <- system.file("exdata", c("rep1.bam", "rep2.bam"), package="csaw")
low <- windowCounts(bamFiles, width=1, filter=1)
med <- windowCounts(bamFiles, width=100, filter=1)
high <- windowCounts(bamFiles, width=500, filter=1)
# Making up some DB results.
dbl <- data.frame(logFC=rnorm(nrow(low)), PValue=runif(nrow(low)), logCPM=0)
dbm <- data.frame(logFC=rnorm(nrow(med)), PValue=runif(nrow(med)), logCPM=0)
dbh <- data.frame(logFC=rnorm(nrow(high)), PValue=runif(nrow(high)), logCPM=0)
# Consolidating.
cons <- consolidateSizes(list(low, med, high), list(dbl, dbm, dbh),
merge.args=list(tol=100, max.width=300))
cons$region
cons$id
cons$table
# Without weights.
cons <- consolidateSizes(list(low, med, high), list(dbl, dbm, dbh),
merge.args=list(tol=100, max.width=300), equiweight=FALSE)
cons$table
# Trying with a custom region.
of.interest <- GRanges(c('chrA', 'chrA', 'chrB', 'chrC'),
IRanges(c(1, 500, 100, 1000), c(200, 1000, 700, 1500)))
cons <- consolidateSizes(list(low, med, high), list(dbl, dbm, dbh),
region=of.interest)
cons$table
cons$id
# Trying with limited numbers of overlaps; empty regions are ignored.
cons <- consolidateSizes(list(low[1,], med[1,], high[1,]),
list(dbl[1,], dbm[1,], dbh[1,]), region=of.interest)
cons$region
cons$table
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(csaw)
Loading required package: GenomicRanges
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
rbind, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit
Loading required package: S4Vectors
Loading required package: stats4
Attaching package: 'S4Vectors'
The following objects are masked from 'package:base':
colMeans, colSums, expand.grid, rowMeans, rowSums
Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: SummarizedExperiment
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/csaw/consolidateSizes.Rd_%03d_medium.png", width=480, height=480)
> ### Name: consolidateSizes
> ### Title: Consolidate window sizes
> ### Aliases: consolidateSizes
> ### Keywords: clustering
>
> ### ** Examples
>
> bamFiles <- system.file("exdata", c("rep1.bam", "rep2.bam"), package="csaw")
> low <- windowCounts(bamFiles, width=1, filter=1)
> med <- windowCounts(bamFiles, width=100, filter=1)
> high <- windowCounts(bamFiles, width=500, filter=1)
>
> # Making up some DB results.
> dbl <- data.frame(logFC=rnorm(nrow(low)), PValue=runif(nrow(low)), logCPM=0)
> dbm <- data.frame(logFC=rnorm(nrow(med)), PValue=runif(nrow(med)), logCPM=0)
> dbh <- data.frame(logFC=rnorm(nrow(high)), PValue=runif(nrow(high)), logCPM=0)
>
> # Consolidating.
> cons <- consolidateSizes(list(low, med, high), list(dbl, dbm, dbh),
+ merge.args=list(tol=100, max.width=300))
> cons$region
GRanges object with 13 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chrA [ 1, 500] *
[2] chrA [ 51, 750] *
[3] chrA [301, 1000] *
[4] chrA [551, 1250] *
[5] chrA [801, 1298] *
... ... ... ...
[9] chrC [ 1, 500] *
[10] chrC [ 51, 750] *
[11] chrC [301, 1050] *
[12] chrC [601, 1300] *
[13] chrC [851, 1345] *
-------
seqinfo: 3 sequences from an unspecified genome
> cons$id
[[1]]
[1] 1 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4 5 5 5 5
[26] 5 6 6 6 6 6 6 7 7 7 7 7 7 8 8 8 8 8 8 9 9 9 9 9 9
[51] 10 10 10 10 10 11 11 11 11 11 11 12 12 12 12 12 13 13 13 13 13
[[2]]
[1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4 5 5 5 5 5
[26] 5 6 6 6 6 6 7 7 7 7 7 7 8 8 8 8 8 8 8 9 9 9 9 9 10
[51] 10 10 10 10 11 11 11 11 11 11 12 12 12 12 12 13 13 13 13 13 13
[[3]]
[1] 1 2 2 2 2 2 3 3 3 3 3 4 4 4 4 4 5 5 5 5 5 5 5 5 5
[26] 5 6 7 7 7 7 7 7 8 8 8 8 8 8 8 8 8 8 8 9 10 10 10 10 10
[51] 11 11 11 11 11 11 12 12 12 12 12 13 13 13 13 13 13 13 13 13 13
> cons$table
nWindows logFC.up logFC.down PValue FDR
1 12 6 3 0.77111366 0.9916068
2 15 2 5 0.76424131 0.9916068
3 15 5 6 0.06403410 0.3756885
4 15 5 1 0.31067398 0.7484310
5 21 5 4 0.84386358 0.9916068
6 12 5 2 0.93312088 0.9916068
7 18 6 7 0.23667017 0.7484310
8 24 7 6 0.99160676 0.9916068
9 12 2 6 0.34542971 0.7484310
10 15 5 4 0.40689564 0.7556633
11 18 4 8 0.07745643 0.3756885
12 15 5 3 0.08669734 0.3756885
13 21 6 3 0.55709241 0.9052752
>
> # Without weights.
> cons <- consolidateSizes(list(low, med, high), list(dbl, dbm, dbh),
+ merge.args=list(tol=100, max.width=300), equiweight=FALSE)
> cons$table
nWindows logFC.up logFC.down PValue FDR
1 12 6 3 0.83923902 0.9091756
2 15 2 5 0.76424131 0.9031943
3 15 5 6 0.06403410 0.3756885
4 15 5 1 0.31067398 0.8077523
5 21 5 4 0.72197217 0.9031943
6 12 5 2 0.66786971 0.9031943
7 18 6 7 0.23667017 0.7691781
8 24 7 6 0.99160676 0.9916068
9 12 2 6 0.48820732 0.8651670
10 15 5 4 0.40689564 0.8651670
11 18 4 8 0.07745643 0.3756885
12 15 5 3 0.08669734 0.3756885
13 21 6 3 0.53241046 0.8651670
>
> # Trying with a custom region.
> of.interest <- GRanges(c('chrA', 'chrA', 'chrB', 'chrC'),
+ IRanges(c(1, 500, 100, 1000), c(200, 1000, 700, 1500)))
> cons <- consolidateSizes(list(low, med, high), list(dbl, dbm, dbh),
+ region=of.interest)
> cons$table
nWindows logFC.up logFC.down PValue FDR
1 12 5 3 0.8392390 0.8392390
2 42 12 12 0.1280682 0.3121104
3 40 17 10 0.5522304 0.7363072
4 33 11 6 0.1560552 0.3121104
> cons$id
[[1]]
Hits object with 33 hits and 0 metadata columns:
queryHits subjectHits
<integer> <integer>
[1] 1 1
[2] 1 2
[3] 1 3
[4] 1 4
[5] 2 11
... ... ...
[29] 4 67
[30] 4 68
[31] 4 69
[32] 4 70
[33] 4 71
-------
queryLength: 4 / subjectLength: 71
[[2]]
Hits object with 39 hits and 0 metadata columns:
queryHits subjectHits
<integer> <integer>
[1] 1 1
[2] 1 2
[3] 1 3
[4] 1 4
[5] 2 9
... ... ...
[35] 4 67
[36] 4 68
[37] 4 69
[38] 4 70
[39] 4 71
-------
queryLength: 4 / subjectLength: 71
[[3]]
Hits object with 55 hits and 0 metadata columns:
queryHits subjectHits
<integer> <integer>
[1] 1 1
[2] 1 2
[3] 1 3
[4] 1 4
[5] 2 1
... ... ...
[51] 4 67
[52] 4 68
[53] 4 69
[54] 4 70
[55] 4 71
-------
queryLength: 4 / subjectLength: 71
>
> # Trying with limited numbers of overlaps; empty regions are ignored.
> cons <- consolidateSizes(list(low[1,], med[1,], high[1,]),
+ list(dbl[1,], dbm[1,], dbh[1,]), region=of.interest)
> cons$region
GRanges object with 4 ranges and 0 metadata columns:
seqnames ranges strand
<Rle> <IRanges> <Rle>
[1] chrA [ 1, 200] *
[2] chrA [ 500, 1000] *
[3] chrB [ 100, 700] *
[4] chrC [1000, 1500] *
-------
seqinfo: 3 sequences from an unspecified genome; no seqlengths
> cons$table
nWindows logFC.up logFC.down PValue FDR
1 3 2 1 0.6554466 0.6554466
2 1 1 0 0.4369644 0.6554466
3 NA NA NA NA NA
4 NA NA NA NA NA
>
>
>
>
>
> dev.off()
null device
1
>