Last data update: 2014.03.03

R: Cluster DB windows into clusters
clusterWindowsR Documentation

Cluster DB windows into clusters

Description

Clusters significant windows into clusters while controlling the cluster-level FDR.

Usage

clusterWindows(regions, tab, target, pval.col=NULL, 
    fc.col=NA, tol, ..., weight=NULL, grid.param=NULL) 

Arguments

regions

a GRanges or RangedSummarizedExperiment object containing window coordinates

tab

a dataframe of results with a PValue field for each window

target

a numeric scalar indicating the desired cluster-level FDR

pval.col

a string or integer scalar specifying the column of tab with the p-values

fc.col

a string or integer scalar specifying the column of tab with the log-fold changes

tol, ...

arguments to be passed to mergeWindows

weight, grid.param

arguments to be passed to controlClusterFDR

Details

Windows are identified as DB based on the adjusted p-values in tab. Only these DB windows are then used directly for clustering via mergeWindows. This identifies DB regions consisting solely of DB windows. If tol is not specified, it is set to 100 bp by default and a warning is raised. If fc.col is used to specify the column of log-fold changes, clusters are formed according to the sign of the log-fold change in mergeWindows.

DB-based clustering is obviously not blind to the DB status, so standard methods for FDR control are not valid. Instead, post-hoc control of the cluster-level FDR is applied by using controlClusterFDR. This aims to control the cluster-level FDR at target (which is set to 0.05 if not specified). The aim is to provide some interpretable results when DB-blind clustering is not appropriate, e.g., for diffuse marks involving long stretches of the genome. Reporting each marked stretch in its entirety would be cumbersome, so this method allows the DB subintervals to be identified directly.

Value

A named list similar to that reported by mergeWindows with an ID vector in id and region coordinates of each cluster in region. Non-significant windows are marked with NA values in ids. An additional element FDR is also included, representing the estimate of the cluster-level FDR for the returned regions.

Author(s)

Aaron Lun

See Also

mergeWindows, controlClusterFDR

Examples

set.seed(10)
x <- round(runif(100, 100, 1000))
gr <- GRanges("chrA", IRanges(x, x+5))
tab <- data.frame(PValue=rbeta(length(x), 1, 50), logFC=rnorm(length(x)))

clusterWindows(gr, tab, target=0.05, tol=10)
clusterWindows(gr, tab, target=0.05, tol=10, fc.col="logFC")

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(csaw)
Loading required package: GenomicRanges
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: SummarizedExperiment
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/csaw/clusterWindows.Rd_%03d_medium.png", width=480, height=480)
> ### Name: clusterWindows
> ### Title: Cluster DB windows into clusters
> ### Aliases: clusterWindows
> ### Keywords: clustering
> 
> ### ** Examples
> 
> set.seed(10)
> x <- round(runif(100, 100, 1000))
> gr <- GRanges("chrA", IRanges(x, x+5))
> tab <- data.frame(PValue=rbeta(length(x), 1, 50), logFC=rnorm(length(x)))
> 
> clusterWindows(gr, tab, target=0.05, tol=10)
$id
  [1] NA NA  7 11  2  5 NA NA  9  7 10 NA  3  8 NA  7 NA  6  7 NA 16  9 13  7  7
 [26] NA NA NA NA  7  7  2  4 17  7 12 15 18 11  7 NA NA  1 NA  6  4  1  7 NA 14
 [51] NA 18  6 NA  5  8 NA  7  7  7  1  3  7  7 NA NA NA  7  2  5  2  7 NA  7  1
 [76] NA  7 18 NA  5 16  7 NA  9 NA  1 14 NA  4 NA NA 11  8  7  7 15  6 NA 12  1

$region
GRanges object with 18 ranges and 0 metadata columns:
       seqnames     ranges strand
          <Rle>  <IRanges>  <Rle>
   [1]     chrA [113, 140]      *
   [2]     chrA [168, 189]      *
   [3]     chrA [202, 208]      *
   [4]     chrA [245, 258]      *
   [5]     chrA [272, 308]      *
   ...      ...        ...    ...
  [14]     chrA [818, 826]      *
  [15]     chrA [840, 856]      *
  [16]     chrA [875, 883]      *
  [17]     chrA [910, 915]      *
  [18]     chrA [943, 970]      *
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

$FDR
[1] 0.05555556

> clusterWindows(gr, tab, target=0.05, tol=10, fc.col="logFC")
$id
  [1] 39 23 31 48  8 16 NA NA 46 31 47 NA 10 45 26 31  5 22 28 54 55 46 52 25 29
 [26] 49 54 17 51 26 41  9 13 56 30 51 54 58 48 37 NA 16  1 50 20 12  2 35 NA 53
 [51] 24 57 18 34 14 44 NA 33 28 38  4 11 33 28 54 51 43 31  7 15  6 40 NA 40  4
 [76] 42 26 58 21 15 55 32 16 46 15  2 53 22 12 NA 36 50 45 39 27 54 19 42 50  3

$region
GRanges object with 58 ranges and 0 metadata columns:
       seqnames     ranges strand
          <Rle>  <IRanges>  <Rle>
   [1]     chrA [113, 118]      *
   [2]     chrA [115, 123]      *
   [3]     chrA [127, 132]      *
   [4]     chrA [129, 140]      *
   [5]     chrA [147, 152]      *
   ...      ...        ...    ...
  [54]     chrA [840, 859]      *
  [55]     chrA [875, 883]      *
  [56]     chrA [910, 915]      *
  [57]     chrA [943, 948]      *
  [58]     chrA [959, 970]      *
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths

$FDR
[1] 0.05172414

> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>