Last data update: 2014.03.03

R: Cluster bin pairs
clusterPairsR Documentation

Cluster bin pairs

Description

Aggregate bin pairs into local clusters for summarization.

Usage

clusterPairs(..., tol, upper=1e6, index.only=FALSE)

Arguments

...

One or more InteractionSet objects, optionally named.

tol

A numeric scalar specifying the maximum distance between bin pairs in base pairs.

upper

A numeric scalar specifying the maximum size of each cluster in base pairs.

index.only

A logical scalar indicating whether only indices should be returned.

Details

Clustering is performed by putting a interaction in a cluster if the smallest Chebyshev distance to any interaction already inside the cluster is less than tol. This is a cross between single-linkage approaches and density-based methods, especially after filtering removes low-density regions. In this manner, adjacent events in the interaction space can be clustered together. Interactions that are assigned with the same cluster ID belong to the same cluster.

The input data objects can be taken from the output of squareCounts or connectCounts. For the former, inputs can consist of interactions with multiple bin sizes. It would be prudent to filter the former based on the average abundances, to reduce the density of bin pairs in the interaction space. Otherwise, clusters may be too large to be easily interpreted.

Alternatively, to avoid excessively large clusters, this function can also split each cluster into roughly-equally sized subclusters. The maximum value of any dimension of the subclusters is approxiamtely equal to upper. This aims to improve the spatial interpretability of the clustering result.

There is no guarantee that each cluster forms a regular shape in the interaction space. Instead, a minimum bounding box is reported containing all bin pairs in each cluster. The coordinates of the box for each cluster is stored in each row of the output interactions. The cluster ID in each indices vector represents the row index for these coordinates.

If index.only=TRUE, only the indices are returned and coordinates of the bounding box are not computed. This is largely for efficiency purposes when clusterPairs is called by internal functions.

Value

If index.only=FALSE, a named list is returned containing:

indices:

A named list of integer vectors where each vector contains a cluster ID for each interaction in the corresponding input InteractionSet object.

interactions:

A ReverseStrictGInteractions object containing the coordinates of the sides of the bounding box for each cluster.

If index.only=TRUE, the indices are returned directly without computing coordinates.

Author(s)

Aaron Lun

See Also

squareCounts, diClusters, boxPairs

Examples

# Setting up the object.
a <- 10
b <- 20
regions <- GRanges(rep(c("chrA", "chrB"), c(a, b)), IRanges(c(1:a, 1:b), c(1:a, 1:b)))

set.seed(3423)
all.anchor1 <- sample(length(regions), 50, replace=TRUE)
all.anchor2 <- as.integer(runif(50, 1, all.anchor1+1))
y <- InteractionSet(matrix(0, 50, 1), 
    GInteractions(anchor1=all.anchor1, anchor2=all.anchor2, regions=regions, mode="reverse"),
    colData=DataFrame(lib.size=1000), metadata=List(width=1))

# Clustering; note, small tolerances are used in this toy example.
clusterPairs(y, tol=1)
clusterPairs(y, tol=3)
clusterPairs(y, tol=5)
clusterPairs(y, tol=5, upper=5)

# Multiple bin sizes allowed.
a2 <- a/2
b2 <- b/2
rep.regions <- GRanges(rep(c("chrA", "chrB"), c(a2, b2)), 
    IRanges(c(1:a2*2, 1:b2*2), c(1:a2*2, 1:b2*2)))
rep.anchor1 <- sample(length(rep.regions), 10, replace=TRUE)
rep.anchor2 <- as.integer(runif(10, 1, rep.anchor1+1))
y2 <- InteractionSet(matrix(0, 10, 1), 
    GInteractions(anchor1=rep.anchor1, anchor2=rep.anchor2, regions=rep.regions, mode="reverse"), 
    colData=DataFrame(lib.size=1000), metadata=List(width=2))

clusterPairs(y, y2, tol=1)
clusterPairs(y, y2, tol=3)
clusterPairs(y, y2, tol=5)
clusterPairs(y, tol=5, upper=5)

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(diffHic)
Loading required package: GenomicRanges
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: InteractionSet
Loading required package: SummarizedExperiment
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/diffHic/clusterPairs.Rd_%03d_medium.png", width=480, height=480)
> ### Name: clusterPairs
> ### Title: Cluster bin pairs
> ### Aliases: clusterPairs
> ### Keywords: clustering
> 
> ### ** Examples
> 
> # Setting up the object.
> a <- 10
> b <- 20
> regions <- GRanges(rep(c("chrA", "chrB"), c(a, b)), IRanges(c(1:a, 1:b), c(1:a, 1:b)))
> 
> set.seed(3423)
> all.anchor1 <- sample(length(regions), 50, replace=TRUE)
> all.anchor2 <- as.integer(runif(50, 1, all.anchor1+1))
> y <- InteractionSet(matrix(0, 50, 1), 
+     GInteractions(anchor1=all.anchor1, anchor2=all.anchor2, regions=regions, mode="reverse"),
+     colData=DataFrame(lib.size=1000), metadata=List(width=1))
> 
> # Clustering; note, small tolerances are used in this toy example.
> clusterPairs(y, tol=1)
$indices
$indices[[1]]
 [1] 32  8  2 23  7  1  5 31 28 11  1 31 15 25 20 22  2 26  2  5 24  4  9  3 18
[26]  5  2 14 19 26  8  1  1 26  4 17 26 12 27 29 31 21 30 13  6  1 10 16  7 15


$interactions
ReverseStrictGInteractions object with 32 interactions and 0 metadata columns:
       seqnames1   ranges1     seqnames2   ranges2
           <Rle> <IRanges>         <Rle> <IRanges>
   [1]      chrA   [1,  3] ---      chrA    [1, 1]
   [2]      chrA   [3,  5] ---      chrA    [3, 5]
   [3]      chrA   [5,  5] ---      chrA    [1, 1]
   [4]      chrA   [7,  7] ---      chrA    [2, 2]
   [5]      chrA   [8, 10] ---      chrA    [3, 5]
   ...       ...       ... ...       ...       ...
  [28]      chrB  [12, 12] ---      chrB  [ 3,  3]
  [29]      chrB  [14, 14] ---      chrB  [ 6,  6]
  [30]      chrB  [17, 17] ---      chrB  [ 3,  3]
  [31]      chrB  [18, 20] ---      chrB  [ 6,  7]
  [32]      chrB  [20, 20] ---      chrB  [13, 13]
  -------
  regions: 33 ranges and 0 metadata columns
  seqinfo: 2 sequences from an unspecified genome; no seqlengths

> clusterPairs(y, tol=3)
$indices
$indices[[1]]
 [1] 9 3 1 4 2 1 1 7 7 5 1 7 4 7 4 4 1 7 1 1 8 1 3 1 4 1 1 6 4 7 3 1 1 7 1 4 7 3
[39] 7 7 7 4 7 4 1 1 3 4 2 4


$interactions
ReverseStrictGInteractions object with 9 interactions and 0 metadata columns:
      seqnames1   ranges1     seqnames2   ranges2
          <Rle> <IRanges>         <Rle> <IRanges>
  [1]      chrA  [ 1, 10] ---      chrA  [ 1,  5]
  [2]      chrA  [ 9, 10] ---      chrA  [ 9, 10]
  [3]      chrB  [ 2,  6] ---      chrA  [ 5, 10]
  [4]      chrB  [10, 17] ---      chrA  [ 1,  9]
  [5]      chrB  [ 6,  6] ---      chrA  [ 2,  2]
  [6]      chrB  [10, 10] ---      chrA  [10, 10]
  [7]      chrB  [ 6, 20] ---      chrB  [ 3, 10]
  [8]      chrB  [ 2,  2] ---      chrB  [ 1,  1]
  [9]      chrB  [20, 20] ---      chrB  [13, 13]
  -------
  regions: 17 ranges and 0 metadata columns
  seqinfo: 2 sequences from an unspecified genome; no seqlengths

> clusterPairs(y, tol=5)
$indices
$indices[[1]]
 [1] 4 2 1 2 1 1 1 3 3 2 1 3 2 3 2 2 1 3 1 1 3 1 2 1 2 1 1 2 2 3 2 1 1 3 1 2 3 2
[39] 3 3 3 2 3 2 1 1 2 2 1 2


$interactions
ReverseStrictGInteractions object with 4 interactions and 0 metadata columns:
      seqnames1   ranges1     seqnames2   ranges2
          <Rle> <IRanges>         <Rle> <IRanges>
  [1]      chrA  [ 1, 10] ---      chrA  [ 1, 10]
  [2]      chrB  [ 2, 17] ---      chrA  [ 1, 10]
  [3]      chrB  [ 2, 20] ---      chrB  [ 1, 10]
  [4]      chrB  [20, 20] ---      chrB  [13, 13]
  -------
  regions: 6 ranges and 0 metadata columns
  seqinfo: 2 sequences from an unspecified genome; no seqlengths

> clusterPairs(y, tol=5, upper=5)
$indices
$indices[[1]]
 [1] 17  5  1  9  3  1  2 16 13  4  1 16  6 11  9  8  1 12  1  2 10  2  5  1  8
[26]  2  1  7  9 12  4  1  1 12  2  8 12  5 14 14 16  8 15  6  2  1  5  7  3  6


$interactions
ReverseStrictGInteractions object with 17 interactions and 0 metadata columns:
       seqnames1   ranges1     seqnames2   ranges2
           <Rle> <IRanges>         <Rle> <IRanges>
   [1]      chrA   [1,  5] ---      chrA   [1,  5]
   [2]      chrA   [7, 10] ---      chrA   [1,  5]
   [3]      chrA   [9, 10] ---      chrA   [9, 10]
   [4]      chrB   [2,  6] ---      chrA   [2,  5]
   [5]      chrB   [2,  6] ---      chrA   [6, 10]
   ...       ...       ... ...       ...       ...
  [13]      chrB  [12, 12] ---      chrB  [ 3,  3]
  [14]      chrB  [11, 14] ---      chrB  [ 6, 10]
  [15]      chrB  [17, 17] ---      chrB  [ 3,  3]
  [16]      chrB  [18, 20] ---      chrB  [ 6,  7]
  [17]      chrB  [20, 20] ---      chrB  [13, 13]
  -------
  regions: 25 ranges and 0 metadata columns
  seqinfo: 2 sequences from an unspecified genome; no seqlengths

> 
> # Multiple bin sizes allowed.
> a2 <- a/2
> b2 <- b/2
> rep.regions <- GRanges(rep(c("chrA", "chrB"), c(a2, b2)), 
+     IRanges(c(1:a2*2, 1:b2*2), c(1:a2*2, 1:b2*2)))
> rep.anchor1 <- sample(length(rep.regions), 10, replace=TRUE)
> rep.anchor2 <- as.integer(runif(10, 1, rep.anchor1+1))
> y2 <- InteractionSet(matrix(0, 10, 1), 
+     GInteractions(anchor1=rep.anchor1, anchor2=rep.anchor2, regions=rep.regions, mode="reverse"), 
+     colData=DataFrame(lib.size=1000), metadata=List(width=2))
> 
> clusterPairs(y, y2, tol=1)
$indices
$indices[[1]]
 [1] 33  6  1 23  5  1  3 32 29  9  1 32 14 25 21 22  1 26  1  3 24  2  7  1 18
[26]  3  1 13 19 26  6  1  1 26  2 17 26 10 28 30 32 20 31 12  4  1  8 15  5 14

$indices[[2]]
 [1] 16 24 11  1 20  1  1 27 26  1


$interactions
ReverseStrictGInteractions object with 33 interactions and 0 metadata columns:
       seqnames1   ranges1     seqnames2   ranges2
           <Rle> <IRanges>         <Rle> <IRanges>
   [1]      chrA   [1,  5] ---      chrA   [1,  5]
   [2]      chrA   [7,  7] ---      chrA   [2,  2]
   [3]      chrA   [8, 10] ---      chrA   [3,  5]
   [4]      chrA   [9,  9] ---      chrA   [1,  1]
   [5]      chrA   [9, 10] ---      chrA   [9, 10]
   ...       ...       ... ...       ...       ...
  [29]      chrB  [12, 12] ---      chrB  [ 3,  3]
  [30]      chrB  [14, 14] ---      chrB  [ 6,  6]
  [31]      chrB  [17, 17] ---      chrB  [ 3,  3]
  [32]      chrB  [18, 20] ---      chrB  [ 6,  7]
  [33]      chrB  [20, 20] ---      chrB  [13, 13]
  -------
  regions: 36 ranges and 0 metadata columns
  seqinfo: 2 sequences from an unspecified genome; no seqlengths

> clusterPairs(y, y2, tol=3)
$indices
$indices[[1]]
 [1] 7 3 1 3 2 1 1 5 5 4 1 5 3 5 3 3 1 5 1 1 6 1 3 1 3 1 1 3 3 5 3 1 1 5 1 3 5 3
[39] 5 5 5 3 5 3 1 1 3 3 2 3

$indices[[2]]
 [1] 3 6 3 1 3 1 1 5 5 1


$interactions
ReverseStrictGInteractions object with 7 interactions and 0 metadata columns:
      seqnames1   ranges1     seqnames2   ranges2
          <Rle> <IRanges>         <Rle> <IRanges>
  [1]      chrA  [ 1, 10] ---      chrA  [ 1,  5]
  [2]      chrA  [ 9, 10] ---      chrA  [ 9, 10]
  [3]      chrB  [ 2, 17] ---      chrA  [ 1, 10]
  [4]      chrB  [ 6,  6] ---      chrA  [ 2,  2]
  [5]      chrB  [ 6, 20] ---      chrB  [ 3, 10]
  [6]      chrB  [ 2,  2] ---      chrB  [ 1,  2]
  [7]      chrB  [20, 20] ---      chrB  [13, 13]
  -------
  regions: 12 ranges and 0 metadata columns
  seqinfo: 2 sequences from an unspecified genome; no seqlengths

> clusterPairs(y, y2, tol=5)
$indices
$indices[[1]]
 [1] 4 2 1 2 1 1 1 3 3 2 1 3 2 3 2 2 1 3 1 1 3 1 2 1 2 1 1 2 2 3 2 1 1 3 1 2 3 2
[39] 3 3 3 2 3 2 1 1 2 2 1 2

$indices[[2]]
 [1] 2 3 2 1 2 1 1 3 3 1


$interactions
ReverseStrictGInteractions object with 4 interactions and 0 metadata columns:
      seqnames1   ranges1     seqnames2   ranges2
          <Rle> <IRanges>         <Rle> <IRanges>
  [1]      chrA  [ 1, 10] ---      chrA  [ 1, 10]
  [2]      chrB  [ 2, 17] ---      chrA  [ 1, 10]
  [3]      chrB  [ 2, 20] ---      chrB  [ 1, 10]
  [4]      chrB  [20, 20] ---      chrB  [13, 13]
  -------
  regions: 6 ranges and 0 metadata columns
  seqinfo: 2 sequences from an unspecified genome; no seqlengths

> clusterPairs(y, tol=5, upper=5)
$indices
$indices[[1]]
 [1] 17  5  1  9  3  1  2 16 13  4  1 16  6 11  9  8  1 12  1  2 10  2  5  1  8
[26]  2  1  7  9 12  4  1  1 12  2  8 12  5 14 14 16  8 15  6  2  1  5  7  3  6


$interactions
ReverseStrictGInteractions object with 17 interactions and 0 metadata columns:
       seqnames1   ranges1     seqnames2   ranges2
           <Rle> <IRanges>         <Rle> <IRanges>
   [1]      chrA   [1,  5] ---      chrA   [1,  5]
   [2]      chrA   [7, 10] ---      chrA   [1,  5]
   [3]      chrA   [9, 10] ---      chrA   [9, 10]
   [4]      chrB   [2,  6] ---      chrA   [2,  5]
   [5]      chrB   [2,  6] ---      chrA   [6, 10]
   ...       ...       ... ...       ...       ...
  [13]      chrB  [12, 12] ---      chrB  [ 3,  3]
  [14]      chrB  [11, 14] ---      chrB  [ 6, 10]
  [15]      chrB  [17, 17] ---      chrB  [ 3,  3]
  [16]      chrB  [18, 20] ---      chrB  [ 6,  7]
  [17]      chrB  [20, 20] ---      chrB  [13, 13]
  -------
  regions: 25 ranges and 0 metadata columns
  seqinfo: 2 sequences from an unspecified genome; no seqlengths

> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>