R Graphical Manual

Browse All

Last data update: 2014.03.03

R: Count connecting read pairs

connectCounts

R Documentation

Count connecting read pairs

Description

Count the number of read pairs connecting pairs of user-specified regions

Usage

 
connectCounts(files, param, regions, filter=1L, type="any", second.regions=NULL)

Arguments

`files`	a character vector containing the paths to the count file for each library
`param`	a `pairParam` object containing read extraction parameters
`regions`	a GRanges object specifying the regions between which read pairs should be counted
`filter`	an integer scalar specifying the minimum count for each interaction
`type`	a character string specifying how restriction fragments should be assigned to regions
`second.regions`	a GRanges object containing the second regions of interest, or an integer scalar specifying the bin size

Details

Interactions of interest are defined as those formed by pairs of elements in regions. The number of read pairs connecting each pair of elements can then be counted in each library. This can be useful for quantifying/summarizing interactions between genomic features, e.g., promoters or gene bodies.

For a pair of intervals in regions, the interaction count is defined as the number of read pairs with one read in each interval (after rounding each interval to a fragment; see below). To avoid reporting weak interactions, pairs can be filtered to retain only those with a count sum across all libraries above filter. In each pair, the anchor interval is defined as that with the higher start position. Note that the end position may not be higher, due to the possibility of nested intervals in regions.

The value of type feeds into findOverlaps and controls the manner in which restriction fragments are assigned to each region. By default, a restriction fragment is assigned to one or more regions if said fragment overlaps with any part of those regions. This means that the boundaries of each region are expanded outwards to obtain the effective coordinates in the output region. In contrast, setting type="within" would contract each region inwards.

The modified regions can be extracted from the regions slot in the output InteractionSet object. These will be reordered according to the new start positions. The ordering permutation can be recovered from the original metadata field of the GRanges object. Similarly, the number of restriction fragments assigned to each interval is stored in the nfrags metadata field.

Counting will consider the values of restrict, discard and cap in param. See pairParam for more details.

Value

An InteractionSet is returned, specifying the number of read pairs in each library that are mapped between pairs of regions, or between regions and second.regions. Interacting regions are returned as a ReverseStrictGInteractions object containing the concatenated regions and second.regions.

Matching to a second set of regions

The second.regions argument allows specification of a second set of regions, where interactions are only considered between one entry in regions and one entry in second.regions. This differs from supplying all regions to regions, which would consider all pairwise interactions between regions regardless of whether they belong in the first or second set. If an integer scalar is supplied as second.regions, this value is used as a width to partition the genome into bins. These bins are then used as the set of second regions.

Specification of second.regions is useful for efficiently identifying interactions between two sets of regions. For example, the first set can be set to several “viewpoint” regions of interest. This is similar to the bait region in 4C-seq, or the captured regions in Capture Hi-C. Interactions between these viewpoints and the rest of the genome can then be examined by setting second.regions to some appropriate bin size.

The output InteractionSet will merge all regions and second.regions into a single GRanges object. However, those in the second set can be distinguished with the is.second metadata field. Each original index will also point towards the corresponding entry in the original second.regions when is.second=TRUE. Similarly, if is.second=FALSE, the index will point towards the corresponding entry in the original regions.

Note that this function does not guarantee that the second set of regions will be treated as the second anchor region (or the first) for each interaction. Those definitions are dependent on the sorting order of the coordinates for all regions. Users should use the is.second field to identify the region from the second set in each interaction.

Author(s)

Aaron Lun

Examples

hic.file <- system.file("exdata", "hic_sort.bam", package="diffHic")
cuts <- readRDS(system.file("exdata", "cuts.rds", package="diffHic"))
param <- pairParam(cuts)

# Setting up the parameters
fout <- "output"
invisible(preparePairs(hic.file, param, fout))
regions <- suppressWarnings(c(
    GRanges("chrA", IRanges(c(1, 100, 150), c(20, 140, 160))), 
    GRanges("chrB", IRanges(50, 100))))

# Collating to count combinations.
con <- connectCounts(fout, param, regions=regions, filter=1L)
head(assay(con))
con <- connectCounts(fout, param, regions=regions, filter=1L, type="within")
head(assay(con))

# Still works with restriction and other parameters.
con <- connectCounts(fout, param=reform(param, restrict="chrA"), 
    regions=regions, filter=1L)
head(assay(con))
con <- connectCounts(fout, param=reform(param, discard=GRanges("chrA", IRanges(1, 50))),
    regions=regions, filter=1L)
head(assay(con))
con <- connectCounts(fout, param=reform(param, cap=1), regions=regions, filter=1L)
head(assay(con))

# Specifying a second region.
regions2 <- suppressWarnings(c(
    GRanges("chrA", IRanges(c(50, 100), c(100, 200))), 
    GRanges("chrB", IRanges(1, 50))))

con <- connectCounts(fout, param, regions=regions, filter=1L, second.region=regions2)
head(anchors(con, type="first"))
head(anchors(con, type="second"))
con <- connectCounts(fout, param, regions=regions, filter=1L, second.region=50)
head(anchors(con, type="first"))
head(anchors(con, type="second"))

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(diffHic)
Loading required package: GenomicRanges
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: InteractionSet
Loading required package: SummarizedExperiment
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/diffHic/connectCounts.Rd_%03d_medium.png", width=480, height=480)
> ### Name: connectCounts
> ### Title: Count connecting read pairs
> ### Aliases: connectCounts
> ### Keywords: counting
> 
> ### ** Examples
> 
> hic.file <- system.file("exdata", "hic_sort.bam", package="diffHic")
> cuts <- readRDS(system.file("exdata", "cuts.rds", package="diffHic"))
> param <- pairParam(cuts)
> 
> # Setting up the parameters
> fout <- "output"
> invisible(preparePairs(hic.file, param, fout))
> regions <- suppressWarnings(c(
+     GRanges("chrA", IRanges(c(1, 100, 150), c(20, 140, 160))), 
+     GRanges("chrB", IRanges(50, 100))))
> 
> # Collating to count combinations.
> con <- connectCounts(fout, param, regions=regions, filter=1L)
> head(assay(con))
     [,1]
[1,]    1
[2,]    1
[3,]    1
[4,]    2
[5,]    2
> con <- connectCounts(fout, param, regions=regions, filter=1L, type="within")
> head(assay(con))
     [,1]
[1,]    1
> 
> # Still works with restriction and other parameters.
> con <- connectCounts(fout, param=reform(param, restrict="chrA"), 
+     regions=regions, filter=1L)
> head(assay(con))
     [,1]
[1,]    1
[2,]    1
> con <- connectCounts(fout, param=reform(param, discard=GRanges("chrA", IRanges(1, 50))),
+     regions=regions, filter=1L)
> head(assay(con))
     [,1]
[1,]    1
[2,]    2
[3,]    2
> con <- connectCounts(fout, param=reform(param, cap=1), regions=regions, filter=1L)
> head(assay(con))
     [,1]
[1,]    1
[2,]    1
[3,]    1
[4,]    1
[5,]    2
> 
> # Specifying a second region.
> regions2 <- suppressWarnings(c(
+     GRanges("chrA", IRanges(c(50, 100), c(100, 200))), 
+     GRanges("chrB", IRanges(1, 50))))
> 
> con <- connectCounts(fout, param, regions=regions, filter=1L, second.region=regions2)
> head(anchors(con, type="first"))
GRanges object with 6 ranges and 3 metadata columns:
      seqnames     ranges strand |    nfrags is.second  original
         <Rle>  <IRanges>  <Rle> | <integer> <logical> <integer>
  [1]     chrA [ 49, 140]      * |         2         1         1
  [2]     chrA [ 95, 188]      * |         2         1         2
  [3]     chrA [ 95, 188]      * |         2         1         2
  [4]     chrA [141, 188]      * |         1         0         3
  [5]     chrA [141, 188]      * |         1         0         3
  [6]     chrB [  1,  69]      * |         2         1         3
  -------
  seqinfo: 2 sequences from an unspecified genome; no seqlengths
> head(anchors(con, type="second"))
GRanges object with 6 ranges and 3 metadata columns:
      seqnames    ranges strand |    nfrags is.second  original
         <Rle> <IRanges>  <Rle> | <integer> <logical> <integer>
  [1]     chrA [ 1,  48]      * |         1         0         1
  [2]     chrA [ 1,  48]      * |         1         0         1
  [3]     chrA [95, 140]      * |         1         0         2
  [4]     chrA [49, 140]      * |         2         1         1
  [5]     chrA [95, 188]      * |         2         1         2
  [6]     chrA [ 1,  48]      * |         1         0         1
  -------
  seqinfo: 2 sequences from an unspecified genome; no seqlengths
> con <- connectCounts(fout, param, regions=regions, filter=1L, second.region=50)
> head(anchors(con, type="first"))
GRanges object with 6 ranges and 3 metadata columns:
      seqnames     ranges strand |    nfrags is.second  original
         <Rle>  <IRanges>  <Rle> | <integer> <logical> <integer>
  [1]     chrA [ 49,  94]      * |         1         1         2
  [2]     chrA [141, 188]      * |         1         0         3
  [3]     chrA [141, 188]      * |         1         0         3
  [4]     chrA [141, 188]      * |         1         0         3
  [5]     chrA [141, 188]      * |         1         1         4
  [6]     chrA [141, 188]      * |         1         1         4
  -------
  seqinfo: 2 sequences from an unspecified genome
> head(anchors(con, type="second"))
GRanges object with 6 ranges and 3 metadata columns:
      seqnames    ranges strand |    nfrags is.second  original
         <Rle> <IRanges>  <Rle> | <integer> <logical> <integer>
  [1]     chrA [ 1,  48]      * |         1         0         1
  [2]     chrA [ 1,  48]      * |         1         1         1
  [3]     chrA [49,  94]      * |         1         1         2
  [4]     chrA [95, 140]      * |         1         1         3
  [5]     chrA [ 1,  48]      * |         1         0         1
  [6]     chrA [95, 140]      * |         1         0         2
  -------
  seqinfo: 2 sequences from an unspecified genome
> 
> ## Don't show: 
> unlink(fout)
> ## End(Don't show)
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>