Last data update: 2014.03.03

R: Prepare Hi-C calibration
prepareCalibR Documentation

Prepare Hi-C calibration

Description

A function to build a calibration function, by fitting a subset of FISH distances and Hi-C frequencies with a power law model (see details). The number of distances to fit (taking distances by increasing order) or a subset of selected distances should be provided by the user. Users can also choose how to estimate the distance threshold or may explicitly provide one.

Usage

prepareCalib(data, npoints, threshold = NULL, useMax = TRUE, delta = 0.05, buffer = 1.0)

Arguments

data

A data frame with 2 mandatory columns: distances and frequencies, standing for matching FISH distances and Hi-C frequencies, correspondingly. This data structure could be prepared with prepareData

npoints

An integer or an integer vector. If an integer n is given, than the shortest n distances and their matching frequencies will be used. Otherwise, the indices in the integer vector will indicate the subset of distances and frequencies to use from 'data'.

threshold

Optional numeric, set to NULL by default. If provided, will be used as the distance range threshold of the calibration.

useMax

Optional Boolean, set to True by default and ignored if 'threshold' is given. When TRUE, the maximal provided FISH distance will be used for the distance range threshold. Otherwise, the threshold will be estimated by the maximal FISH distance that present a small enough deviation (< delta) from the model.

delta

Optional numeric, set to 0.05 by default and ignored if 'threshold' is given or if 'useMax' is set to TRUE. Defines the acceptable deviation from the model, when the distance range threshold is estimated from the fit (see details).

buffer

Optional numeric, set to 1.0 by default and ignored if 'useMax' is set to FALSE. Defines a constant that is added to the threshold value when 'useMax' is set to TRUE.

Details

We use a power law model to relate a set of FISH distances, D, and a matching set of contact frequencies, C: C ~ βD^α
Taking the log of this equation gives a linear dependency: log(C)~ log(β) + αlog(D)
Here, we consider only a subset of distances for solving the latter equation and estimate alpha and beta with a linear regression. The threshold t, defining the range limit of Hi-C (a distance above which Hi-C frequencies are no longer informative) could be set to the maximal distance in D, or estimated more restrictively from the fit:
t = maxD{ | e^( (log(C)-log(β))/α ) - D |< δ }

Value

A list with the following objects:

calib

a list defining the calibration, with the following objects: f - the calibration function (the power law model), and params - a list of parameters for f (the parameters of the model and the threshold).

fit

the return value of lm, used to solve the linear regression

Author(s)

Yoli Shavit

References

Y. Shavit, F.K. Hamey, P. Lio', FisHiCal: an R package for iterative FISH-based calibration of Hi-C data, 2014 (submitted).

See Also

prepareData
calibrate
conf
hic
match

Examples

 
 data(match) 
 npoints = 10 # number of points to fit
 
 # prepareCalib computes threshold according to the fit 
 # useMax is set to FALSE
 res = prepareCalib(match, npoints, useMax = FALSE)
 calib = res$calib
 calib
 fit = res$fit
 alpha = calib$params[[1]]
 beta = calib$params[[2]]
 threshold = calib$params[[3]]
 
 # plot
 plot(match$frequencies ~ match$distances, xlab = "distances", 
 ylab = "frequencies")
 
 lines((exp(beta)*match$distances^alpha)~match$distances, 
 col = "red")
 
 plot(log(match$frequencies) ~ log(match$distances), 
 xlab = "log(distances)", ylab = "log(frequencies)")
 abline(fit, col = "red")
 
 # plot the estimated threshold 
 abline(h = beta + log(threshold)*alpha, lty = 3)
             

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(FisHiCal)
Loading required package: igraph

Attaching package: 'igraph'

The following objects are masked from 'package:stats':

    decompose, spectrum

The following object is masked from 'package:base':

    union

Loading required package: RcppArmadillo
> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/FisHiCal/prepareCalib.Rd_%03d_medium.png", width=480, height=480)
> ### Name: prepareCalib
> ### Title: Prepare Hi-C calibration
> ### Aliases: prepareCalib
> 
> ### ** Examples
>  
>  data(match) 
>  npoints = 10 # number of points to fit
>  
>  # prepareCalib computes threshold according to the fit 
>  # useMax is set to FALSE
>  res = prepareCalib(match, npoints, useMax = FALSE)
>  calib = res$calib
>  calib
$f
function (m, params) 
{
    a = params[[1]]
    b = params[[2]]
    threshold = params[[3]]
    m = exp((log(m) - b)/a)
    m[m == Inf] = 0
    m[m > threshold] = 0
    return(m)
}
<environment: 0x2325988>

$params
$params[[1]]
[1] -2

$params[[2]]
[1] 10

$params[[3]]
[1] 22.79646


>  fit = res$fit
>  alpha = calib$params[[1]]
>  beta = calib$params[[2]]
>  threshold = calib$params[[3]]
>  
>  # plot
>  plot(match$frequencies ~ match$distances, xlab = "distances", 
+  ylab = "frequencies")
>  
>  lines((exp(beta)*match$distances^alpha)~match$distances, 
+  col = "red")
>  
>  plot(log(match$frequencies) ~ log(match$distances), 
+  xlab = "log(distances)", ylab = "log(frequencies)")
>  abline(fit, col = "red")
>  
>  # plot the estimated threshold 
>  abline(h = beta + log(threshold)*alpha, lty = 3)
>              
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>