A M*M Hi-C matrix,providing pairwise calibrated distances between M genomic bins. Zero off-diagonal values represent discarded/missing information. This matrix could be prepared with the function calibrate.
infD
A numeric value for missing distances (see details).
itermax
An integer value giving the maximal number of iterations for the SMACOF procedure (see details).Set to be 10000 by default.
eps
A numeric value giving the convergence parameter for the SMACOF procedure (see details). Set to be 1e-06 be default.
init
A M*k numeric matrix, giving the initial configuration, for the SMACOF procedure, and set to NULL by default. If NULL, the classical MDS solution is used.
k
The number of dimensions for the output configuration. Set to 3 by default.
verbose
A Boolean indicating whether to print the stress at each iteration.Set to FALSE by default.
infW
A numeric value giving the weight for missing distances. Set to NULL by default. If NULL, this value is set to be 1/infD for infD > 1 and 0.05 otherwise (see details).
Details
A calibrated distance Hi-C matrix H can be used as input for a 3D reconstruction algorithm. It is important to note that if H would give the true 3D Euclidean distances than the multi-dimensional scaling (MDS) solution (Torgerson, 1952) would recover the underlying 3D configuration, up to a rotation. However, due to Hi-C limitation we rely mostly on local calibrated distances. Denote delta(i,j) the calibrated distance between loci i and j, and d(i,j), their Euclidean distance, in the true underlying 3D configuration. Here, a zero delta[i,j] , for different loci i and j, represents a missing information that was discarded in the calibration step. Our goal is then to minimize the following function, usually termed stress in an MDS setting (Kruskal, 1964): SUM(i<j)[w(i,j)(delta(i,j)-d(i,j)(Y))^2] , where w(i,j) are the weights we assign according to the reliability of delta(i,j). Since we mostly rely on local information, we can use here a local stress function (Chen and Buja, 2009), where missing delta(i,j) are replaced with a constant dinf (dinf >> known delta(i,j)) and w(i,j) take the value of 1/dinf for missing distances and 1 otherwise (for dinf equal or smaller than 1, weights of missing distances should be set to a small constant << 1). Since w(i,j) define an irreducible matrix, the stress minimization could be performed through Scaling by Majorizing a Complicated Function (SMACOF) (De Leeuw, 1977; De Leeuw, 1988; De Leeuw and Heiser 1977), a well-established strategy for this task, which guarantees convergence.
Value
A M*k configuration matrix (k=3 by default) reconstructed from the given M*M Hi-C pairwise distances.
Author(s)
Yoli Shavit
References
Main reference:
Y. Shavit, F.K. Hamey, P. Lio', FisHiCal: an R package for iterative FISH-based calibration of Hi-C data, 2014 (submitted).
References for Details section:
Chen, L. and Buja, A. (2009) Local Multidimensional Scaling for Nonlinear Dimension Reduction, Graph Drawing, and Proximity Analysis. Journal of the American Statistical Association 104, 209-219.
De Leeuw, J. (1977). Applications of convex analysis to multidimensional scaling. In Barra, J.R et al (Eds.), Recent developments in statistics, 133-145. Amsterdam, The Netherlands: North-Holland.
De Leeuw, J. (1988). Convergence of the majorization method for multidimensional scaling,. Journal of Classification, 5, 163-180.
De Leeuw, J. and Heiser, W.J. (1977). Convergence of correction-matrix algorithms for multidimensional scaling. In Lingoes, J.C., Roskam, E.E., and Borg, I. (Eds.),Geometric representations of relational data, 735-752. Ann Arbor, MI:Mathesis.
Kruskal, J. B. (1964). Nonmetric multidimensional scaling: A numerical method. Psychometrika, 29, 115-129.
Torgerson, W.S. (1952). Multidimensional scaling: I. Theory and method. Psychometrika, 17, 401-19.
See Also
prepareCalib calibHiC calibrate
Examples
data(calibHiC)
data(match)
data(conf)
predConf = lsmacof(calibHiC, max(match$distances))
# superimpose
partialPS<-function(m1, m2)
{
# translate to origin
tm1<-scale(m1, scale = FALSE)
tm2<-scale(m2, scale = FALSE)
A<-svd(t(tm2)%*%tm1)
v<-A$u
w<-A$v
# update v a det(R) is positive
k = ncol(m1)
d = sign(det(t(w)%*%t(v)))
v[,k] = v[,k]*-1*d
R<- w%*%t(v)
return(list(m1=tm1%*%R,m2=tm2))
}
res = partialPS(predConf, conf)
if (require(rgl))
{
plot3d(res$m2, type = "l", axes = FALSE, box = FALSE, xlab = "", ylab = "", zlab = "")
lines3d(res$m1, col = "red")
}
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(FisHiCal)
Loading required package: igraph
Attaching package: 'igraph'
The following objects are masked from 'package:stats':
decompose, spectrum
The following object is masked from 'package:base':
union
Loading required package: RcppArmadillo
> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/FisHiCal/lsmacof.Rd_%03d_medium.png", width=480, height=480)
> ### Name: lsmacof
> ### Title: 3D reconstruction from Hi-C distances
> ### Aliases: lsmacof
>
> ### ** Examples
>
> data(calibHiC)
> data(match)
> data(conf)
> predConf = lsmacof(calibHiC, max(match$distances))
>
> # superimpose
> partialPS<-function(m1, m2)
+ {
+ # translate to origin
+ tm1<-scale(m1, scale = FALSE)
+ tm2<-scale(m2, scale = FALSE)
+ A<-svd(t(tm2)%*%tm1)
+ v<-A$u
+ w<-A$v
+ # update v a det(R) is positive
+ k = ncol(m1)
+ d = sign(det(t(w)%*%t(v)))
+ v[,k] = v[,k]*-1*d
+ R<- w%*%t(v)
+ return(list(m1=tm1%*%R,m2=tm2))
+ }
>
> res = partialPS(predConf, conf)
> if (require(rgl))
+ {
+ plot3d(res$m2, type = "l", axes = FALSE, box = FALSE, xlab = "", ylab = "", zlab = "")
+ lines3d(res$m1, col = "red")
+ }
Loading required package: rgl
>
>
>
>
>
>
> dev.off()
null device
1
>