This function performs reproducibility (or meta) analysis using GMCMs.
It features various optimization routines to identify the maximum likelihood
estimate of the special Gaussian mixture copula model proposed by
Li et. al. (2011).
An n by d matrix of test statistics. Rows correspond
to features and columns to experiments. Large values are assumed to be
indicative of reproducible genes.
init.par
A 4-dimensional vector of the initial parameters where,
init.par[1] is the mixture proportion of spurious signals,
init.par[2] is the mean, init.par[3] is the standard
deviation, init.par[4] is the correlation.
method
A character vector of length 1. The optimization
method used. Should be either "NM", "SANN", "L-BFGS",
"L-BFGS-B", or "PEM" which are abbreviations of Nelder-Mead,
Simulated Annealing, limited-memory quasi-Newton method, limited-memory
quasi-Newton method with box constraints, and the pseudo EM algorithm,
respectively. Default is "NM". See optim for further
details.
max.ite
The maximum number of iterations. If the method is
"SANN" this is the number of iterations as there is no other
stopping criterion. (See optim)
verbose
Logical. If TRUE, the log-likelihood values are
printed.
positive.rho
logical. If TRUE, the correlation parameter
is restricted to be positive.
trace.theta
logical. Extra convergence information is appended
as a list to the output returned if TRUE. The exact behavior is
dependent on the value of method. If method equals
"PEM", the argument is passed to trace.theta in
PseudoEMAlgorithm. Otherwise it is passed to the control
argument trace in optim.
...
Arguments passed to the control-list in
optim or PseudoEMAlgorithm if method is
"PEM".
Details
The "L-BFGS-B" method does not perform a transformation of the
parameters.
Value
A vector par of length 4 of the fitted parameters where
par[1] is the probability of being from the first (or null)
component, par[2] is the mean, par[3] is the standard
deviation, and par[4] is the correlation.
Note
Simulated annealing is strongly dependent on the initial values and
the cooling scheme.
See optim for further details.
Author(s)
Anders Ellern Bilgrau <anders.ellern.bilgrau@gmail.com>
References
Li, Q., Brown, J. B. J. B., Huang, H., & Bickel, P. J. (2011).
Measuring reproducibility of high-throughput experiments. The Annals of
Applied Statistics, 5(3), 1752-1779. doi:10.1214/11-AOAS466
See Also
optim
Examples
set.seed(1)
# True parameters
true.par <- c(0.9, 2, 0.7, 0.6)
# Simulation of data from the GMCM model
data <- SimulateGMCMData(n = 1000, par = true.par)
uhat <- Uhat(data$u) # Ranked observed data
init.par <- c(0.5, 1, 0.5, 0.9) # Initial parameters
# Optimization with Nelder-Mead
nm.par <- fit.meta.GMCM(uhat, init.par = init.par, method = "NM")
## Not run:
# Comparison with other optimization methods
# Optimization with simulated annealing
sann.par <- fit.meta.GMCM(uhat, init.par = init.par, method = "SANN",
max.ite = 3000, temp = 1)
# Optimization with the Pseudo EM algorithm
pem.par <- fit.meta.GMCM(uhat, init.par = init.par, method = "PEM")
# The estimates agree nicely
rbind("True" = true.par, "Start" = init.par,
"NM" = nm.par, "SANN" = sann.par, "PEM" = pem.par)
## End(Not run)
# Get estimated cluster
Khat <- get.IDR(x = uhat, par = nm.par)$Khat
plot(uhat, col = Khat, main = "Clustering\nIDR < 0.05")