R: Expectation conditional maximization of negative binomial HMM...
nbh_em
R Documentation
Expectation conditional maximization of negative binomial HMM parameters using forward-backward algorithm
Description
Given an input read count vector of integers, the function optimizes the parameters for the negative binomial HMM of K hidden states using expectation conditional maximization with forward-backward algorithm to acheive the exact inference.
A vector of integers, conceptaully representative of the read counts within bins of chromosome.
TRANS
Transition probability matrix, a squared matrix of probabilities (0 ≤ p ≤ 1) with row and column length equal to that of alpha and beta and row sum and column sum both equal to 1 (within some numerical deviation of 1e-6).
alpha
Shape parameter of the NB as a vector of positive values with length equal to that of beta and the row/column of TRANS.
beta
Inverse scale parameter of the NB as a vector of positive values with length equal to that of beta and the row/column of TRANS.
NBH_NIT_MAX
Maximum number of EM iterations (Default: 250) for the negative binomial hidden Markov model (NBH).
NBH_TOL
Threshold as fraction of increase in likelihood (given the current NBH parameters) comparing with the likelihood from the last iteration. EM for the NBH stops when the improvement is below the threshold (Default: 0.001).
MAXALPHA
The maximum value of alpha in case the update goes beyond the numerical upper limit of the system. Once alpha becomes larger than MAXALPHA, the EM itaration is prematurely terminated to prevent malfunction.
MAXBETA
The maximum value of beta in case the update goes beyond the numerical upper limit of the system. Once beta becomes larger than MAXBETA, the EM itaration is prematurely terminated to prevent malfunction.
Details
Given a K-state HMM with NB emission (NBH), the goal is to maximize the likelihood function with respect to the parameters comprising of α_k and β_k for the K NB components and the transition probabilities A_jk between any state j and k, which are the priors p(z=k). Because there is no analytical solution for the maximum likelihood (ML) estimators of the above quantities, a modified EM procedures called Expectation Conditional Maximization is employed (Meng and Rubin, 1994).
In E-step, the posterior probability is evaluated by forward-backward algorithm using NB density functions with initialized alpha, beta, and TRANS. In the CM step, A_jk is evaluated first followed by Newton updates of α_k and β_k. EM iteration terminates when the percetnage of increase of log likelihood drop below NBH_TOL, which is deterministic since EM is guaranteed to converge. For more details, please see the manuscript of RIPSeeker.
Value
A list containing:
alpha
optimized alpha_k for NB at state K
beta
optimized beta_k for NB at state K
TRANS
optimized transition probability matrix
logl
Log likelihood in each EM iteration.
postprob
Posterior probabilities for each observed data point at the last EM iteration.
dens
the negative binomial probabilities computed at the last EM iteration
Author(s)
Yue Li
References
Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech recognition (Vol. 77, pp. 257-286). Presented at the Proceedings of the IEEE. doi:10.1109/5.18626
Christopher Bishop. Pattern recognition and machine learning. Number 605-631 in Information Science and Statisitcs. Springer Science, 2006.
X. L. Meng, D. B. Rubin, Maximum likelihood estimation via the
ECM algorithm: A general framework, Biometrika, 80(2):267-278 (1993).
J. A. Fessler, A. O. Hero, Space-alternating generalized expectation-maximization algorithm, IEEE Tr. on Signal Processing, 42(10):2664 -2677 (1994).