Evaluates an approximation to minus the log likelihood for an
aster model with random effects. Uses Laplace approximation to
integrate out the random effects analytically.
The “quasi” in the title is a misnomer in the context of aster
models but the acronym PQL for this procedure is well-established in
the generalized linear mixed models literature.
vector of square roots of variance components,
one component for each group of random effects. Negative values are
allowed; the vector of variance components is sigma^2.
parm
starting value for inner optimization. Ignored if
cache$parm exists, in which case the latter is used.
For pickle and pickle1, length
is number of effects (fixed and random).
For pickle2, length
is number of random effects. For all, random effects are rescaled,
divided by the corresponding component of sigma if that is
nonzero and equal to zero otherwise.
alphasigma
the concatenation of the vector of fixed effects
and the vector of square roots of variance components.
alphaceesigma
the concatenation of the vector of fixed effects,
the vector of rescaled random effects,
and the vector of square roots of variance components.
fixed
the model matrix for fixed effects. The number of rows
is nrow(obj$data). The number of columns is the number of fixed
effects.
random
the model matrix or matrices for random effects.
The number of rows is nrow(obj$data). The number of columns
is the number of random effects in a group. Either a matrix or a list
each element of which is a matrix.
obj
aster model object, the result of a call to aster.
y
response vector. May be omitted, in which case obj$x
is used. If supplied, must be a matrix of the same dimensions as
obj$x.
origin
origin of aster model. May be omitted, in which case
default origin (see aster) is used. If supplied, must be
a matrix of the same dimensions obj$x.
cache
If not missing, an environment in which to cache the value
of parm found during previous evaluations. If supplied parm
is taken from cache.
zwz
A possible value of t(Z) W Z, where Z is the
model matrix for all random effects and W is the variance matrix of
the response.
deriv
Number of derivatives wanted. For pickle1
or pickle2, either zero or one. For pickle3,
zero, one or two.
...
additional arguments passed to trust, which
is used to maximize the penalized log likelihood.
Details
Define
p(alpha, c, sigma) = m(a + M alpha + Z A c) + t(c) c / 2 + log det[A t(Z) What Z A + I] / 2
where m is minus the log likelihood function of a saturated aster model,
a is a known vector (the offset vector in the terminology
of glm but the origin in the terminology
of aster), M is a known matrix, the model matrix for
fixed effects (the argument fixed of these functions),
Z is a known matrix, the model matrix for random effects
(either the argument random of these functions if it is a matrix or
Reduce(cbind, random) if random is a list of matrices),
A is a diagonal matrix whose diagonal is the vector
rep(sigma, times = nrand)
where nrand is sapply(random, ncol)
when random is a list of
matrices and ncol(random) when random is a matrix,
What is any symmetric positive semidefinite matrix
(more on this below),
and I is the identity matrix.
Note that A is a function of sigma although the
notation does not explicitly indicate this.
Let cstar denote the minimizer of
p(α, c, σ) considered as a function of c
for fixed alpha and sigma,
and let alphatwiddle and ctwiddle
denote the (joint) minimizers of p(α, c, σ) considered as
a function of alpha and c for fixed sigma.
Note that cstar is a function of alpha
and sigma although the notation does not explicitly
indicate this.
Note that alphatwiddle and ctwiddle
are functions of sigma (only) although the notation
does not explicitly indicate this.
Now define
q(alpha, sigma) = p(alpha, cstar, sigma)
and
r(sigma) = p(alphatwiddle, cstar, sigma)
Then pickle1 evaluates r(sigma),
pickle2 evaluates q(alpha, sigma), and
pickle3 evaluates p(alpha, c, sigma),
where t(Z) What Z in the formulas above is
specified by the argument zwz of these functions.
All of these functions supply derivative (gradient) vectors if
deriv = 1 is specified, and pickle3 supplies the
second derivative (Hessian) matrix if deriv = 2 is specified.
Let W denote the second derivative function of m, that is,
W(phi) is the second derivative matrix of the function
m evaluated at the point phi. The idea is that
What should be approximately the value of
W(a + M alphahat + Z Ahat chat),
where alphahat, chat,
and sigmahat are the (joint) minimizers of p(alpha, c, sigma)
and Ahat = A(sigmahat).
In aid of this, the function makezwz
evaluates
t(Z) W(a + M alpha + Z A c) Z
for any alpha, c, and sigma.
pickle evaluates the function
s(sigma) = m(a + M alphatwiddle + Z A ctwiddle) + t(ctwiddle) ctwiddle / 2 + log det[A t(Z) W(a + M alphatwiddle + Z A ctwiddle) Z A + I]
no derivatives can be computed because no derivatives of the function
W are computed for aster models.
The general idea is the one uses pickle with a no-derivative
optimizer, such as the "Nelder-Mead" method of the optim
function to get a crude estimate of sigmahat.
Then one uses trust with objective
function penmlogl to estimate the corresponding
alphahat and chat (example below).
Then one use makezwz to produce the corresponding zwz
(example below).
These estimates can be improved using trust with objective
function pickle3 using this zwz (example below), and this
step may be iterated until convergence.
Finally, optim is used with objective function pickle2
to estimate the Hessian matrix of q(alpha, sigma),
which is approximate observed information because
q(alpha, sigma) is approximate minus log likelihood.
Value
For pickle, a scalar, minus the
(PQL approximation of) the log likelihood.
For pickle1 and pickle2, a list having components value
and gradient (present only when deriv = 1).
For pickle3, a list having components value,
gradient (present only when deriv >= 1),
and hessian (present only when deriv = 2).
Note
Not intended for use by naive users. Use reaster,
which calls them.