R Graphical Manual

Browse All

Last data update: 2014.03.03

R: Modified profile likelihood computation

logMPL

R Documentation

Modified profile likelihood computation

Description

This function evaluates the Modified Profile Likelihood (MPL) for a subset of the model parameter. The result is optionally returned with a minus sign, so the function can be used directly as input to a general-purpose optimizer.

Usage

logMPL(psival, data, mle, floglik, fscore=NULL, indpsi, datagen, R=500, seed=NULL, 
       minus=FALSE, trace=FALSE)

Arguments

`psival`	A numerical vector containing the value of the parameter of interest.
`data`	The data as a list. All the elements required to compute the likelihood function at a given parameter value should be included in this list. The required format of such list will be determined by the user-provided function `floglik`.
`mle`	A numerical vector, containing the maximum likelihood estimate of the entire model parameter.
`floglik`	A function which returns the log likelihood function at a given parameter value. In particular, for a certain parameter value contained in a numerical vector `theta`, a call `floglik(theta, data)` should return a scalar numerical value, the log likelihood function at `theta`. Note that the parameter of interest should be a subset of the coordinates of `theta`.
`fscore`	An optional function which returns the score function at a given parameter value. It must return a numerical vector of the same length of `mle`. For a certain parameter value contained in a numerical vector `theta`, a call `fscore(theta, data)` should return the gradient of the log likelihood function at `theta`. Default is `NULL`, implying that numerical differentiation will be employed.
`indpsi`	A vector of integers in the range `1:length(theta)` containing the indexes of the parameter of interest, so that the parameter of interest will be given by `theta[indpsi]`.
`datagen`	A function which simulates a data set. A call `datagen(theta, data)` will generate a copy of the `data` list, with the values of the response variable replaced by a set of values simulated from the parametric statistical model assumed for the response variable.
`R`	The number of Monte Carlo replicates used for computing the modified profile likelihood. A positive integer, default is `500`.
`seed`	Optional positive integer, the random seed for the Monte Carlo computation. Default is `NULL`.
`minus`	Logical. Should the modified profile likelihood be multiplied by -1? This may be useful for usage with optimizers. Default is `FALSE`.
`trace`	Logical. When set to `TRUE` will cause the printing of the MPL value, which can be useful to monitor optimization. Default is `FALSE`.

Details

The function implements the Modified Profile Likelihood employing the approximation to sample space derivatives proposed in Skovgaard (1996). The function is designed to be used with external functions, such as optimizers and evaluators over a grid of points.

Value

A scalar value, minus the modified profile likelihood at psival.

References

Severini, T.A. (2000). Likelihood Methods in Statistics. Oxford University Press.

Skovgaard, I.M. (1996) An explicit large-deviation approximation to one-parameter tests. Bernoulli, 2, 145–165.

Examples

# Approximating the conditional likelihood for logistic regression
# Let us define the various functions	
# Log likelihood for logistic regression
loglik.logit<- function(theta, data) 
{
  y <- data$y
  den <- data$den
  X <- data$X
  eta <- X %*% theta
  p <- plogis(eta)
  l <- sum(y * log(p) + (den - y) * log(1-p))
  return(l)
}
# Score function
grad.logit<- function(theta, data) 
{
  y <- data$y
  den <- data$den
  X <- data$X
  eta <- X %*% theta
  p <- plogis(eta)
  out <- t(y - p * den) %*% X
  return(drop(out))
}
# Data generator
gendat.logit<- function(theta, data)
{
  X <- data$X
  eta <- X %*% theta
  p <- plogis(eta)
  out <- data
  out$y <- rbinom(length(data$y), size = data$den, prob = p)
  return(out) 
}		
# Famous crying babies data
data(babies)	
mod.glm <- glm(formula = cbind(r1, r2) ~ day + lull - 1, family = binomial, 
               data = babies)
data.obj <- list(y = babies$r1, den = babies$r1 + babies$r2, 
                 X = model.matrix(mod.glm))	
# Numerical optimization of profile and modified profile log likelihoods
max.prof <- nlminb(0, logPL, data=data.obj, thetainit=coef(mod.glm), 
                  floglik=loglik.logit, fscore=grad.logit, indpsi=19, minus=TRUE, trace=FALSE)
max.mpl <- nlminb(0, logMPL, data=data.obj, mle=coef(mod.glm), 
                  floglik=loglik.logit, fscore=grad.logit, datagen=gendat.logit,
                  indpsi=19, R=50, seed=2020, minus=TRUE, trace=FALSE)
c(max.prof$par, max.mpl$par)                 
# We can plot the profile likelihood and the modified profile likelihood
# R=50 suffices for the modified profile likelihood as the model is a full exp. family
psi.vals <- seq(-0.3, 3.7, l=20)
obj.prof <- sapply(psi.vals, logPL, data=data.obj, thetainit=coef(mod.glm), 
                floglik=loglik.logit, fscore=grad.logit, indpsi=19, trace=FALSE)
obj.mpl <- sapply(psi.vals, logMPL, data=data.obj, mle=coef(mod.glm), 
                floglik=loglik.logit, fscore=grad.logit, datagen=gendat.logit,
                indpsi=19, trace=FALSE, R=50, seed=2020)
par(pch="s")
plot(psi.vals, obj.prof - max(obj.prof), type="l", xlab=expression(psi), 
     ylab="log likelihood", lwd=2, las=1)
lines(psi.vals, obj.mpl - max(obj.mpl), col="red", lwd=2)
legend("topright", col=c(1, 2), lty=1, lwd=2, legend=c("Profile","MPL"), bty="n")