Last data update: 2014.03.03

R: Modified profile likelihood computation
logMPLR Documentation

Modified profile likelihood computation

Description

This function evaluates the Modified Profile Likelihood (MPL) for a subset of the model parameter. The result is optionally returned with a minus sign, so the function can be used directly as input to a general-purpose optimizer.

Usage

logMPL(psival, data, mle, floglik, fscore=NULL, indpsi, datagen, R=500, seed=NULL, 
       minus=FALSE, trace=FALSE)

Arguments

psival

A numerical vector containing the value of the parameter of interest.

data

The data as a list. All the elements required to compute the likelihood function at a given parameter value should be included in this list. The required format of such list will be determined by the user-provided function floglik.

mle

A numerical vector, containing the maximum likelihood estimate of the entire model parameter.

floglik

A function which returns the log likelihood function at a given parameter value. In particular, for a certain parameter value contained in a numerical vector theta, a call floglik(theta, data) should return a scalar numerical value, the log likelihood function at theta. Note that the parameter of interest should be a subset of the coordinates of theta.

fscore

An optional function which returns the score function at a given parameter value. It must return a numerical vector of the same length of mle. For a certain parameter value contained in a numerical vector theta, a call fscore(theta, data) should return the gradient of the log likelihood function at theta. Default is NULL, implying that numerical differentiation will be employed.

indpsi

A vector of integers in the range 1:length(theta) containing the indexes of the parameter of interest, so that the parameter of interest will be given by theta[indpsi].

datagen

A function which simulates a data set. A call datagen(theta, data) will generate a copy of the data list, with the values of the response variable replaced by a set of values simulated from the parametric statistical model assumed for the response variable.

R

The number of Monte Carlo replicates used for computing the modified profile likelihood. A positive integer, default is 500.

seed

Optional positive integer, the random seed for the Monte Carlo computation. Default is NULL.

minus

Logical. Should the modified profile likelihood be multiplied by -1? This may be useful for usage with optimizers. Default is FALSE.

trace

Logical. When set to TRUE will cause the printing of the MPL value, which can be useful to monitor optimization. Default is FALSE.

Details

The function implements the Modified Profile Likelihood employing the approximation to sample space derivatives proposed in Skovgaard (1996). The function is designed to be used with external functions, such as optimizers and evaluators over a grid of points.

Value

A scalar value, minus the modified profile likelihood at psival.

References

Severini, T.A. (2000). Likelihood Methods in Statistics. Oxford University Press.

Skovgaard, I.M. (1996) An explicit large-deviation approximation to one-parameter tests. Bernoulli, 2, 145–165.

Examples

# Approximating the conditional likelihood for logistic regression
# Let us define the various functions	
# Log likelihood for logistic regression
loglik.logit<- function(theta, data) 
{
  y <- data$y
  den <- data$den
  X <- data$X
  eta <- X %*% theta
  p <- plogis(eta)
  l <- sum(y * log(p) + (den - y) * log(1-p))
  return(l)
}
# Score function
grad.logit<- function(theta, data) 
{
  y <- data$y
  den <- data$den
  X <- data$X
  eta <- X %*% theta
  p <- plogis(eta)
  out <- t(y - p * den) %*% X
  return(drop(out))
}
# Data generator
gendat.logit<- function(theta, data)
{
  X <- data$X
  eta <- X %*% theta
  p <- plogis(eta)
  out <- data
  out$y <- rbinom(length(data$y), size = data$den, prob = p)
  return(out) 
}		
# Famous crying babies data
data(babies)	
mod.glm <- glm(formula = cbind(r1, r2) ~ day + lull - 1, family = binomial, 
               data = babies)
data.obj <- list(y = babies$r1, den = babies$r1 + babies$r2, 
                 X = model.matrix(mod.glm))	
# Numerical optimization of profile and modified profile log likelihoods
max.prof <- nlminb(0, logPL, data=data.obj, thetainit=coef(mod.glm), 
                  floglik=loglik.logit, fscore=grad.logit, indpsi=19, minus=TRUE, trace=FALSE)
max.mpl <- nlminb(0, logMPL, data=data.obj, mle=coef(mod.glm), 
                  floglik=loglik.logit, fscore=grad.logit, datagen=gendat.logit,
                  indpsi=19, R=50, seed=2020, minus=TRUE, trace=FALSE)
c(max.prof$par, max.mpl$par)                 
# We can plot the profile likelihood and the modified profile likelihood
# R=50 suffices for the modified profile likelihood as the model is a full exp. family
psi.vals <- seq(-0.3, 3.7, l=20)
obj.prof <- sapply(psi.vals, logPL, data=data.obj, thetainit=coef(mod.glm), 
                floglik=loglik.logit, fscore=grad.logit, indpsi=19, trace=FALSE)
obj.mpl <- sapply(psi.vals, logMPL, data=data.obj, mle=coef(mod.glm), 
                floglik=loglik.logit, fscore=grad.logit, datagen=gendat.logit,
                indpsi=19, trace=FALSE, R=50, seed=2020)
par(pch="s")
plot(psi.vals, obj.prof - max(obj.prof), type="l", xlab=expression(psi), 
     ylab="log likelihood", lwd=2, las=1)
lines(psi.vals, obj.mpl - max(obj.mpl), col="red", lwd=2)
legend("topright", col=c(1, 2), lty=1, lwd=2, legend=c("Profile","MPL"), bty="n")
  

Results