Last data update: 2014.03.03

R: Fit a centered autologistic model using maximum...
autologisticR Documentation

Fit a centered autologistic model using maximum pseudolikelihood estimation or MCMC for Bayesian inference.

Description

Fit a centered autologistic model using maximum pseudolikelihood estimation or MCMC for Bayesian inference.

Usage

autologistic(formula, data, A, method = c("PL", "Bayes"), model = TRUE,
  x = FALSE, y = FALSE, verbose = FALSE, control = list())

Arguments

formula

an object of class formula: a symbolic description of the model to be fitted.

data

an optional data frame, list, or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which autologistic is called.

A

the adjacency matrix for the underlying graph.

method

the method to use for inference. “PL” (the default) enables maximum pseudolikelihood estimation, and “Bayes” enables Bayesian inference.

control

a list of the following control parameters.

confint

(PL) the method for computing confidence intervals. The possible values are “sandwich” (the default), “bootstrap”, and “none”.

sigma

(Bayes) the common standard deviation for β's prior distribution. Defaults to 1,000.

eta.max

(Bayes) the right endpoint for η's prior distribution. Defaults to 2.

bootit

(PL) the size of the bootstrap sample. This applies when confint is “sandwich” or “bootstrap”, since samples from the fitted model are needed in both cases. Defaults to 1,000.

trainit

(Bayes) the length of the training run. Defaults to 100,000.

minit

(Bayes) the minimum number of posterior samples. Defaults to 100,000.

maxit

(Bayes) the maximum number of posterior samples. Defaults to 1,000,000.

tol

(Bayes) the tolerance. Defaults to 0.01.

parallel

(PL) a logical value indicating whether to parallelize the bootstrap. This defaults to TRUE if the snow package is installed.

type

(PL) the cluster type, one of “SOCK” (default), “PVM”, “MPI”, or “NWS”.

nodes

(PL) the number of slave nodes to create.

model

a logical value indicating whether the model frame should be included as a component of the returned value.

x

a logical value indicating whether the model matrix used in the fitting process should be returned as a component of the returned value.

y

a logical value indicating whether the response vector used in the fitting process should be returned as a component of the returned value.

verbose

a logical value indicating whether to print various messages to the screen, including progress updates when method is “Bayes”. Defaults to FALSE.

Details

This function fits the centered autologistic model of Caragea and Kaiser (2009) using maximum pseudolikelihood estimation or MCMC for Bayesian inference. The joint distribution for the centered autologistic model is

π(Z | θ)=c(θ)^{-1} exp(Z'Xβ - η Z'Aμ + 0.5 η Z'AZ),

where θ = (β', η)' is the parameter vector, c(θ) is an intractable normalizing function, Z is the response vector, X is the design matrix, β is a (p-1)-vector of regression coefficients, A is the adjacency matrix for the underlying graph, μ is the vector of independence expectations, and η is the spatial dependence parameter.

Maximum pseudolikelihood estimation sidesteps the intractability of c(θ) by maximizing the product of the conditional likelihoods. Confidence intervals are then obtained using a parametric bootstrap or a normal approximation, i.e., sandwich estimation. The bootstrap datasets are generated by perfect sampling (rautologistic). The bootstrap samples can be generated in parallel using the snow package.

Bayesian inference is obtained using the auxiliary variable algorithm of Moller et al. (2006). The auxiliary variables are generated by perfect sampling.

The prior distributions are (1) zero-mean normal with independent coordinates for β, and (2) uniform for η. The common standard deviation for the normal prior can be supplied by the user as control parameter sigma. The default is 1,000. The uniform prior has support [0, 2] by default, but the right endpoint can be supplied (as control parameter eta.max) by the user.

The posterior covariance matrix of θ is estimated using samples obtained during a training run. The default number of iterations for the training run is 100,000, but this can be controlled by the user (via parameter trainit). The estimated covariance matrix is then used as the proposal variance for a Metropolis-Hastings random walk. The proposal distribution is normal. The posterior samples obtained during the second run are used for inference. The length of the run can be controlled by the user via parameters minit, maxit, and tol. The first determines the minimum number of iterations. If minit has been reached, the sampler will terminate when maxit is reached or all Monte Carlo standard errors are smaller than tol, whichever happens first. The default values for minit, maxit, and tol are 100,000; 1,000,000; and 0.01, respectively.

Value

autologistic returns an object of class “autologistic”, which is a list containing the following components.

coefficients

the point estimate of θ.

fitted.values

the fitted mean values, obtained by transforming the linear predictors by the inverse of the link function.

linear.predictors

the linear fit on link scale.

residuals

the response residuals.

iter

the size of the bootstrap/posterior sample.

sample

(where relevant) an iter by p matrix containing the bootstrap/posterior samples.

mcse

(where relevant) a p-vector of Monte Carlo standard errors.

S

(where relevant) the estimated sandwich matrix.

accept

(Bayes) the acceptance rate for the MCMC sampler.

y

if requested (the default), the y vector used.

X

if requested, the model matrix.

model

if requested (the default), the model frame.

call

the matched call.

formula

the formula supplied.

method

the method used for inference.

convergence

the integer code returned by optim subsequent to optimizing the pseudolikelihood.

message

a character string to go along with convergence.

terms

the terms object used.

data

the data argument.

xlevels

(where relevant) a record of the levels of the factors used in fitting.

control

a list containing the names and values of the control parameters.

value

the value of the negative log pseudolikelihood at its minimum.

References

Caragea, P. and Kaiser, M. (2009) Autologistic models with interpretable parameters. Journal of Agricultural, Biological, and Environmental Statistics, 14(3), 281–300.

Hughes, J., Haran, M. and Caragea, P. C. (2011) Autologistic models for binary data on a lattice. Environmetrics, 22(7), 857–871.

Moller, J., Pettitt, A., Berthelsen, K., and Reeves, R. (2006) An efficient Markov chain Monte Carlo method for distributions with intractable normalising constants. Biometrika, 93(2), 451–458.

See Also

rautologistic, residuals.autologistic, summary.autologistic, vcov.autologistic

Examples

## Not run: 

# Use the 20 x 20 square lattice as the underlying graph.

n = 20
A = adjacency.matrix(n)

# Assign coordinates to each vertex such that the coordinates are restricted to the unit square
# centered at the origin.

x = rep(0:(n - 1) / (n - 1), times = n) - 0.5
y = rep(0:(n - 1) / (n - 1), each = n) - 0.5
X = cbind(x, y)                                 # Use the vertex locations as spatial covariates.
beta = c(2, 2)                                  # These are the regression coefficients.
col1 = "white"
col2 = "black"
colfunc = colorRampPalette(c(col1, col2))

# Simulate a dataset with the above mentioned regression component and eta equal to 0.6. This
# value of eta corresponds to dependence that is moderate in strength.

theta = c(beta, eta = 0.6)
set.seed(123456)
Z = rautologistic(X, A, theta)

# Find the MPLE, and do not compute confidence intervals.

fit = autologistic(Z ~ X - 1, A = A, control = list(confint = "none"))
summary(fit)

# Compute confidence intervals based on the normal approximation. Estimate the "filling" in the
# sandwich matrix using a parallel parametric bootstrap, where the computation is distributed
# across six cores on the host workstation.

set.seed(123456)
fit = autologistic(Z ~ X - 1, A = A, verbose = TRUE,
                   control = list(confint = "sandwich", nodes = 6))
summary(fit)

# Compute confidence intervals based on a parallel parametric bootstrap. Use a bootstrap sample
# of size 500, and distribute the computation across six cores on the host workstation.

set.seed(123456)
fit = autologistic(Z ~ X - 1, A = A, verbose = TRUE,
                   control = list(confint = "bootstrap", bootit = 500, nodes = 6))
summary(fit)

# Make some level plots of the residuals.

dev.new()
lattice::levelplot(residuals(fit) ~ x * y, aspect = "iso", col.regions = colfunc(n^2))
dev.new()
lattice::levelplot(residuals(fit, type = "pearson") ~ x * y, aspect = "iso",
                   col.regions = colfunc(n^2))
dev.new()
lattice::levelplot(residuals(fit, type = "response") ~ x * y, aspect = "iso",
                   col.regions = colfunc(n^2))

# Do MCMC for Bayesian inference. The length of the training run will be 10,000, and
# the length of the subsequent inferential run will be at least 10,000.

set.seed(123456)
fit = autologistic(Z ~ X - 1, A = A, verbose = TRUE, method = "Bayes",
                   control = list(trainit = 10000, minit = 10000, sigma = 1000))
summary(fit)

## End(Not run)

Results