R: Fit a centered autologistic model using maximum...
autologistic
R Documentation
Fit a centered autologistic model using maximum pseudolikelihood estimation or MCMC for Bayesian inference.
Description
Fit a centered autologistic model using maximum pseudolikelihood estimation or MCMC for Bayesian inference.
Usage
autologistic(formula, data, A, method = c("PL", "Bayes"), model = TRUE,
x = FALSE, y = FALSE, verbose = FALSE, control = list())
Arguments
formula
an object of class formula: a symbolic description of the model to be fitted.
data
an optional data frame, list, or environment (or object coercible by as.data.frame to a data frame) containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which autologistic is called.
A
the adjacency matrix for the underlying graph.
method
the method to use for inference. “PL” (the default) enables maximum pseudolikelihood estimation, and “Bayes” enables Bayesian inference.
control
a list of the following control parameters.
confint
(PL) the method for computing confidence intervals. The possible values are “sandwich” (the default), “bootstrap”, and “none”.
sigma
(Bayes) the common standard deviation for β's prior distribution. Defaults to 1,000.
eta.max
(Bayes) the right endpoint for η's prior distribution. Defaults to 2.
bootit
(PL) the size of the bootstrap sample. This applies when confint is “sandwich” or “bootstrap”, since samples from the fitted model are needed in both cases. Defaults to 1,000.
trainit
(Bayes) the length of the training run. Defaults to 100,000.
minit
(Bayes) the minimum number of posterior samples. Defaults to 100,000.
maxit
(Bayes) the maximum number of posterior samples. Defaults to 1,000,000.
tol
(Bayes) the tolerance. Defaults to 0.01.
parallel
(PL) a logical value indicating whether to parallelize the bootstrap. This defaults to TRUE if the snow package is installed.
type
(PL) the cluster type, one of “SOCK” (default), “PVM”, “MPI”, or “NWS”.
nodes
(PL) the number of slave nodes to create.
model
a logical value indicating whether the model frame should be included as a component of the returned value.
x
a logical value indicating whether the model matrix used in the fitting process should be returned as a component of the returned value.
y
a logical value indicating whether the response vector used in the fitting process should be returned as a component of the returned value.
verbose
a logical value indicating whether to print various messages to the screen, including progress updates when method is “Bayes”. Defaults to FALSE.
Details
This function fits the centered autologistic model of Caragea and Kaiser (2009) using maximum pseudolikelihood estimation or MCMC for Bayesian inference.
The joint distribution for the centered autologistic model is
π(Z | θ)=c(θ)^{-1} exp(Z'Xβ - η Z'Aμ + 0.5 η Z'AZ),
where θ = (β', η)' is the parameter vector, c(θ) is an intractable normalizing function, Z is the response vector, X is the design matrix,
β is a (p-1)-vector of regression coefficients, A is the adjacency matrix for the underlying graph, μ is the vector of independence expectations,
and η is the spatial dependence parameter.
Maximum pseudolikelihood estimation sidesteps the intractability of c(θ) by maximizing the product of the conditional likelihoods.
Confidence intervals are then obtained using a parametric bootstrap or a normal approximation, i.e., sandwich estimation. The bootstrap datasets are generated by perfect sampling (rautologistic).
The bootstrap samples can be generated in parallel using the snow package.
Bayesian inference is obtained using the auxiliary variable algorithm of Moller et al. (2006).
The auxiliary variables are generated by perfect sampling.
The prior distributions are (1) zero-mean normal with independent coordinates for β, and (2) uniform for η.
The common standard deviation for the normal prior can be supplied by the user as control parameter sigma. The default is 1,000. The uniform prior has support [0, 2] by default, but the right endpoint can be supplied (as control parameter eta.max) by the user.
The posterior covariance matrix of θ is estimated using samples obtained during a training run. The default number of iterations for the training run is 100,000, but this can be controlled by the user (via parameter trainit). The estimated covariance matrix is then used as the proposal variance for a Metropolis-Hastings random walk. The proposal distribution is normal. The posterior samples obtained during the second run are used for inference. The length of the run can be controlled by the user via parameters minit, maxit, and tol. The first determines the minimum number of iterations. If minit has been reached, the sampler will terminate when maxit is reached or all Monte Carlo standard errors are smaller than tol, whichever happens first. The default values for minit, maxit, and tol are 100,000; 1,000,000; and 0.01, respectively.
Value
autologistic returns an object of class “autologistic”, which is a list containing the following components.
coefficients
the point estimate of θ.
fitted.values
the fitted mean values, obtained by transforming the linear predictors by the inverse of the link function.
linear.predictors
the linear fit on link scale.
residuals
the response residuals.
iter
the size of the bootstrap/posterior sample.
sample
(where relevant) an iter by p matrix containing the bootstrap/posterior samples.
mcse
(where relevant) a p-vector of Monte Carlo standard errors.
S
(where relevant) the estimated sandwich matrix.
accept
(Bayes) the acceptance rate for the MCMC sampler.
y
if requested (the default), the y vector used.
X
if requested, the model matrix.
model
if requested (the default), the model frame.
call
the matched call.
formula
the formula supplied.
method
the method used for inference.
convergence
the integer code returned by optim subsequent to optimizing the pseudolikelihood.
message
a character string to go along with convergence.
terms
the terms object used.
data
the data argument.
xlevels
(where relevant) a record of the levels of the factors used in fitting.
control
a list containing the names and values of the control parameters.
value
the value of the negative log pseudolikelihood at its minimum.
References
Caragea, P. and Kaiser, M. (2009) Autologistic models with interpretable parameters. Journal of Agricultural, Biological, and Environmental Statistics, 14(3), 281–300.
Hughes, J., Haran, M. and Caragea, P. C. (2011) Autologistic models for binary data on a lattice. Environmetrics, 22(7), 857–871.
Moller, J., Pettitt, A., Berthelsen, K., and Reeves, R. (2006) An efficient Markov chain Monte Carlo method for distributions with intractable normalising constants. Biometrika, 93(2), 451–458.
## Not run:
# Use the 20 x 20 square lattice as the underlying graph.
n = 20
A = adjacency.matrix(n)
# Assign coordinates to each vertex such that the coordinates are restricted to the unit square
# centered at the origin.
x = rep(0:(n - 1) / (n - 1), times = n) - 0.5
y = rep(0:(n - 1) / (n - 1), each = n) - 0.5
X = cbind(x, y) # Use the vertex locations as spatial covariates.
beta = c(2, 2) # These are the regression coefficients.
col1 = "white"
col2 = "black"
colfunc = colorRampPalette(c(col1, col2))
# Simulate a dataset with the above mentioned regression component and eta equal to 0.6. This
# value of eta corresponds to dependence that is moderate in strength.
theta = c(beta, eta = 0.6)
set.seed(123456)
Z = rautologistic(X, A, theta)
# Find the MPLE, and do not compute confidence intervals.
fit = autologistic(Z ~ X - 1, A = A, control = list(confint = "none"))
summary(fit)
# Compute confidence intervals based on the normal approximation. Estimate the "filling" in the
# sandwich matrix using a parallel parametric bootstrap, where the computation is distributed
# across six cores on the host workstation.
set.seed(123456)
fit = autologistic(Z ~ X - 1, A = A, verbose = TRUE,
control = list(confint = "sandwich", nodes = 6))
summary(fit)
# Compute confidence intervals based on a parallel parametric bootstrap. Use a bootstrap sample
# of size 500, and distribute the computation across six cores on the host workstation.
set.seed(123456)
fit = autologistic(Z ~ X - 1, A = A, verbose = TRUE,
control = list(confint = "bootstrap", bootit = 500, nodes = 6))
summary(fit)
# Make some level plots of the residuals.
dev.new()
lattice::levelplot(residuals(fit) ~ x * y, aspect = "iso", col.regions = colfunc(n^2))
dev.new()
lattice::levelplot(residuals(fit, type = "pearson") ~ x * y, aspect = "iso",
col.regions = colfunc(n^2))
dev.new()
lattice::levelplot(residuals(fit, type = "response") ~ x * y, aspect = "iso",
col.regions = colfunc(n^2))
# Do MCMC for Bayesian inference. The length of the training run will be 10,000, and
# the length of the subsequent inferential run will be at least 10,000.
set.seed(123456)
fit = autologistic(Z ~ X - 1, A = A, verbose = TRUE, method = "Bayes",
control = list(trainit = 10000, minit = 10000, sigma = 1000))
summary(fit)
## End(Not run)