Last data update: 2014.03.03

R: Maximum Likelihood Estimation of a Spatial Probit Model
spprobitml R Documentation

Maximum Likelihood Estimation of a Spatial Probit Model

Description

Probit estimation for a model with an underlying latent variable of the form Y^* = ρ WY^* + X β +u

Usage

  spprobitml(form,wmat=NULL,shpfile=NULL,blockid=NULL,
  minblock=NULL,maxblock=NULL,stdprobit=TRUE,data=NULL)

Arguments

form

Model formula

wmat

Directly enter wmat rather than creating it from a shape file. Default: not specified. One of the wmat or shpfile options must be specified.

shpfile

Shape file to be used for creating the W matrix. Default: not specified. One of the wmat or shpfile options must be specified. The order of the observations in wmat must be the same as the order of observations in data.

blockid

A variable identifying groups used to specify a block diagonal structure for the W matrix, e.g., blockid=state or blockid=region. Calculates a separate W matrix for each block. The shpfile option must be specified; wmat is ignored.

minblock

Groups with fewer than minblock observations are omitted. Default is the number of explanatory variables, including the spatial lag term.

maxblock

Groups with more than maxblock observations are omitted. Unlimited by default. This option may be useful for very large data sets as full nblock x nblock matrices must be constructed for each block, where nblock is the number of observations in the block.

stdprobit

If TRUE, also prints standard probit model results. Default: stdprobit=TRUE.

data

A data frame containing the data. Default: use data in the current working directory

Details

Estimation is based on the reduced form of the spatial AR model, Y^* = (I - ρ W)^{-1}(X β + u). The model structure typically implies heteroskedasticity: the variance of the reduced form error term, (I - ρ W)^{-1}u, is σ^2 diag { (I - ρ W)^{-1}(I - ρ W')^{-1} } . For probit estimation, σ^2 is normalized to one. Let s_i^2 denote the variance for observation i, and define X^* = (I - ρ W)^{-1}X. Then the probability that Y_i^* > 0 is Φ (X_i^* β / s_i) , and the log-likelihood function is ∑_i { y_i ln (Φ_i ) + (1-y_i ) ln(1-Φ_i) } . The spprobitml commands estimates the model by maximizing this log-likelihood function with respect to β and ρ.

Variants of this approach – maximizing the log-likelihood function implied by the reduced form of the model – were proposed by Case (1992) and McMillen (1992). Case's estimation procedure relies on a simple form of the spatial weight matrix in which each observation within a district is affected equally by the other observations in the district. McMillen's (1992) approach is equivalent to the one used here, but he suggested using an EM algorithm to estimate the model. Neither author suggested a covariance matrix: Case (1992) appears to have relied on the standard probit estimate which applies when the model is estimated conditional on ρ, while McMillen (1992) proposed a bootstrap approach.

A consistent covariance matrix can be calculated using the gradient terms:

V(hat{θ})^{-1} = ≤ft( ∑_i partial lnL_i / partial hat{θ} ight)≤ft( ∑_i partial lnL_i / partial hat{θ}' ight)

The gradient term for hat{ρ} is calculated using numeric derivatives. The covariance matrix, V(hat{θ}), is not fully efficient because the estimation procedure only indirectly takes into account the autocorrelation structure. An analogous approach is used to calculate standard errors conditional on hat{ρ}. In the conditional case, only the gradient terms for hat{β} are used; they are evaluated using the estimated values of ρ.

Estimation can be very slow because each iteration requires the inversion of an nxn matrix. To speed up the estimation process and to reduce memory requirements, it may be desirable to impose a block diagonal structure on W. For example, it may be reasonable to impose that each state or region has its own error structure, with no correlation of errors across regions. The blockid option specifies a block diagonal structure such as blockid=region. If there are G groups, estimation requires G sub-matrices to be inverted rather than one nxn matrix, which greatly reduces memory requirements and significantly reduces the time required in estimation. The weight matrix must be calculated from shpfile if the blockid option is specified; the wmat option should be set to NULL.

Value

coef

Coefficient estimates.

logl

The log-likelihood value.

vmat1

The covariance matrix for hat{β}, conditional on hat{ρ}.

vmat2

The unconditional covariance matrix for hat{θ} = (hat{β}, hat{ρ}).

References

Case, Anne C., "Neighborhood Influence and Technological Change," Regional Science and Urban Economics 22 (1992), 491-508.

McMillen, Daniel P., "Probit With Spatial Autocorrelation," Journal of Regional Science 32 (1992), 335-348.

See Also

cparlogit

cparprobit

cparmlogit

gmmlogit

gmmprobit

makew

splogit

spprobit

Examples

set.seed(9947)
cmap <- readShapePoly(system.file("maps/CookCensusTracts.shp",
  package="McSpatial"))
cmap <- cmap[cmap$CHICAGO==1&cmap$CAREA!="O'Hare",]
lmat <- coordinates(cmap)
dnorth <- geodistance(lmat[,1],lmat[,2], lotarget=-87.627800, 
	latarget=41.881998,dcoor=TRUE)$dnorth
cmap <- cmap[dnorth>1,]
wmat <- makew(cmap)$wmat
n = nrow(wmat)
rho = .4
x <- runif(n,0,10)
ystar <- as.numeric(solve(diag(n) - rho*wmat)%*%(x + rnorm(n,0,2)))
y <- ystar>quantile(ystar,.4)
fit <- spprobitml(y~x,  wmat=wmat)

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(McSpatial)
Loading required package: lattice
Loading required package: locfit
locfit 1.5-9.1 	 2013-03-22
Loading required package: maptools
Loading required package: sp
Checking rgeos availability: TRUE
Loading required package: quantreg
Loading required package: SparseM

Attaching package: 'SparseM'

The following object is masked from 'package:base':

    backsolve

Loading required package: RANN
> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/McSpatial/spprobitml.Rd_%03d_medium.png", width=480, height=480)
> ### Name: spprobitml 
> ### Title: Maximum Likelihood Estimation of a Spatial Probit Model
> ### Aliases: 'spprobitml '
> ### Keywords: Spatial AR Model Parametric Models Discrete Choice Models
> 
> ### ** Examples
> 
> set.seed(9947)
> cmap <- readShapePoly(system.file("maps/CookCensusTracts.shp",
+   package="McSpatial"))
> cmap <- cmap[cmap$CHICAGO==1&cmap$CAREA!="O'Hare",]
> lmat <- coordinates(cmap)
> dnorth <- geodistance(lmat[,1],lmat[,2], lotarget=-87.627800, 
+ 	latarget=41.881998,dcoor=TRUE)$dnorth
> cmap <- cmap[dnorth>1,]
> wmat <- makew(cmap)$wmat
Loading required package: Matrix
> n = nrow(wmat)
> rho = .4
> x <- runif(n,0,10)
> ystar <- as.numeric(solve(diag(n) - rho*wmat)%*%(x + rnorm(n,0,2)))
> y <- ystar>quantile(ystar,.4)
> fit <- spprobitml(y~x,  wmat=wmat)
Standard Probit Estimates 

Call:
glm(formula = form, family = binomial(link = "probit"), data = data)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.8317  -0.4635   0.1229   0.5234   2.5058  

Coefficients:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -1.96424    0.21187  -9.271   <2e-16 ***
x            0.48279    0.04439  10.876   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 422.97  on 313  degrees of freedom
Residual deviance: 216.88  on 312  degrees of freedom
AIC: 220.88

Number of Fisher Scoring iterations: 6

Conditional on rho 
rho =  0.4073408 
    Estimate Std. Error   z-value Pr(>|z|)
y -2.3066700 0.22977541 -10.03880        0
x  0.5175577 0.04856963  10.65599        0
Unconditional Standard Errors 
      Estimate Std. Error   z-value     Pr(>|z|)
y   -2.3066700 0.26879217 -8.581611 0.0000000000
x    0.5175577 0.05232258  9.891669 0.0000000000
rho  0.4073408 0.12100407  3.366340 0.0007617289
Number of observations =  314 
> 
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>