R: Bayesian variable selection and model averaging for linear...
pmomLM
R Documentation
Bayesian variable selection and model averaging for linear
and probit models via non-local priors.
Description
Variable selection for linear and probit models, providing a sample from the
joint posterior of the model and regression coefficients.
pmomLM and pmomPM implement product Normal MOM and
heavy-tailed product MOM as prior distribution for linear and probit
model coefficients (respectively).
emomLM and emomPM set an eMOM prior.
pplPM finds the value of the prior dispersion parameter tau
minimizing posterior expected predictive loss (Gelfand and Ghosh, 1998) for the Probit model,
i.e. can be used to automatically set up tau.
ppmodel returns the proportion of visits to each model.
Vector with observed responses. For pmomLM this must be
a numeric vector. For pmomPM it can either be a logical vector,
a factor with 2 levels or a numeric vector taking only two distinct values.
x
Design matrix with all potential predictors which are to
undergo variable selection.
xadj
Design matrix for adjustment covariates, i.e. variables
which are included in the model with probability 1. For instance,
xadj can be used to force the inclusion of an intercept in the model.
center
If center==TRUE, y and x are centered to have zero mean, therefore eliminating the need to include an intercept term in x
scale
If scale==TRUE, y and columns in x are scaled to have standard deviation 1
niter
Number of MCMC sampling iterations
thinning
MCMC thinning factor, i.e. only one out of each thinning iterations are reported. Defaults to thinning=1, i.e. no thinning
burnin
Number of burn-in MCMC iterations. Defaults to
.1*niter. Set to 0 for no burn-in
priorCoef
Prior distribution for the coefficients. Must be object of class msPriorSpec with slot priorType set to 'coefficients'. Possible values for slot priorDistr are 'pMOM', 'piMOM' and 'peMOM'
priorDelta
Prior on model indicator space. Must be object of
class msPriorSpec with slot priorType set to
'modelIndicator'. Possible values for slot priorDistr are
'uniform' and 'binomial'. For 'binomial', you can either set the prior
probability 'p' or specify a Beta-binomial prior by specifying the
parameters 'alpha.p','beta.p'.
priorVar
Prior on residual variance. Must be object of class msPriorSpec with slot priorType set to 'nuisancePars'. Slot priorDistr must be equal to 'invgamma'
initSearch
Algorithm to refine
deltaini. initSearch=='greedy' uses a greedy Gibbs
sampling search. initSearch=='SCAD' sets deltaini to the
non-zero elements in a SCAD fit with cross-validated regularization
parameter. initSearch=='none' initializes to the null model
with no variables in x included.
verbose
Set verbose==TRUE to print iteration progress
tauseq
Grid of tau values for which the posterior predictive loss
should be evaluated.
kPen
Penalty term specifying the relative importance of
deviations from the observed data vs deviation from the posterior
predictive. kPen can be set either to a
numeric value or to 'msize' to set penalty equal to the average
model size. Loss is Dev(yp,yhat) +
kPen*Dev(yhat,yobs), where yp: draw from post predictive, yobs:
observed data and yhat is E(yp|yobs).
mc.cores
Allows for parallel computing. mc.cores is the number of processors to use.
Setting mc.cores>1 requires the parallel package.
nlpfit
Non-local prior model fit, as returned by pmomLM,
pmomPM, emomLM or emomPM.
Details
The implemented MCMC scheme makes proposals from the joint posterior of
(delta[i],theta[i]) given all other parameters and the data, where
delta[i] is the indicator for inclusion/exclusion of covariate i and
theta[i] is the coefficient value.
In contrast with some model fitting options implemented in
modelSelection, here the scheme is exact. However, sampling the
coefficients can adversely affect the mixing when covariates are
very highly correlated.
In practice, the mixing seems to be reasonably good for correlations
up to 0.9.
pmomPM uses the scheme of Albert & Chib (1993) for probit models.
Value
pmomLM and pmomPM returns a list with elements
postModel
matrix with posterior samples for the model
indicator. postModel[i,j]==1
indicates that variable j was included in the model in the MCMC
iteration i
postCoef1
matrix with posterior samples for coefficients
associated to x
postCoef2
matrix with posterior samples for coefficients
associated to xadj
postPhi
vector with posterior samples for residual variance
postOther
postOther
returns posterior samples for other parameters, i.e. basically hyper-parameters.
Currently the prior precision parameter tau
margpp
Marginal posterior probability for inclusion of each
covariate. This is computed by averaging marginal post prob for
inclusion in each MCMC iteration, which is much more accurate than
simply taking colMeans(postModel)
.
pplPM returns a list with elements
optfit
Probit model fit using tauopt. It is the result of a call to pmomPM.
PPL
data.frame indicating for each value in tauseq the posterior predictive loss (PPL=G+P),
the goodness-of-fit (G) and penalty terms (P)
, the
average number of covariates in the model (msize) including xadj
and the smoothed sPPL obtained via a gam fit.
tauopt
Value of tau minimizing the PPL
Author(s)
David Rossell, Donatello Telesca
References
Johnson V.E., Rossell D. Non-Local Prior Densities for Default
Bayesian Hypothesis Tests. Journal of the Royal Statistical Society B,
2010, 72, 143-170.
Johnson V.E., Rossell D. Bayesian model selection in high-dimensional
settings. Technical report. 2011
See http://rosselldavid.googlepages.com for technical reports.
Albert, J. and Chib, S. (1993) Bayesian analysis of binary and
polychotomous response data. Journal of the American Statistical
Association, 88, p669-679
Gelfand, A. and Ghosh, S. (1998) Model choice: A minimum posterior predictive loss approach.
Biometrika, 85, p1-11.
See Also
For more details on the prior specification see msPriorSpec-class
To compute marginal densities for a given model see
pmomMarginalK, pmomMarginalU,
pimomMarginalK, pimomMarginalU.
Examples
#Simulate data
x <- matrix(rnorm(100*3),nrow=100,ncol=3)
xadj <- rep(1,nrow(x))
theta <- matrix(c(1,1,0),ncol=1)
y <- 10*xadj + x %*% theta + rnorm(100)
#Beta-binomial prior on model space
priorDelta <- modelbbprior(alpha.p=1,beta.p=1)
#Non-informative prior on residual variance
priorVar <- igprior(alpha=.01,lambda=.01)
#Product MOM prior with tau=0.3 on x coefficients
#Non-informative prior on xadj coefficients
priorCoef <- momprior(tau=0.3, tau.adj=10^6)
mom0 <- pmomLM(y=y,x=x,xadj=xadj,center=FALSE,scale=FALSE,niter=1000,
priorCoef=priorCoef,priorDelta=priorDelta,priorVar=priorVar)
round(colMeans(mom0$postModel),2)
round(colMeans(mom0$postCoef1),2)
round(colMeans(mom0$postCoef2),2)
#Alternative prior: hyper-prior on tau
priorCoef <- new("msPriorSpec",priorType='coefficients',priorDistr='pMOM',
priorPars=c(a.tau=1,b.tau=.135,tau.adj=10^6,r=1)) #hyper-prior
mom1 <-
pmomLM(y=y,x=x,xadj=xadj,center=FALSE,scale=FALSE,niter=1000,
priorCoef=priorCoef,priorDelta=priorDelta,priorVar=priorVar)
mean(mom1$postOther) #posterior mean for tau
#Probit model
n <- 500; rho <- .25; niter <- 1000
theta <- c(.4,.6,0); theta.adj <- 0
V <- diag(length(theta)); V[upper.tri(V)] <- V[lower.tri(V)] <- rho
x <- rmvnorm(n,rep(0,length(theta)),V); xadj <- matrix(1,nrow=nrow(x),ncol=1)
lpred <- as.vector(x %*% matrix(theta,ncol=1) + xadj %*% matrix(theta.adj,ncol=1))
p <- pnorm(lpred)
y <- runif(n)<p
mom2 <- pmomPM(y=y,x=x,xadj=xadj,niter=1000,priorCoef=priorCoef,
priorDelta=priorDelta,initSearch='greedy')
colMeans(mom2$postCoef1)
coef(glm(y ~ x + xadj -1, family=binomial(link='probit')))