generalized linear model formula for the full model with all
predictors, Y ~ X. All code assumes that an intercept will be
included in each model.
data
data frame
family
a description of the error distribution and link
function for exponential family;
currently only binomial() with the logitistic linke is available in
this version.
n.models
number of unique models to keep. If NULL, BAS will
attempt to enumerate unless p > 35 or method="MCMC". For any of
methods using MCMC algorithms that sample with replacement, sampling
will stop when the number of iterations exceeds the min of
'n.models' or 'MCMC.iterations' and on exit 'n.models' is updated to
reflect the unique number of models that have been sampled.
betaprior
Prior on coefficients for model coefficients (except
intercept). Options in clude CCH, robust, beta-prime, AIC, BIC.
modelprior
Family of prior distribution on the models. Choices
include uniform, Bernoulli or beta.binomial.
initprobs
vector of length p with the initial inclusion
probabilities used for sampling without replacement (the intercept
will be included with probability one and does not need to be added here) or a character
string giving the method used to construct the sampling probabilities
if "Uniform" each predictor variable is equally likely to be
sampled (equivalent to random sampling without replacement). If
"eplogp", use the eplogprob function to aproximate the
Bayes factor using p-values to find initial marginal inclusion probabilitites and
sample without replacement using these
inclusion probabilaties, which may be updated using estimates of the
marginal inclusion probabilites. "eplogp" assumes that MLEs from the
full model exist; for problems where that is not the case or 'p' is
large, initial sampling probabilities may be obtained using
eplogprob.marg which fits a model to each predictor seaparately.
For variables that should always be
included set the corresponding initprobs to 1. To run a
Markov Chain to provide initial estimates of marginal
inclusion probabilities, use method="MCMC+BAS" below.
method
A character variable indicating which sampling method to
use: method="BAS" uses Bayesian Adaptive Sampling (without
replacement) using the sampling probabilities given in initprobs and
updates using the marginal inclusion probabilities to direct the search/sample;
method="MCMC" combines a random walk Metropolis Hastings (as in MC3 of
Raftery et al 1997) with a random swap of a variable included with a
variable that is currently excluded (see Clyde, Ghosh, and Littman
(2010) for details);
method="MCMC+BAS" runs an initial MCMC as above to calculate marginal
inclusion probabilities and then samples without replacement as in
BAS; method = "deterministic" runs an deterministic sampling using
the initial probabilites (no updating); this is recommended for fast
enumeration or if a model of independence is a good approximation to
the joint posterior distribution of the model indicators. For
BAS, the sampling probabilities can be updated as more models are
sampled. (see 'update' below). We recommend "MCMC+BAS" or "MCMC"
for high dimensional problems.
update
number of iterations between potential updates of the
sampling probabilities in the "BAS" method. If NULL do not update,
otherwise the algorithm will update using the marginal inclusion
probabilities as they change while sampling takes place. For large
model spaces, updating is recommended. If the model space will be
enumerated, leave at the default.
bestmodel
optional binary vector representing a model to
initialize the sampling. If NULL sampling starts with the null
model
prob.rw
For any of the MCMC methods, probability of using the
random-walk proposal; otherwise use a random "flip" move to propose
a new model.
MCMC.iterations
Number of models to sample when using any of
the MCMC options; should be greater than 'n.models'.
control
a list of parameters that control convergence in the
fitting process. See the documentation for
glm.control()
offset
a priori known component to be included in the linear
predictor
weights
optional vector of weights to be used in the fitting
process. SHould be NULL or a numeric vector.
laplace
logical variable for whether to use a Laplace
approximate for integration with respect to g to obtain the marginal
likelihood. If FALSE the Cephes library is used which may be
inaccurate for large n or large values of the Wald Chisquared statistic.
Details
BAS provides several search
algorithms to find high probability models for use in Bayesian Model
Averaging or Bayesian model selection. For p less than 20-25, BAS can
enumerate all models depending on memory availability, for larger p, BAS
samples without replacement using random or deterministic sampling. The
Bayesian Adaptive Sampling algorithm of Clyde, Ghosh, Littman (2010)
samples models without replacement using the initial sampling
probabilities, and will optionally update the sampling probabilities
every "update" models using the estimated marginal inclusion
probabilties. BAS uses different methods to obtain the initprobs,
which may impact the results in high-dimensional problems.
The deterinistic sampler provides a list of the top models in order of an
approximation of independence using the provided initprobs. This
may be effective after running the other algorithms to identify high
probability models and works well if
the correlations of variables are small to modest. The priors on
coefficients are mixtures of g-priors that provide approximations to the
power prior.
Value
bas.glm returns an object of class BMA
An object of class BMA is a list containing at least the following components:
postprobs
the posterior probabilities of the models selected
priorprobs
the prior probabilities of the models selected
logmarg
values of the log of the marginal likelihood for the
models
n.vars
total number of independent variables in the full model,
including the intercept
size
the number of independent variables in each of the models,
includes the intercept
which
a list of lists with one list per model with variables
that are included in the model
probne0
the posterior probability that each variable is non-zero
coefficients
list of lists with one list per model giving the GLM estimate of each (nonzero) coefficient for each model.
se
list of lists with one list per model giving the GLM standard error of each coefficient for each model
deviance
the GLM deviance for each model
modelprior
the prior distribution on models that created the BMA object
Q
the Q statistic for each model used in the
marginal likelihood approximation
Clyde, M. Ghosh, J. and Littman, M. (2010) Bayesian Adaptive Sampling
for Variable Selection and Model Averaging. Journal of Computational
Graphics and Statistics. 20:80-101 http://dx.doi.org/10.1198/jcgs.2010.09049
Raftery, A.E, Madigan, D. and Hoeting, J.A. (1997) Bayesian Model
Averaging for Linear Regression Models. Journal of the American
Statistical Association.
Examples
##---- Should be DIRECTLY executable !! ----
library(MASS)
data(Pima.tr)
out = bas.glm(type ~ ., data=Pima.tr, n.models= 2^7, method="BAS",
betaprior=CCH(a=1, b=532/2, s=0), family=binomial(),
modelprior=beta.binomial(1,1), laplace=FALSE)
summary(out)
image(out)
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(BAS)
> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/BAS/bas.glm.Rd_%03d_medium.png", width=480, height=480)
> ### Name: bas.glm
> ### Title: Bayesian Adaptive Sampling Without Replacement for Variable
> ### Selection in Generalized Linear Models
> ### Aliases: bas.glm
> ### Keywords: GLM regression
>
> ### ** Examples
>
> ##---- Should be DIRECTLY executable !! ----
> library(MASS)
> data(Pima.tr)
>
> out = bas.glm(type ~ ., data=Pima.tr, n.models= 2^7, method="BAS",
+ betaprior=CCH(a=1, b=532/2, s=0), family=binomial(),
+ modelprior=beta.binomial(1,1), laplace=FALSE)
>
> summary(out)
Intercept npreg glu bp skin bmi ped age BF PostProbs R2 dim
[1,] 1 0 1 0 0 1 1 1 1.000000000 0.1596 0.2938 5
[2,] 1 1 1 0 0 1 1 1 0.440649426 0.1172 0.3040 6
[3,] 1 1 1 0 0 1 1 0 0.620908563 0.0991 0.2901 5
[4,] 1 0 1 0 0 0 1 1 0.462831904 0.0739 0.2703 4
[5,] 1 1 1 1 1 1 1 1 0.008659168 0.0484 0.3043 8
logmarg
[1,] -101.7878
[2,] -102.6073
[3,] -102.2643
[4,] -102.5581
[5,] -106.5369
> image(out)
>
>
>
>
>
> dev.off()
null device
1
>