R: Bayesian Model Averaging for generalized linear models.
bic.glm
R Documentation
Bayesian Model Averaging for generalized linear models.
Description
Bayesian Model Averaging accounts for the model uncertainty inherent in the variable selection problem by averaging over the best models in the model class according to approximate posterior model probability.
a data frame containing the variables in the model.
glm.family
a description of the error distribution and link function to
be used in the model. This can be a character string naming a
family function, a family function or the result of a call to
a family function. (See 'family' for details of family
functions.)
wt
an optional vector of weights to be used.
strict
a logical indicating whether models with more likely submodels are
eliminated. FALSE returns all models whose posterior model probability is
within a factor of 1/OR of that of the best model.
prior.param
a vector of values specifying the prior weights for each variable.
OR
a number specifying the maximum ratio for excluding models in Occam's window
maxCol
a number specifying the maximum number of columns in design matrix
(excluding intercept) to be kept.
OR.fix
width of the window which keeps models after the leaps approximation
is done.
Because the leaps and bounds gives only an approximation to BIC,
there is a need to increase the window at this first "cut" so as to
assure that no good models are deleted.
The level of this cut is at 1/(OR^OR.fix); the default value
for OR.fix is 2.
nbest
a number specifying the number of models of each size returned to
bic.glm by the modified leaps algorithm.
dispersion
a logical value specifying whether dispersion should be
estimated or not. Default is TRUE unless glm.family is poisson
or binomial
factor.type
a logical value specifying how variables of class "factor" are
handled.
A factor variable with d levels is turned into (d-1) dummy variables using a
treatment contrast.
If factor.type = TRUE, models will contain either all or none of
these dummy variables.
If factor.type = FALSE, models are free to select the dummy
variables independently.
In this case, factor.prior.adjust determines the prior on these variables.
factor.prior.adjust
a logical value specifying whether
the prior distribution on dummy variables for factors
should be adjusted when factor.type=FALSE.
When factor.prior.adjust=FALSE, all dummy variables
for variable i have prior equal to prior.param[i].
Note that this makes the prior probability of the union of these variables
much higher than prior.param[i].
Setting factor.prior.adjust=T corrects for this so that the union of
the dummies equals prior.param[i]
(and hence the deletion of the factor has a prior of
1-prior.param[i]).
This adjustment changes the individual priors on each dummy variable to '
1-(1-pp[i])^(1/(k+1)).
occam.window
a logical value specifying if Occam's window should be used.
If set to FALSE then all models selected by the modified leaps
algorithm are returned.
call
used internally
...
unused
Details
Bayesian Model Averaging accounts for the model uncertainty inherent in the variable selection problem by averaging over the best models in the model class according to approximate posterior model probability.
Value
bic.glm returns an object of class bic.glm
The function summary is used to print a summary of the results.
The function plot is used to plot posterior distributions for the coefficients.
The function imageplot generates an image of the models which were averaged over.
An object of class bic.glm is a list containing at least the following components:
postprob
the posterior probabilities of the models selected
deviance
the estimated model deviances
label
labels identifying the models selected
bic
values of BIC for the models
size
the number of independent variables in each of the models
which
a logical matrix with one row per model and one column per variable indicating whether that variable is in the model
probne0
the posterior probability that each variable is non-zero (in percent)
postmean
the posterior mean of each coefficient (from model averaging)
postsd
the posterior standard deviation of each coefficient (from model averaging)
condpostmean
the posterior mean of each coefficient conditional on the variable being included in the model
condpostsd
the posterior standard deviation of each coefficient conditional on the variable being included in the model
mle
matrix with one row per model and one column per variable giving the maximum likelihood estimate of each coefficient for each model
se
matrix with one row per model and one column per variable giving the standard error of each coefficient for each model
reduced
a logical indicating whether any variables were dropped before model averaging
dropped
a vector containing the names of those variables dropped before model averaging
call
the matched call that created the bma.lm object
Note
If more than maxcol variables are supplied, then bic.glm does stepwise
elimination of variables until maxcol variables are reached.
bic.glm handles factor variables according to the factor.type
parameter. If this is true then factor variables are kept in the model or dropped in
entirety. If false, then each dummy variable can be kept or dropped independently.
If bic.glm is used with a formula that includes interactions between factor
variables, then bic.glm will create a new factor variable to represent that
interaction, and this factor variable will be kept or dropped in entirety if
factor.type is true.
This can create interpretation problems if any of the corresponding main effects are
dropped.
Many thanks to Sanford Weisberg for making source code for leaps available.
Raftery, Adrian E. (1995). Bayesian model selection in social research (with Discussion). Sociological Methodology 1995 (Peter V. Marsden, ed.), pp. 111-196, Cambridge, Mass.: Blackwells.
An earlier version, issued as Working Paper 94-12, Center for Studies in Demography
and Ecology, University of Washington (1994) is available as a technical report
from the Department of Statistics, University of Washington.
See Also
summary.bic.glm,
print.bic.glm,
plot.bic.glm
Examples
## Not run:
### logistic regression
library("MASS")
data(birthwt)
y<- birthwt$lo
x<- data.frame(birthwt[,-1])
x$race<- as.factor(x$race)
x$ht<- (x$ht>=1)+0
x<- x[,-9]
x$smoke <- as.factor(x$smoke)
x$ptl<- as.factor(x$ptl)
x$ht <- as.factor(x$ht)
x$ui <- as.factor(x$ui)
glm.out.FT <- bic.glm(x, y, strict = FALSE, OR = 20,
glm.family="binomial", factor.type=TRUE)
summary(glm.out.FT)
imageplot.bma(glm.out.FT)
glm.out.FF <- bic.glm(x, y, strict = FALSE, OR = 20,
glm.family="binomial", factor.type=FALSE)
summary(glm.out.FF)
imageplot.bma(glm.out.FF)
glm.out.TT <- bic.glm(x, y, strict = TRUE, OR = 20,
glm.family="binomial", factor.type=TRUE)
summary(glm.out.TT)
imageplot.bma(glm.out.TT)
glm.out.TF <- bic.glm(x, y, strict = TRUE, OR = 20,
glm.family="binomial", factor.type=FALSE)
summary(glm.out.TF)
imageplot.bma(glm.out.TF)
## End(Not run)
## Not run:
### Gamma family
library(survival)
data(veteran)
surv.t<- veteran$time
x<- veteran[,-c(3,4)]
x$celltype<- factor(as.character(x$celltype))
sel<- veteran$status == 0
x<- x[!sel,]
surv.t<- surv.t[!sel]
glm.out.va <- bic.glm(x, y=surv.t, glm.family=Gamma(link="inverse"),
factor.type=FALSE)
summary(glm.out.va)
imageplot.bma(glm.out.va)
plot(glm.out.va)
## End(Not run)
### Poisson family
### Yates (teeth) data.
x<- rbind(
c(0, 0, 0),
c(0, 1, 0),
c(1, 0, 0),
c(1, 1, 1))
y<-c(4, 16, 1, 21)
n<-c(1,1,1,1)
models<- rbind(
c(1, 1, 0),
c(1, 1, 1))
glm.out.yates <- bic.glm( x, y, n, glm.family = poisson(),
factor.type=FALSE)
summary(glm.out.yates)
## Not run:
### Gaussian
library(MASS)
data(UScrime)
f <- formula(log(y) ~ log(M)+So+log(Ed)+log(Po1)+log(Po2)+log(LF)+
log(M.F)+ log(Pop)+log(NW)+log(U1)+log(U2)+
log(GDP)+log(Ineq)+log(Prob)+log(Time))
glm.out.crime <- bic.glm(f, data = UScrime, glm.family = gaussian())
summary(glm.out.crime)
# note the problems with the estimation of the posterior standard
# deviation (compare with bicreg example)
## End(Not run)
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(BMA)
Loading required package: survival
Loading required package: leaps
Loading required package: robustbase
Attaching package: 'robustbase'
The following object is masked from 'package:survival':
heart
Loading required package: inline
Loading required package: rrcov
Scalable Robust Estimators with High Breakdown Point (version 1.3-11)
> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/BMA/bic.glm.Rd_%03d_medium.png", width=480, height=480)
> ### Name: bic.glm
> ### Title: Bayesian Model Averaging for generalized linear models.
> ### Aliases: bic.glm bic.glm.data.frame bic.glm.matrix bic.glm.formula
> ### Keywords: regression models
>
> ### ** Examples
>
>
> ## Not run:
> ##D ### logistic regression
> ##D library("MASS")
> ##D data(birthwt)
> ##D y<- birthwt$lo
> ##D x<- data.frame(birthwt[,-1])
> ##D x$race<- as.factor(x$race)
> ##D x$ht<- (x$ht>=1)+0
> ##D x<- x[,-9]
> ##D x$smoke <- as.factor(x$smoke)
> ##D x$ptl<- as.factor(x$ptl)
> ##D x$ht <- as.factor(x$ht)
> ##D x$ui <- as.factor(x$ui)
> ##D
> ##D glm.out.FT <- bic.glm(x, y, strict = FALSE, OR = 20,
> ##D glm.family="binomial", factor.type=TRUE)
> ##D summary(glm.out.FT)
> ##D imageplot.bma(glm.out.FT)
> ##D
> ##D glm.out.FF <- bic.glm(x, y, strict = FALSE, OR = 20,
> ##D glm.family="binomial", factor.type=FALSE)
> ##D summary(glm.out.FF)
> ##D imageplot.bma(glm.out.FF)
> ##D
> ##D glm.out.TT <- bic.glm(x, y, strict = TRUE, OR = 20,
> ##D glm.family="binomial", factor.type=TRUE)
> ##D summary(glm.out.TT)
> ##D imageplot.bma(glm.out.TT)
> ##D
> ##D glm.out.TF <- bic.glm(x, y, strict = TRUE, OR = 20,
> ##D glm.family="binomial", factor.type=FALSE)
> ##D summary(glm.out.TF)
> ##D imageplot.bma(glm.out.TF)
> ## End(Not run)
>
> ## Not run:
> ##D ### Gamma family
> ##D library(survival)
> ##D data(veteran)
> ##D surv.t<- veteran$time
> ##D x<- veteran[,-c(3,4)]
> ##D x$celltype<- factor(as.character(x$celltype))
> ##D sel<- veteran$status == 0
> ##D x<- x[!sel,]
> ##D surv.t<- surv.t[!sel]
> ##D
> ##D glm.out.va <- bic.glm(x, y=surv.t, glm.family=Gamma(link="inverse"),
> ##D factor.type=FALSE)
> ##D summary(glm.out.va)
> ##D imageplot.bma(glm.out.va)
> ##D plot(glm.out.va)
> ## End(Not run)
>
> ### Poisson family
> ### Yates (teeth) data.
>
> x<- rbind(
+ c(0, 0, 0),
+ c(0, 1, 0),
+ c(1, 0, 0),
+ c(1, 1, 1))
>
> y<-c(4, 16, 1, 21)
> n<-c(1,1,1,1)
>
> models<- rbind(
+ c(1, 1, 0),
+ c(1, 1, 1))
>
> glm.out.yates <- bic.glm( x, y, n, glm.family = poisson(),
+ factor.type=FALSE)
> summary(glm.out.yates)
Call:
bic.glm.matrix(x = x, y = y, glm.family = poisson(), wt = n, factor.type = FALSE)
4 models were selected
Best 4 models (cumulative posterior probability = 1 ):
p!=0 EV SD model 1 model 2 model 3 model 4
Intercept 100 1.0456 0.5170 0.91629 1.38629 0.91629 0.86750
X1 45.9 -0.3894 0.8906 . -1.38629 . 0.09531
X2 100.0 1.7892 0.5747 2.00148 1.38629 1.85630 2.00148
X3 51.5 0.5455 0.9722 . 1.65823 0.27193 .
nVar 1 3 2 2
BIC -0.16739 0.00000 0.54115 1.12363
post prob 0.318 0.292 0.223 0.167
>
> ## Not run:
> ##D ### Gaussian
> ##D library(MASS)
> ##D data(UScrime)
> ##D f <- formula(log(y) ~ log(M)+So+log(Ed)+log(Po1)+log(Po2)+log(LF)+
> ##D log(M.F)+ log(Pop)+log(NW)+log(U1)+log(U2)+
> ##D log(GDP)+log(Ineq)+log(Prob)+log(Time))
> ##D glm.out.crime <- bic.glm(f, data = UScrime, glm.family = gaussian())
> ##D summary(glm.out.crime)
> ##D # note the problems with the estimation of the posterior standard
> ##D # deviation (compare with bicreg example)
> ## End(Not run)
>
>
>
>
>
>
> dev.off()
null device
1
>