R: Bayesian Model Averaging for Multinomial Logit Models
bic.mlogit
R Documentation
Description
Applies the Bayesian Model Averaging methodology of the BMA package to the variable selection problem in multinomial logit (MNL) models in which coefficients can be estimated relative to a base alternative.
Arguments
data
Data frame containing the variables of the model. There should be one record for each individual. Each alternative-specific variable occupies a single column per alternative.
choices
Vector of names of alternatives. If it is not given, it is determined from the response column of the data frame. Values of this vector should match or be a subset of those in the response column. If it is a subset, data is reduced to contain only observations whose choice is contained in choices.
base.choice
Index of the base alternative within the vector choices.
varying
Indices of variables within data that are alternative-specific.
sep
Separator of variable name and alternative name in the ‘varying’ variables.
approx
Logical. If TRUE, the function uses the approximate likelihoods produced by the Begg & Gray approximation. If FALSE, MNL maximum likelihood estimation is used in the last step of the model selection procedure. Note that this can significantly increase the run time; see Details below.
include.intercepts
Logical controlling whether alternative-specific constants should always be included in the selected models. It has an effect only if the formula f contains an intercept, i.e. it does not contain ‘-1’. See Details below.
verbose
Logical switching log messages on and off.
...
Additional arguments passed to the bic.glm function of the BMA package.
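The wide data layout expected by the data, varying and sep arguments can be illustrated with a small base-R sketch. The data frame heat and its column names are hypothetical and chosen only for illustration. The code shows how alternative-specific columns follow a "variable, separator, alternative" naming pattern; it does not call any function of this package.

```r
# Hypothetical toy data: one row per individual, wide layout.
heat <- data.frame(
  choice    = c("gas", "elec", "pump", "gas"),
  income    = c(4, 7, 5, 6),
  cost.gas  = c(10, 12, 11, 9),
  cost.elec = c(14, 11, 13, 15),
  cost.pump = c(20, 18, 12, 21),
  stringsAsFactors = FALSE
)

sep <- "."

# Column indices of the alternative-specific variables
# (the kind of index vector the 'varying' argument describes).
varying <- grep("^cost\\.", names(heat))

# Split each varying column name into variable name and alternative name.
parts <- strsplit(names(heat)[varying], sep, fixed = TRUE)
var.names <- vapply(parts, `[`, character(1), 1)
alt.names <- vapply(parts, `[`, character(1), 2)

# Alternatives as observed in the response column.
choices <- unique(heat$choice)
```

Every alternative name recovered from the column names should also appear among the values of the response column, which is the consistency that the description of choices asks for.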
Details
The function converts the given multinomial data into a combination of binary logistic data, as proposed in Yeung et al. (2005). It requires that the model can be specified as a set of equations, one of which is considered the base equation. If variables that vary over alternatives are included, they are normalized by subtracting the values corresponding to the base alternative. Details of the conversion algorithm are described in the vignette of this package; see vignette('conversion').
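The idea behind the conversion can be sketched in base R. This is a minimal illustration of the Begg & Gray individualized-regression idea under simplifying assumptions, not the package's own conversion routine. The data, the column names and the helper to.binary are hypothetical, and the sketch fits a separate binary logit per non-base alternative instead of one stacked model.

```r
# Hypothetical toy data in wide layout (names are illustrative only).
set.seed(1)
n <- 300
d <- data.frame(
  income = rnorm(n),
  cost.a = runif(n), cost.b = runif(n), cost.c = runif(n)
)
d$choice <- sample(c("a", "b", "c"), n, replace = TRUE)

base   <- "a"
others <- setdiff(c("a", "b", "c"), base)

# For each non-base alternative j: keep individuals who chose 'base' or j,
# code the response as 1 if j was chosen, and express the
# alternative-specific variable relative to the base alternative.
to.binary <- function(j) {
  sub <- d[d$choice %in% c(base, j), ]
  data.frame(
    y      = as.integer(sub$choice == j),
    eq     = j,
    income = sub$income,
    cost   = sub[[paste0("cost.", j)]] - sub[[paste0("cost.", base)]]
  )
}
stacked <- do.call(rbind, lapply(others, to.binary))

# Individualized fits: one binary logit per non-base alternative.
fits <- lapply(split(stacked, stacked$eq), function(s)
  glm(y ~ income + cost, family = binomial, data = s))
```

Individuals who chose the base alternative appear once per equation, so the stacked data has more rows than the original data frame.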
The function then applies the bic.glm function of the BMA package to the converted data, using the Begg & Gray (1984) approximation. In the last step of the variable selection procedure, if approx is FALSE, maximum likelihood estimation (MLE) is applied to all selected models and the Bayesian Information Criterion (BIC) is recomputed using the log-likelihood of the full multinomial logistic regression model. Note that this step can be computationally very expensive. When using this option, we suggest setting the verbose argument to TRUE to follow the progress of the computation. Note that the estimate.mlogit function can be applied to the resulting object to perform the MLE on the selected models only.
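To make the recomputation step concrete, the following base-R sketch maximizes a full MNL log-likelihood for one small model and derives a BIC value from it. It uses the textbook definition BIC = -2 logL + k log(n); the convention used internally by the BMA package may differ, for example by a constant or a reference model. The data and the function negloglik are hypothetical.

```r
# Hypothetical toy data: one covariate, three alternatives.
set.seed(2)
n    <- 200
x    <- rnorm(n)
alts <- c("a", "b", "c")          # "a" plays the role of the base alternative
y    <- sample(alts, n, replace = TRUE)

# Negative MNL log-likelihood; theta holds (intercept, slope) for "b" and "c".
# The utility of the base alternative is fixed at zero.
negloglik <- function(theta) {
  beta <- matrix(theta, nrow = 2)          # columns: b, c
  u    <- cbind(0, cbind(1, x) %*% beta)   # n x 3 matrix of utilities
  logp <- u - log(rowSums(exp(u)))         # log choice probabilities
  -sum(logp[cbind(seq_len(n), match(y, alts))])
}

fit    <- optim(rep(0, 4), negloglik, method = "BFGS")
loglik <- -fit$value
k      <- length(fit$par)

# Textbook BIC from the full multinomial log-likelihood.
bic <- -2 * loglik + k * log(n)
```

Repeating such a fit for every selected model is what makes approx = FALSE expensive: each model requires its own numerical optimization over all equations jointly.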
The BMA functions always include the intercept, which in the MNL setting corresponds to the alternative-specific constant (asc) of the second alternative (relative to the base alternative). If include.intercepts=TRUE (default), the asc's of all remaining alternatives are also always included in the selected models. If it is set to FALSE, the asc's of the remaining alternatives (i.e. third and higher) are treated as ordinary variables, i.e. as candidates for selection as well as exclusion.
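One way to picture the role of the intercept is to look at a design matrix built from an equation indicator in stacked binary data. The sketch below is an assumption about how such a coding could look, not a description of the package internals. With R's default treatment contrasts, the dummy coefficients are offsets from the second alternative's constant, and the package's parameterization may differ.

```r
# Hypothetical equation indicator for stacked binary data with
# alternatives 1 (base), 2, 3 and 4; the base has no equation of its own.
eq <- factor(c("alt2", "alt2", "alt3", "alt3", "alt4", "alt4"))

# Default treatment contrasts: an intercept (tied to the second alternative)
# plus one dummy column for each of the third and higher alternatives.
X <- model.matrix(~ eq)
colnames(X)
# "(Intercept)" "eqalt3" "eqalt4"
```

In this picture, include.intercepts decides whether columns such as eqalt3 and eqalt4 are forced into every model or compete for inclusion like any other variable.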
Value
The function returns an object of class bic.mlogit containing the following components:
bic.glm
Object of class bic.glm which results from applying BMA on the binary logistic data.
bin.logit
List with results from the mlogit2logit function.
spec
Object of class mnl.spec containing the MNL specification of the full model.
bma.specifications
List of objects of class mnl.spec containing specifications for each selected model.
approx
Value of the approx argument.
Author(s)
Hana Sevcikova, Adrian Raftery
References
Begg, C.B., Gray, R. (1984) Calculation of polychotomous logistic regression parameters using individualized regressions. Biometrika 71, 11–18.
Yeung, K.Y., Bumgarner, R.E., Raftery, A.E. (2005) Bayesian model averaging: development of an improved multi-class, gene selection and classification tool for microarray data. Bioinformatics 21 (10), 2394–2402.