R Graphical Manual

Browse All

Last data update: 2014.03.03

R: Bayesian Estimation for Multivariate Normal Data with...

bmonomvn

R Documentation

Bayesian Estimation for Multivariate Normal Data with Monotone Missingness

Description

Bayesian estimation via sampling from the posterior distribution of the of the mean and covariance matrix of multivariate normal (MVN) distributed data with a monotone missingness pattern, via Gibbs Sampling. Through the use of parsimonious/shrinkage regressions (lasso/NG & ridge), where standard regressions fail, this function can handle an (almost) arbitrary amount of missing data

Usage

bmonomvn(y, pre = TRUE, p = 0.9, B = 100, T = 200, thin = 1,
         economy = FALSE, method = c("lasso", "ridge", "lsr", "factor",
         "hs", "ng"), RJ = c("p", "bpsn", "none"), capm = TRUE,
         start = NULL, mprior = 0, rd = NULL, theta = 0, rao.s2 = TRUE,
         QP = NULL, verb = 1, trace = FALSE)

Arguments

`y`	data `matrix` were each row is interpreted as a random sample from a MVN distribution with missing values indicated by `NA`
`pre`	logical indicating whether pre-processing of the `y` is to be performed. This sorts the columns so that the number of `NA`s is non-decreasing with the column index
`p`	when performing regressions, `p` is the proportion of the number of columns to rows in the design matrix before an alternative regression (lasso, ridge, or RJ) is performed as if least-squares regression has “failed”. Least-squares regression is known to fail when the number of columns equals the number of rows, hence a default of `p = 0.9 <= 1`. Alternatively, setting `p = 0` forces a parsimonious method to be used for every regression. Intermediate settings of `p` allow the user to control when least-squares regressions stop and the parsimonious ones start; When `method = "factor"` the `p` argument represents an integer (positive) number of initial columns of `y` to treat as known factors
`B`	number of Burn-In MCMC sampling rounds, during which samples are discarded
`T`	total number of MCMC sampling rounds to take place after burn-in, during which samples are saved
`thin`	multiplicative thinning in the MCMC. Each Bayesian (lasso) regression will discard `thinM` MCMC rounds, where `M` is the number of columns in its design matrix, before a sample is saved as a draw from the posterior distribution; Likewise if `theta != 0` a further `thinN`, for `N` responses will be discarded
`economy`	indicates whether memory should be economized at the expense of speed. When `TRUE` the individual Bayesian (lasso) regressions are cleaned between uses so that only one of them has a large footprint at any time during sampling from the Markov chain. When `FALSE` (default) all regressions are pre-allocated and the full memory footprint is realized at the outset, saving dynamic allocations
`method`	indicates the Bayesian parsimonious regression specification to be used, choosing between the lasso (default) of Park & Casella, the NG extension, the horseshoe, a ridge regression special case, and least-squares. The `"factor"` method treats the first `p` columns of `y` as known factors
`RJ`	indicates the Reversible Jump strategy to be employed. The default argument of `"p"` method uses RJ whenever a parsimonious regression is used; `"bpsn"` only uses RJ for regressions with `p >= n`, and `"none"` never uses RJ
`capm`	when `TRUE` this argument indicates that the number of components of beta should not exceed n, the number of response variables in a particular regression
`start`	a list depicting starting values for the parameters that are use to initialize the Markov chain. Usually this will be a `"monomvn"`-class object depicting maximum likelihood estimates output from the `monomvn` function. The relevant fields are the mean vector `$mu`, covariance matrix `$S`, monotone ordering `$o` (for sanity checking with input `y`), component vector `$ncomp` and penalty parameter vector `$lambda`; see note below
`mprior`	prior on the number of non-zero regression coefficients (and therefore covariates) `m` in the model. The default (`mprior = 0`) encodes the uniform prior on `0 < m < M`. A scalar value `0 <= mprior <= 1` implies a Binomial prior `Bin(m\|n=M,p=mprior)`. A 2-vector `mprior=c(g,h)` of positive values `g` and `h` represents gives `Bin(m\|n=M,p)` prior where `p~Beta(g,h)`
`rd`	`=c(r,delta)`; a 2-vector of prior parameters for lambda^2 which depends on the regression method. When `method = "lasso"` then the components are the alpha (shape) and beta (rate) parameters to the a gamma distribution `G(r,delta)`; when `method = "ridge"` the components are the alpha (shape) and beta (scale) parameters to an inverse-gamma distribution `IG(r/2,delta/2)`
`theta`	the rate parameter (`> 0`) to the exponential prior on the degrees of freedom paramter `nu` for each regression model implementing Student-t errors (for each column of `Y` marginally) by a scale-mixture prior. See `blasso` for more details. The default setting of `theta = 0` turns off this prior, defaulting to a normal errors prior. A negative setting triggers a pooling of the degrees of freedom parameter across all columns of `Y`. I.e., `Y` is modeled as multivariate-t. In this case `abs{theta}` is used as the prior parameterization
`rao.s2`	indicates whether to Rao-Blackwellized samples for s^2 should be used (default `TRUE`); see the details section of `blasso` for more information
`QP`	if non-`NULL` this argument should either be `TRUE`, a positive integer, or contain a `list` specifying a Quadratic Program to solve as a function of the samples of `mu = dvec` and `Sigma = Dmat` in the notation of `solve.QP`; see `default.QP` for a default specification that is used when `QP = TRUE` or a positive integer is is given; more details are below
`verb`	verbosity level; currently only `verb = 0` and `verb = 1` are supported
`trace`	if `TRUE` then samples from all parameters are saved to files in the CWD, and then read back into the `"monomvn"`-class object upon return

Details

If pre = TRUE then bmonomvn first re-arranges the columns of y into nondecreasing order with respect to the number of missing (NA) entries. Then (at least) the first column should be completely observed.

Samples from the posterior distribution of the MVN mean vector and covariance matrix are obtained sampling from the posterior distribution of Bayesian regression models. The methodology for converting these to samples from the mean vector and covariance matrix is outlined in the monomvn documentation, detailing a similarly structured maximum likelihood approach. Also see the references below.

Whenever the regression model is ill–posed (i.e., when there are more covariates than responses, or a “big p small n” problem) then Bayesian lasso or ridge regressions – possibly augmented with Reversible Jump (RJ) for model selection – are used instead. See the Park & Casella reference below, and the blasso documentation. To guarantee each regression is well posed the combination setting of method="lsr" and RJ="none" is not allowed. As in monomvn the p argument can be used to turn on lasso or ridge regressions (possibly with RJ) at other times. The exception is the "factor" method which always involves an OLS regression on (a subset of) the first p columns of y.

Samples from a function of samples of mu and Sigma can be obtained by specifying a Quadratic program via the argument QP. The idea is to allow for the calculation of the distribution of minimum variance and mean–variance portfolios, although the interface is quite general. See default.QP for more details, as default.QP(ncol(y)) is used when the argument QP = TRUE is given. When a positive integer is given, then the first QP columns of y are treated as factors by using

default.QP(ncol(y) - QP)

instead. The result is that the corresponding components of (samples of) mu and rows/cols of S are not factored into the specification of the resulting Quadratic Program

Value

bmonomvn returns an object of class "monomvn", which is a list containing the inputs above and a subset of the components below.

`call`	a copy of the function call as used
`mu`	estimated mean vector with columns corresponding to the columns of `y`
`S`	estimated covariance matrix with rows and columns corresponding to the columns of `y`
`mu.var`	estimated variance of the mean vector with columns corresponding to the columns of `y`
`mu.cov`	estimated covariance matrix of the mean vector with columns corresponding to the columns of `y`
`S.var`	estimated variance of the individual components of the covariance matrix with columns and rows corresponding to the columns of `y`
`mu.map`	estimated maximum a' posteriori (MAP) of the mean vector with columns corresponding to the columns of `y`
`S.map`	estimated MAP of the individual components of the covariance matrix with columns and rows corresponding to the columns of `y`
`S.nz`	posterior probability that the individual entries of the covariance matrix are non–zero
`Si.nz`	posterior probability that the individual entries of the inverse of the covariance matrix are non–zero
`nu`	when `theta < 0` this field provides a trace of the pooled `nu` parameter to the multivariate-t distribution
`lpost.map`	log posterior probability of the MAP estimate
`which.map`	gives the time index of the sample corresponding to the MAP estimate
`llik`	a trace of the log likelihood of the data
`llik.norm`	a trace of the log likelihood under the Normal errors model when sampling under the Student-t model; i.e., it is not present unless `theta > 0`. Used for calculating Bayes Factors
`na`	when `pre = TRUE` this is a vector containing number of `NA` entries in each column of `y`
`o`	when `pre = TRUE` this is a vector containing the index of each column in the sorting of the columns of `y` obtained by `o <- order(na)`
`method`	method of regression used on each column, or `"bcomplete"` indicating that no regression was used
`thin`	the (actual) number of thinning rounds used for the regression (`method`) in each column
`lambda2`	records the mean lambda^2 value found in the trace of the Bayesian Lasso regressions. Zero-values result when the column corresponds to a complete case or an ordinary least squares regression (these would be `NA` entries from `monomvn`)
`ncomp`	records the mean number of components (columns of the design matrix) used in the regression model for each column of `y`. If input `RJ = FALSE` then this simply corresponds to the monotone ordering (these would correspond to the `NA` entries from `monomvn`). When `RJ = TRUE` the monotone ordering is an upper bound (on each entry)
`trace`	if input `trace = TRUE` then this field contains traces of the samples of mu in the field `$mu` and of S in the field `$S`, and of all regression parameters for each of the `m = length(mu)` columns in the field `$reg`. This `$reg` field is a stripped-down `"blasso"`-class object so that the methods of that object may be used for analysis. If data augmentation is required to complete the monotone missingness pattern, then samples from these entries of `Y` are contained in `$DA` where the column names indicate the `i`-`j` entry of `Y` sampled; see the `R` output below
`R`	gives a `matrix` version of the missingness pattern used: `0`-entries mean observed; `1`-entries indicate missing values conforming to a monotone pattern; `2`-entries indicate missing values that require data augmentation to complete a monotone missingness pattern
`B`	from inputs: number of Burn-In MCMC sampling rounds, during which samples are discarded
`T`	from inputs: total number of MCMC sampling rounds to take place after burn-in, during which samples are saved
`r`	from inputs: alpha (shape) parameter to the gamma distribution prior for the lasso parameter lambda
`delta`	from inputs: beta (rate) parameter to the gamma distribution prior for the lasso parameter lambda
`QP`	if a valid (non–`FALSE` or `NULL`) `QP` argument is given, then this field contains the specification of a Quadratic Program in the form of a list with entries including `$dvec`, `$Amat`, `$b0`, and `$meq`, similar to the usage in `solve.QP`, and some others; see `default.QP` for more details
`W`	when input `QP = TRUE` is given, then this field contains a `T*ncol(y)` matrix of samples from the posterior distribution of the solution to the Quadratic Program, which can be visualized via `plot.monomvn` using the argument `which = "QP"`

Note

Whenever the bmonomvn algorithm requires a regression where p >= n, i.e., if any of the columns in the y matrix have fewer non–NA elements than the number of columns with more non–NA elements, then it is helpful to employ both lasso/ridge and the RJ method.

It is important that any starting values provided in the start be compatible with the regression model specified by inputs RJ and method. Any incompatibilities will result with a warning that (alternative) default action was taken and may result in an undesired (possibly inferior) model being fit

Author(s)

Robert B. Gramacy rbgramacy@chicagobooth.edu

References

R.B. Gramacy and E. Pantaleo (2010). Shrinkage regression for multivariate inference with missing data, and an application to portfolio balancing. Preprint available on arXiv:0710.5837 http://arxiv.org/abs/0907.2135

Roderick J.A. Little and Donald B. Rubin (2002). Statistical Analysis with Missing Data, Second Edition. Wilely.

http://bobby.gramacy.com/r_packages/monomvn

Examples

## standard usage, duplicating the results in
## Little and Rubin, section 7.4.3
data(cement.miss)
out <- bmonomvn(cement.miss)
out
out$mu
out$S

##
## A bigger example, comparing the various
## parsimonious methods
##

## generate N=100 samples from a 10-d random MVN
xmuS <- randmvn(100, 20)

## randomly impose monotone missingness
xmiss <- rmono(xmuS$x)

## using least squares only when necessary,
obl <- bmonomvn(xmiss)
obl

## look at the posterior variability
par(mfrow=c(1,2))
plot(obl)
plot(obl, "S")

## compare to maximum likelihood
Ellik.norm(obl$mu, obl$S, xmuS$mu, xmuS$S)
oml <- monomvn(xmiss, method="lasso")
Ellik.norm(oml$mu, oml$S, xmuS$mu, xmuS$S)


##
## a min-variance portfolio allocation example
##

## get the returns data, and use 20 random cols
data(returns)
train <- returns[,sample(1:ncol(returns), 20)]

## missingness pattern requires DA; also gather
## samples from the solution to a QP
obl.da <- bmonomvn(train, p=0, QP=TRUE)

## plot the QP weights distribution
plot(obl.da, "QP", xaxis="index")

## get ML solution: will warn about monotone violations
suppressWarnings(oml.da <- monomvn(train, method="lasso"))

## add mean and MLE comparison, requires the
## quadprog library for the solve.QP function
add.pe.QP(obl.da, oml.da)

## now consider adding in the market as a factor
data(market)
mtrain <- cbind(market, train)

## fit the model using only factor regressions
obl.daf <- bmonomvn(mtrain, method="factor", p=1, QP=1)
plot(obl.daf, "QP", xaxis="index", main="using only factors")
suppressWarnings(oml.daf <- monomvn(mtrain, method="factor"))
add.pe.QP(obl.daf, oml.daf)


##
## a Bayes/MLE comparison using least squares sparingly
##

## fit Bayesian and classical lasso
obls <- bmonomvn(xmiss, p=0.25)
Ellik.norm(obls$mu, obls$S, xmuS$mu, xmuS$S)
omls <- monomvn(xmiss, p=0.25, method="lasso")
Ellik.norm(omls$mu, omls$S, xmuS$mu, xmuS$S)

## compare to ridge regression
obrs <- bmonomvn(xmiss, p=0.25, method="ridge")
Ellik.norm(obrs$mu, obrs$S, xmuS$mu, xmuS$S)
omrs <- monomvn(xmiss, p=0.25, method="ridge")
Ellik.norm(omrs$mu, omrs$S, xmuS$mu, xmuS$S)

## using the maximum likelihood solution to initialize
## the Markov chain and avoid burn-in.  
ob2s <- bmonomvn(xmiss, p=0.25, B=0, start=omls, RJ="p")
Ellik.norm(ob2s$mu, ob2s$S, xmuS$mu, xmuS$S)