bmonomvn
R Documentation
Bayesian Estimation for Multivariate Normal Data with
Monotone Missingness
Description
Bayesian estimation, via Gibbs sampling from the posterior distribution, of the
mean and covariance matrix of multivariate normal (MVN)
distributed data with a monotone missingness pattern.
Through the use of parsimonious/shrinkage regressions (lasso/NG &
ridge),
where standard regressions fail, this function can handle an (almost)
arbitrary amount of missing data.
Arguments
y
data matrix where each row is interpreted as a
random sample from a MVN distribution with missing
values indicated by NA
pre
logical indicating whether pre-processing of
y is to be performed. This sorts the columns so that the
number of NAs is non-decreasing with the column index
p
when performing regressions, p is the threshold on the ratio of
columns to rows in the design matrix beyond which an
alternative regression (lasso, ridge, or RJ) is performed, as if
least-squares regression had “failed”.
Least-squares regression is
known to fail when the number of columns equals the number of rows,
hence a default of p = 0.9 <= 1. Alternatively, setting
p = 0 forces a parsimonious method to be used for
every regression. Intermediate settings of p allow
the user to control when least-squares regressions stop and the
parsimonious ones start. When method = "factor" the p
argument instead represents a (positive) integer number of initial columns
of y to treat as known factors
B
number of Burn-In MCMC sampling rounds,
during which samples are discarded
T
total number of MCMC sampling rounds to take
place after burn-in, during which samples are saved
thin
multiplicative thinning in the MCMC. Each Bayesian
(lasso) regression will discard thin*M MCMC rounds,
where M is the number of columns in its design matrix,
before a sample is saved as a draw from the posterior distribution.
Likewise, if theta != 0, a further thin*N rounds, for
N responses, will be discarded
economy
indicates whether memory should be economized at
the expense of speed. When TRUE the individual Bayesian
(lasso) regressions are cleaned between uses so that only one
of them has a large footprint at any time during sampling from
the Markov chain. When FALSE (default) all regressions
are pre-allocated and the full memory footprint is realized at
the outset, saving dynamic allocations
method
indicates the Bayesian parsimonious regression
specification to be used, choosing between the lasso (default)
of Park & Casella, the NG extension, the horseshoe,
a ridge regression special case, and least-squares.
The "factor" method treats the first
p columns of y as known factors
RJ
indicates the Reversible Jump strategy to be employed.
The default setting, "p", uses RJ whenever a
parsimonious regression is used; "bpsn" only uses
RJ for regressions with p >= n; and "none" never
uses RJ
capm
when TRUE this argument indicates that the
number of components of beta should not exceed
n, the number of response variables in a particular
regression
start
a list depicting starting values for the parameters
that are used to initialize the Markov chain. Usually this will be
a "monomvn"-class object depicting maximum likelihood
estimates output from the monomvn function.
The relevant fields are the mean vector $mu, covariance
matrix $S, monotone ordering $o (for sanity checking
with input y), component vector $ncomp and
penalty parameter vector $lambda; see note below
mprior
prior on the number of non-zero regression coefficients
(and therefore covariates) m in the model. The default
(mprior = 0) encodes the uniform prior on 0 < m < M.
A scalar value 0 <= mprior <= 1 implies a Binomial prior
Bin(m|n=M,p=mprior). A 2-vector mprior=c(g,h)
of positive values g and h
gives a Bin(m|n=M,p) prior where p~Beta(g,h)
rd
=c(r,delta); a 2-vector of prior parameters for
lambda^2,
whose interpretation depends on the regression method. When method =
"lasso" the components are the alpha
(shape) and beta (rate) parameters to a
gamma distribution G(r,delta);
when method = "ridge" the components are the
alpha (shape) and beta (scale) parameters
to an inverse-gamma distribution IG(r/2,delta/2)
theta
the rate parameter (> 0) to the exponential prior
on the degrees of freedom parameter nu for each regression
model implementing Student-t errors (for each column of
Y marginally) by a scale-mixture prior. See
blasso for more details.
The default setting of theta = 0 turns off this prior,
defaulting to a normal errors prior. A negative setting
triggers a pooling of the degrees of freedom parameter
across all columns of Y, i.e., Y is modeled as
multivariate-t. In this case abs(theta) is used as the
prior parameterization
rao.s2
indicates whether Rao-Blackwellized samples for
s^2 should be used (default TRUE); see
the details section of blasso for more information
QP
if non-NULL this argument should either be
TRUE, a positive integer, or contain a list
specifying a Quadratic Program to solve as a function of the
samples of mu = dvec and
Sigma = Dmat in the notation of solve.QP;
see default.QP for a default specification that
is used when QP = TRUE or a positive integer is given;
more details are below
verb
verbosity level; currently only verb = 0 and
verb = 1 are supported
trace
if TRUE then samples from all parameters are
saved to files in the CWD, and then read back into the
"monomvn"-class object upon return
Details
If pre = TRUE then bmonomvn first re-arranges the columns
of y into nondecreasing order with respect to the number of
missing (NA) entries. Then (at least) the first column should
be completely observed.
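The column re-ordering described above can be sketched in base R. This is only a minimal illustration, not the package's internal code: the toy matrix y is made up for the example, and the names na and o simply mirror the fields of the same name described under Value.

```r
## toy data (made up for illustration): 5 rows, 3 columns, some NAs
y <- cbind(a = c(1, NA, NA, 4, 5),
           b = c(1, 2, 3, 4, 5),
           c = c(1, 2, NA, 4, 5))

## number of NA entries in each column
na <- colSums(is.na(y))

## column order giving a non-decreasing number of NAs
o <- order(na)
y.sorted <- y[, o]

## after sorting, the first column should be completely observed
stopifnot(!anyNA(y.sorted[, 1]))
colnames(y.sorted)
```

Sorting by the NA count is what places the fully observed column(s) first, so that each later column can be regressed on the columns that precede it.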
Samples from the posterior distribution of the MVN mean vector and
covariance matrix are obtained by sampling
from the posterior distribution of Bayesian regression models.
The methodology for converting these to samples from the mean vector
and covariance matrix is outlined in the monomvn
documentation, detailing a similarly structured maximum likelihood
approach. Also see the references below.
Whenever the regression model is ill-posed (i.e., when there are
more covariates than responses, or a
“big p small n” problem) then
Bayesian lasso or ridge regressions, possibly augmented with Reversible
Jump (RJ) for model selection, are used instead.
See the Park & Casella reference below, and the blasso
documentation. To guarantee each regression is well posed the
combination setting of method="lsr" and RJ="none"
is not allowed.
As in monomvn the p argument can be used to
turn on lasso or ridge regressions (possibly with RJ) at other times.
The exception is the "factor" method which always involves
an OLS regression on (a subset of) the first p
columns of y.
Samples from a function of samples of mu and Sigma
can be obtained by specifying a Quadratic Program via the
argument QP. The idea is to allow for the calculation of
the distribution of minimum variance and mean-variance portfolios,
although the interface is quite general. See default.QP
for more details, as default.QP(ncol(y)) is used
when the argument QP = TRUE is given. When a positive integer
is given, then the first QP columns of y are treated
as factors by using
default.QP(ncol(y) - QP)
instead. The result is that the corresponding components of (samples of)
mu and rows/cols of S are not factored into the
specification of the resulting Quadratic Program.
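To make the solve.QP notation concrete, the following is a minimal sketch of a minimum variance portfolio program for a single covariance matrix. It is not the specification produced by default.QP, which may differ. The matrix Sigma is made up and stands in for one posterior sample, and the constraints (weights sum to one, no short selling) are assumptions chosen for the illustration. It requires the quadprog package.

```r
library(quadprog)  ## provides solve.QP

## made-up 3-asset covariance matrix, standing in for one sample of Sigma
Sigma <- matrix(c(0.10, 0.02, 0.01,
                  0.02, 0.08, 0.03,
                  0.01, 0.03, 0.12), nrow = 3, byrow = TRUE)
m <- ncol(Sigma)

## minimum variance: minimize (1/2) w' Sigma w, so the linear term is zero
dvec <- rep(0, m)

## constraints in solve.QP form: t(Amat) %*% w >= bvec
## first column (an equality, since meq = 1): weights sum to one
## remaining columns: each weight is non-negative (an assumption here)
Amat <- cbind(rep(1, m), diag(m))
bvec <- c(1, rep(0, m))

w <- solve.QP(Dmat = Sigma, dvec = dvec, Amat = Amat,
              bvec = bvec, meq = 1)$solution
w
```

Repeating such a solve for every saved sample of mu and Sigma would give a collection of weight vectors, which is the kind of output the W field described under Value is documented to hold.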
Value
bmonomvn returns an object of class "monomvn",
which is a list containing the inputs above and a
subset of the components below.
call
a copy of the function call as used
mu
estimated mean vector with columns corresponding to the
columns of y
S
estimated covariance matrix with rows and columns
corresponding to the columns of y
mu.var
estimated variance of the mean vector with columns
corresponding to the columns of y
mu.cov
estimated covariance matrix of the mean vector
with columns corresponding to the columns of y
S.var
estimated variance of the individual components of the
covariance matrix with columns and rows corresponding to the columns
of y
mu.map
estimated maximum a posteriori (MAP) of the
mean vector with columns corresponding to the columns of y
S.map
estimated MAP of the individual
components of the covariance matrix with columns and rows
corresponding to the columns of y
S.nz
posterior probability that the individual entries of
the covariance matrix are non-zero
Si.nz
posterior probability that the individual entries of
the inverse of the covariance matrix are non-zero
nu
when theta < 0 this field provides a trace of
the pooled nu parameter to the multivariate-t distribution
lpost.map
log posterior probability of the MAP estimate
which.map
gives the time index of the sample corresponding to
the MAP estimate
llik
a trace of the log likelihood of the data
llik.norm
a trace of the log likelihood
under the Normal errors model when sampling under the
Student-t model; i.e., it is not present unless theta > 0.
Used for calculating Bayes Factors
na
when pre = TRUE this is a vector containing the number of
NA entries in each column of y
o
when pre = TRUE this is a vector containing the
index of each column in the sorting of the columns of y
obtained by o <- order(na)
method
method of regression used on each column, or
"bcomplete" indicating that no regression was used
thin
the (actual) number of thinning rounds used for the
regression (method) in each column
lambda2
records the mean lambda^2 value
found in the trace of the Bayesian Lasso regressions. Zero-values
result when the column corresponds to a complete
case or an ordinary least squares regression (these would be
NA entries from monomvn)
ncomp
records the mean number of components
(columns of the design matrix) used in the regression model for
each column of y. If input RJ = "none" then this simply
corresponds to the monotone ordering (these would correspond to
the NA entries from monomvn).
When RJ is used
the monotone ordering is an upper bound (on each entry)
trace
if input trace = TRUE then this field contains
traces of the samples of mu in the field $mu and
of S in the field $S, and of all regression
parameters for each of the m = length(mu) columns in the field
$reg. This $reg field is a stripped-down
"blasso"-class object so that the methods of that object
may be used for analysis. If data augmentation is required to
complete the monotone missingness pattern, then samples from these
entries of Y are contained in $DA where the column
names indicate the i-j entry of Y sampled;
see the R output below
R
gives a matrix version of the missingness pattern
used: 0-entries mean observed; 1-entries indicate
missing values conforming to a monotone pattern; 2-entries
indicate missing values that require data augmentation to complete
a monotone missingness pattern
B
from inputs: number of Burn-In MCMC sampling rounds,
during which samples are discarded
T
from inputs: total number of MCMC sampling rounds to take
place after burn-in, during which samples are saved
r
from inputs: alpha (shape) parameter to the gamma
distribution prior for the lasso parameter lambda
delta
from inputs: beta (rate) parameter to the gamma
distribution prior for the lasso parameter lambda
QP
if a valid (non-FALSE and non-NULL) QP
argument is given, then this field contains the specification of
a Quadratic Program in the form of a list with entries including
$dvec, $Amat, $b0, and $meq, similar
to the usage in solve.QP, and some
others; see default.QP for more details
W
when input QP = TRUE is given, then this field
contains a T*ncol(y) matrix of samples from the posterior
distribution of the solution to the Quadratic Program, which can
be visualized via plot.monomvn using the argument
which = "QP"
Note
Whenever the bmonomvn algorithm requires a regression
where p >= n, i.e., if any of the columns in the y
matrix have fewer non-NA elements than the number of
columns with more non-NA elements, then it is helpful
to employ both lasso/ridge and the RJ method.
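The condition in the paragraph above can be checked before fitting. The base R sketch below follows the wording of this note literally. It is an illustration only: the toy data are made up, and the package's own rule may treat ties in the NA counts or an intercept column differently.

```r
## toy data with a monotone missingness pattern (made up for illustration):
## 10 rows, 6 columns, the last column observed in only 3 rows
set.seed(1)
y <- matrix(rnorm(10 * 6), nrow = 10)
y[4:10, 6] <- NA
y[8:10, 5] <- NA

## number of observed (non-NA) entries in each column
nobs <- colSums(!is.na(y))

## for each column, the number of columns with more observed entries;
## these act as the covariates in that column's regression
ncov <- sapply(nobs, function(n) sum(nobs > n))

## columns whose regression has at least as many covariates as observations
big.p.small.n <- ncov >= nobs
which(big.p.small.n)
```

Here only the last column is flagged: it has 3 observed entries but 5 columns with more observed entries.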
It is important that any starting values provided in the
start argument be compatible with the regression model
specified by inputs RJ and method. Any
incompatibilities will result in a warning that
(alternative) default action was taken, and may result in
an undesired (possibly inferior) model being fit
References
R.B. Gramacy and E. Pantaleo (2010).
Shrinkage regression for multivariate inference with missing
data, and an application to portfolio balancing. Preprint available on
arXiv:0907.2135
http://arxiv.org/abs/0907.2135
Roderick J.A. Little and Donald B. Rubin (2002).
Statistical Analysis with Missing Data, Second Edition.
Wiley.
See Also
blasso, monomvn,
default.QP, em.norm in the now defunct
norm and mvnmle packages, and returns
Examples
## standard usage, duplicating the results in
## Little and Rubin, section 7.4.3
data(cement.miss)
out <- bmonomvn(cement.miss)
out
out$mu
out$S
##
## A bigger example, comparing the various
## parsimonious methods
##
## generate N=100 samples from a 20-d random MVN
xmuS <- randmvn(100, 20)
## randomly impose monotone missingness
xmiss <- rmono(xmuS$x)
## using least squares only when necessary,
obl <- bmonomvn(xmiss)
obl
## look at the posterior variability
par(mfrow=c(1,2))
plot(obl)
plot(obl, "S")
## compare to maximum likelihood
Ellik.norm(obl$mu, obl$S, xmuS$mu, xmuS$S)
oml <- monomvn(xmiss, method="lasso")
Ellik.norm(oml$mu, oml$S, xmuS$mu, xmuS$S)
##
## a min-variance portfolio allocation example
##
## get the returns data, and use 20 random cols
data(returns)
train <- returns[,sample(1:ncol(returns), 20)]
## missingness pattern requires DA; also gather
## samples from the solution to a QP
obl.da <- bmonomvn(train, p=0, QP=TRUE)
## plot the QP weights distribution
plot(obl.da, "QP", xaxis="index")
## get ML solution: will warn about monotone violations
suppressWarnings(oml.da <- monomvn(train, method="lasso"))
## add mean and MLE comparison, requires the
## quadprog library for the solve.QP function
add.pe.QP(obl.da, oml.da)
## now consider adding in the market as a factor
data(market)
mtrain <- cbind(market, train)
## fit the model using only factor regressions
obl.daf <- bmonomvn(mtrain, method="factor", p=1, QP=1)
plot(obl.daf, "QP", xaxis="index", main="using only factors")
suppressWarnings(oml.daf <- monomvn(mtrain, method="factor"))
add.pe.QP(obl.daf, oml.daf)
##
## a Bayes/MLE comparison using least squares sparingly
##
## fit Bayesian and classical lasso
obls <- bmonomvn(xmiss, p=0.25)
Ellik.norm(obls$mu, obls$S, xmuS$mu, xmuS$S)
omls <- monomvn(xmiss, p=0.25, method="lasso")
Ellik.norm(omls$mu, omls$S, xmuS$mu, xmuS$S)
## compare to ridge regression
obrs <- bmonomvn(xmiss, p=0.25, method="ridge")
Ellik.norm(obrs$mu, obrs$S, xmuS$mu, xmuS$S)
omrs <- monomvn(xmiss, p=0.25, method="ridge")
Ellik.norm(omrs$mu, omrs$S, xmuS$mu, xmuS$S)
## using the maximum likelihood solution to initialize
## the Markov chain and avoid burn-in.
ob2s <- bmonomvn(xmiss, p=0.25, B=0, start=omls, RJ="p")
Ellik.norm(ob2s$mu, ob2s$S, xmuS$mu, xmuS$S)