X
Data matrix with n rows of observations and p columns of predictors. The predictors are assumed to have a continuous distribution.
y
Response vector of n observations, either categorical or continuous. It is treated as categorical if nslices is NULL.
numdir
Integer between 1 and p; the number of directions of the reduction to estimate. If not provided, it defaults to the number of distinct values of the categorical response.
nslices
Integer number of slices. It must be provided if y is continuous, and must be less than n. It is used to discretize the continuous response.
numdir.test
Boolean. If FALSE, lad computes the reduction for the specified number of directions numdir. If TRUE, it computes the reduction for each number of directions from 0 to numdir.
...
Other arguments to pass to GrassmannOptim.
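When y is continuous, nslices controls how the response is discretized before fitting. As a rough illustration of what slicing does, here is an equal-count slicing sketch; slice_response is a hypothetical helper, and the exact slicing scheme used internally may differ:

```r
# Discretize a continuous response into roughly equal-count slices.
# NOTE: illustrative only; slice_response is not a package function.
slice_response <- function(y, nslices) {
  breaks <- quantile(y, probs = seq(0, 1, length.out = nslices + 1))
  # include.lowest so the minimum falls in slice 1
  as.integer(cut(y, breaks = unique(breaks), include.lowest = TRUE))
}

set.seed(1)
y <- rnorm(100)
sl <- slice_response(y, nslices = 4)
table(sl)  # roughly equal counts per slice
```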
Details
Consider a regression in which the response Y is discrete with support S_Y = {1, 2, ..., h}.
Following standard practice, a continuous response can be sliced into finite categories to meet this condition.
Let X_y in R^p denote a random vector of predictors distributed as X|(Y=y), and assume
that X_y ~ N(μ_y, Δ_y), y in S_Y. Let μ = E(X) and Σ = Var(X)
denote the marginal mean and variance of X, and let Δ = E(Δ_Y) denote the average covariance matrix.
Given n_y independent observations of X_y, y in S_Y, the goal is to obtain the maximum likelihood
estimate of the d-dimensional central subspace S_{Y|X}, defined informally as the smallest
subspace such that Y is independent of X given its projection P_{S_{Y|X}} X
onto S_{Y|X}.
Let Σ̃ denote the sample covariance matrix of X, let Δ̃_y denote the sample
covariance matrix for the data with Y=y, and let Δ̃ = sum_{y=1}^{h} m_y Δ̃_y, where m_y
is the fraction of cases observed with Y=y. The maximum likelihood estimator of S_{Y|X} maximizes over
S in G_{(d,p)} the log-likelihood function

L(S) = c - (n/2) log|Σ̃| + (n/2) log|P_S Σ̃ P_S|_0 - (1/2) sum_{y=1}^{h} n_y log|P_S Δ̃_y P_S|_0,

where c is a constant not depending on S, |A|_0 indicates the product of the non-zero eigenvalues of a positive semi-definite symmetric matrix A, P_S indicates the projection onto the subspace S in the usual inner product, and G_{(d,p)} is the set of all d-dimensional subspaces of R^p, called the Grassmann manifold. Once the dimension of the reduction subspace is estimated, the columns of Γ̂ form a basis for the maximum likelihood estimate of S_{Y|X}, and the desired reduction is Γ̂^T X.
The dimension d of the sufficient reduction must itself be estimated. A sequential likelihood ratio test and information criteria (AIC, BIC) are implemented, following Cook and Forzani (2009).
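The information criteria combine the maximized log-likelihood with the parameter count in the usual way; a sketch, where loglik and numpar stand for the output components of the same names and n is the sample size (the numeric inputs below are arbitrary):

```r
# Standard AIC/BIC construction from a maximized log-likelihood.
# loglik and numpar correspond to the output components of those names.
ic <- function(loglik, numpar, n) {
  c(aic = -2 * loglik + 2 * numpar,
    bic = -2 * loglik + log(n) * numpar)
}
ic(loglik = -250.7, numpar = 12, n = 100)
```

The dimension minimizing AIC or BIC across the candidate values of numdir is then selected.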
Value
This function returns a list object of class ldr. The output depends on the argument numdir.test. If numdir.test=TRUE, a list of matrices is provided for each of the parameters Γ, Δ, and Δ_y, one matrix per number of directions from 1 through numdir; otherwise, a single matrix is provided for each parameter at the given numdir.
The components loglik, aic, bic, and numpar are vectors of numdir elements if numdir.test=TRUE, and scalars otherwise. The components returned are:
R
The reduced data matrix, obtained by applying the estimated reduction to the column-centered data matrix of X (each column is centered around its sample mean).
Gammahat
Estimate of Γ
Deltahat
Estimate of Δ
Deltahat_y
Estimate of Δ_y
loglik
Maximized value of the LAD log-likelihood.
aic
Akaike information criterion value.
bic
Bayesian information criterion value.
numpar
Number of parameters in the model.
Author(s)
Kofi Placid Adragni <kofi@umbc.edu>
References
Cook RD, Forzani L (2009). Likelihood-Based Sufficient Dimension Reduction. Journal of the American Statistical Association, 104(485), 197–208.
See Also
core, pfc
Examples
data(flea)
fit <- lad(X=flea[,-1], y=flea[,1], numdir=2, numdir.test=TRUE)
summary(fit)
plot(fit)
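For a continuous response, nslices must be supplied. A sketch with simulated data (the data-generating model is arbitrary and for illustration only):

```r
## Continuous response: discretized internally via nslices (simulated data)
set.seed(10)
n <- 200; p <- 5
X <- matrix(rnorm(n * p), n, p)
y <- X[, 1] + 0.5 * X[, 2]^2 + rnorm(n, sd = 0.2)
fit2 <- lad(X = X, y = y, numdir = 2, nslices = 4, numdir.test = FALSE)
summary(fit2)
```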