X
Data matrix with n rows of observations and p columns of predictors. The predictors are assumed to have a continuous distribution.
y
Response vector of n observations, either categorical or continuous. It is treated as categorical if nslices is NULL.
numdir
Integer between 1 and p; the number of directions of the reduction to estimate. If not provided, it defaults to the number of distinct values of the categorical response.
nslices
Integer number of slices. It must be provided if y is continuous, and must be less than n. It is used to discretize the continuous response.
numdir.test
Boolean. If FALSE, lad computes the reduction for the specified number of directions numdir. If TRUE, it computes the reduction for each number of directions from 0 to numdir.
...
Other arguments to pass to GrassmannOptim.
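When y is continuous, nslices controls how the response is discretized before fitting. As a rough illustration of what slicing does, here is an equal-count slicing sketch; slice_response is a hypothetical helper, and the exact slicing scheme used internally may differ:

```r
# Discretize a continuous response into roughly equal-count slices.
# NOTE: illustrative only; slice_response is not a package function.
slice_response <- function(y, nslices) {
  breaks <- quantile(y, probs = seq(0, 1, length.out = nslices + 1))
  # include.lowest so the minimum falls in slice 1
  as.integer(cut(y, breaks = unique(breaks), include.lowest = TRUE))
}

set.seed(1)
y <- rnorm(100)
sl <- slice_response(y, nslices = 4)
table(sl)  # roughly equal counts per slice
```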
Details
Consider a regression in which the response Y is discrete with support S_Y = {1, 2, ..., h}.
Following standard practice, a continuous response can be sliced into finite categories to meet this condition.
Let X_y in R^p denote a random vector of predictors distributed as X|(Y=y), and assume
that X_y ~ N(μ_y, Δ_y), y in S_Y. Let μ = E(X) and Σ = Var(X)
denote the marginal mean and variance of X, and let Δ = E(Δ_Y) denote the average covariance matrix.
Given n_y independent observations of X_y, y in S_Y, the goal is to obtain the maximum likelihood
estimate of the d-dimensional central subspace S_{Y|X}, defined informally as the smallest
subspace such that Y is independent of X given its projection P_{S_{Y|X}} X
onto S_{Y|X}.
Let Σ̃ denote the sample covariance matrix of X, let Δ̃_y denote the sample
covariance matrix for the data with Y=y, and let Δ̃ = sum_{y=1}^{h} m_y Δ̃_y, where m_y
is the fraction of cases observed with Y=y. The maximum likelihood estimator of S_{Y|X} maximizes over
S in G_{(d,p)} the log-likelihood function

L(S) = c - (n/2) log|Σ̃| + (n/2) log|P_S Σ̃ P_S|_0 - (1/2) sum_{y=1}^{h} n_y log|P_S Δ̃_y P_S|_0,

where c is a constant not depending on S, |A|_0 indicates the product of the non-zero eigenvalues of a positive semi-definite symmetric matrix A, P_S indicates the projection onto the subspace S in the usual inner product, and G_{(d,p)} is the set of all d-dimensional subspaces of R^p, called the Grassmann manifold. Once the dimension of the reduction subspace is estimated, the columns of Γ̂ form a basis for the maximum likelihood estimate of S_{Y|X}, and the desired reduction is Γ̂^T X.
The dimension d of the sufficient reduction must itself be estimated. A sequential likelihood ratio test and information criteria (AIC, BIC) are implemented, following Cook and Forzani (2009).
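The information criteria combine the maximized log-likelihood with the parameter count in the usual way; a sketch, where loglik and numpar stand for the output components of the same names and n is the sample size (the numeric inputs below are arbitrary):

```r
# Standard AIC/BIC construction from a maximized log-likelihood.
# loglik and numpar correspond to the output components of those names.
ic <- function(loglik, numpar, n) {
  c(aic = -2 * loglik + 2 * numpar,
    bic = -2 * loglik + log(n) * numpar)
}
ic(loglik = -250.7, numpar = 12, n = 100)
```

The dimension minimizing AIC or BIC across the candidate values of numdir is then selected.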
Value
This function returns a list object of class ldr. The output depends on the argument numdir.test. If numdir.test=TRUE, a list of matrices is provided for each of the parameters Γ, Δ, and Δ_y, one matrix per number of directions from 1 through numdir; otherwise, a single matrix is provided for each parameter at the given numdir.
The components loglik, aic, bic, and numpar are vectors of numdir elements if numdir.test=TRUE, and scalars otherwise. The components returned are:
R
The reduced data matrix, obtained by applying the estimated reduction to the column-centered data matrix of X (each column is centered around its sample mean).
Gammahat
Estimate of Γ
Deltahat
Estimate of Δ
Deltahat_y
Estimate of Δ_y
loglik
Maximized value of the LAD log-likelihood.
aic
Akaike information criterion value.
bic
Bayesian information criterion value.
numpar
Number of parameters in the model.
Author(s)
Kofi Placid Adragni <kofi@umbc.edu>
References
Cook RD, Forzani L (2009). Likelihood-Based Sufficient Dimension Reduction. Journal of the American Statistical Association, 104(485), 197–208.
See Also
core, pfc
Examples
data(flea)
fit <- lad(X=flea[,-1], y=flea[,1], numdir=2, numdir.test=TRUE)
summary(fit)
plot(fit)
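For a continuous response, nslices must be supplied. A sketch with simulated data (the data-generating model is arbitrary and for illustration only):

```r
## Continuous response: discretized internally via nslices (simulated data)
set.seed(10)
n <- 200; p <- 5
X <- matrix(rnorm(n * p), n, p)
y <- X[, 1] + 0.5 * X[, 2]^2 + rnorm(n, sd = 0.2)
fit2 <- lad(X = X, y = y, numdir = 2, nslices = 4, numdir.test = FALSE)
summary(fit2)
```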