Last data update: 2014.03.03

pfc (R Documentation)

Principal fitted components

Description

Fits the principal fitted components (PFC) model for sufficient dimension reduction and estimates all of its parameters.

Usage

pfc(X, y, fy = NULL, numdir = NULL, structure = c("iso", "aniso",
    "unstr", "unstr2"), eps_aniso = 1e-3, numdir.test = FALSE, ...)

Arguments

X

Design matrix with n rows of observations and p columns of predictors. The predictors are assumed to have a continuous distribution.

y

The response vector of n observations, continuous or categorical.

fy

The basis function f_y, either obtained with bf or defined by the user. It is a function of y alone, evaluated at the n observations as a matrix with r linearly independent columns. See bf for details.

numdir

The number of directions used to estimate the reduction subspace. It must be less than or equal to min(r, p), where r is the number of columns of fy. By default, numdir = min(r, p).

structure

Structure of var(X|Y). The following options are available: "iso" for isotropic (conditionally on the response, the predictors are independent and on the same measurement scale); "aniso" for anisotropic (conditionally on the response, the predictors are independent and on different measurement scales); "unstr" for an unstructured variance. The fourth option, "unstr2", refers to an extended PFC model with a heterogeneous error structure.

eps_aniso

Precision term used in estimating var(X|Y) for the anisotropic structure.

numdir.test

Logical. If FALSE, pfc fits the model only for the supplied numdir. If TRUE, PFC models are fitted for every dimension from 1 to numdir.

...

Additional arguments passed to Grassmannoptim.
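To make the roles of fy, r and the default numdir concrete, here is a minimal base-R sketch of two kinds of basis f_y: a polynomial basis for a continuous response and an indicator basis for a categorical one. The helpers poly_basis and cat_basis are hypothetical and are not the implementation of bf; centering the columns is an assumption made here so that any intercept is absorbed by μ.

```r
# Hypothetical helpers for illustration only; they are not the bf() code.
poly_basis <- function(y, degree = 3) {
  Fy <- outer(y, seq_len(degree), `^`)        # columns y, y^2, ..., y^degree
  scale(Fy, center = TRUE, scale = FALSE)     # center each column at its mean
}

cat_basis <- function(y) {
  y <- factor(y)
  # treatment-coded indicators for every level except the first, then centered
  Fy <- model.matrix(~ y)[, -1, drop = FALSE]
  scale(Fy, center = TRUE, scale = FALSE)
}

y <- c(1.2, 0.4, 2.5, 3.1, 1.8, 0.9)
Fy <- poly_basis(y, degree = 3)
r <- qr(Fy)$rank            # number of linearly independent columns of f_y
p <- 4                      # hypothetical number of predictors
numdir_default <- min(r, p) # the default numdir described above

Fcat <- cat_basis(c("a", "b", "a", "c", "b", "c"))  # 3 classes give r = 2
```

With six distinct response values the cubic basis has r = 3, so with p = 4 predictors the default would be numdir = 3.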

Details

Let $X$ be a column vector of $p$ predictors and $Y$ a univariate response. The principal fitted components (PFC) model is an inverse regression model for sufficient dimension reduction, given by $X \mid (Y=y) \sim N(\mu + \Gamma \beta f_y, \Delta)$, where $\Gamma \in \mathbb{R}^{p \times d}$ has orthonormal columns, $\beta \in \mathbb{R}^{d \times r}$, $f_y \in \mathbb{R}^r$ is the basis function of $y$, and $\Delta$ does not depend on $y$. The simplest structure is isotropic (iso), $\Delta = \delta^2 I_p$: conditionally on the response, the predictors are independent and on the same measurement scale, and the sufficient reduction is $\Gamma^T X$. The anisotropic (aniso) PFC model assumes $\Delta = \mathrm{diag}(\delta_1^2, \ldots, \delta_p^2)$: the predictors are conditionally independent but on different measurement scales. The unstructured (unstr) PFC model allows a general $\Delta$. Under the anisotropic and unstructured structures, the sufficient reduction is $\Gamma^T \Delta^{-1} X$. Note that $X \in \mathbb{R}^p$ denotes a single observation, while the data matrix passed to pfc is $n \times p$.
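The isotropic model can be illustrated with a small simulation. The base-R sketch below (it does not call pfc) draws data from $X \mid (Y=y) \sim N(\mu + \Gamma\beta f_y, \delta^2 I_p)$ with d = 1 and then estimates span(Γ) using the result of Cook (2007) that, under the isotropic structure, the maximum likelihood estimate of span(Γ) is spanned by the leading d eigenvectors of the fitted covariance matrix. The cubic basis, the parameter values and all variable names are illustrative assumptions.

```r
set.seed(1)
n <- 500; p <- 6; d <- 1
y <- runif(n, -2, 2)

# Illustrative basis f_y: centered cubic polynomial in y, so r = 3
fy <- scale(cbind(y, y^2, y^3), center = TRUE, scale = FALSE)

# Model parameters (illustrative values)
Gamma <- qr.Q(qr(matrix(rnorm(p * d), p, d)))   # p x d, orthonormal columns
beta  <- matrix(c(1, 0.5, 0.3), nrow = d)       # d x r
mu    <- rnorm(p)
delta <- 0.5

# X | (Y = y) ~ N(mu + Gamma beta f_y, delta^2 I_p); one row per observation
X <- matrix(mu, n, p, byrow = TRUE) + fy %*% t(beta) %*% t(Gamma) +
  matrix(rnorm(n * p, sd = delta), n, p)

# Fitted covariance: covariance of the fitted values from the linear
# regression of the centered X on f_y
Xc <- scale(X, center = TRUE, scale = FALSE)
fitted <- fy %*% solve(crossprod(fy), crossprod(fy, Xc))
Sigma_fit <- crossprod(fitted) / n

# Iso estimate of span(Gamma): leading d eigenvectors of Sigma_fit
Gammahat <- eigen(Sigma_fit, symmetric = TRUE)$vectors[, 1:d, drop = FALSE]

# Reduced data Gamma^T X for each centered observation
R <- Xc %*% Gammahat
print(abs(crossprod(Gamma, Gammahat)))  # near 1 when span(Gammahat) is close to span(Gamma)
```

Eigenvectors are only defined up to sign, which is why the comparison uses the absolute inner product.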

The extended model (unstr2) assumes an error covariance of the form

$$\Delta = \Gamma \Omega \Gamma^T + \Gamma_0 \Omega_0 \Gamma_0^T,$$

where $\Gamma_0$ is an orthogonal completion of $\Gamma$, so that $(\Gamma, \Gamma_0)$ is a $p \times p$ orthogonal matrix. The matrices $\Omega \in \mathbb{R}^{d \times d}$ and $\Omega_0 \in \mathbb{R}^{(p-d) \times (p-d)}$ are assumed to be symmetric and of full rank. The sufficient reduction is $\Gamma^T X$. Let $\mathcal{S}_{\Gamma}$ denote the subspace spanned by the columns of $\Gamma$. The parameter space of $\mathcal{S}_{\Gamma}$ is the set of all $d$-dimensional subspaces of $\mathbb{R}^p$, called the Grassmann manifold and denoted $\mathcal{G}_{(d,p)}$. Let $\hat{\Sigma}$ be the sample covariance matrix of $X$, let $\hat{\Sigma}_{\mathrm{fit}}$ be the fitted covariance matrix (the sample covariance matrix of the fitted values from the linear regression of $X$ on $f_y$), and let $\hat{\Sigma}_{\mathrm{res}} = \hat{\Sigma} - \hat{\Sigma}_{\mathrm{fit}}$. Under unstr2, the MLE of $\mathcal{S}_{\Gamma}$ is obtained by maximizing the log-likelihood

$$L(\mathcal{S}_U) = -\log|U^T \hat{\Sigma}_{\mathrm{res}} U| - \log|V^T \hat{\Sigma} V|$$

over $\mathcal{G}_{(d,p)}$, where $U$ is a $p \times d$ matrix with orthonormal columns spanning $\mathcal{S}_U$ and $V$ is an orthogonal completion of $U$.
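The following base-R sketch (not the package's optimizer) builds Δ with the extended structure, checks numerically that $\Gamma^T\Delta^{-1} = \Omega^{-1}\Gamma^T$, which is why the unstr2 reduction can be written as $\Gamma^T X$, and evaluates the objective $L(\mathcal{S}_U)$ on simulated data. The helper unstr2_objective, the cubic basis and all parameter values are hypothetical choices for illustration. A full fit would maximize this objective over the Grassmann manifold; here it is only compared at the true Γ and at a direction orthogonal to it.

```r
set.seed(2)
n <- 500; p <- 6; d <- 1
y <- runif(n, -2, 2)
fy <- scale(cbind(y, y^2, y^3), center = TRUE, scale = FALSE)  # r = 3

# (Gamma, Gamma0): a random p x p orthogonal matrix split into two blocks
Q <- qr.Q(qr(matrix(rnorm(p * p), p, p)))
Gamma  <- Q[, 1:d, drop = FALSE]
Gamma0 <- Q[, -(1:d), drop = FALSE]
Omega  <- diag(0.5, d)
A      <- matrix(rnorm((p - d)^2), p - d)
Omega0 <- crossprod(A) + diag(p - d)          # symmetric positive definite
Delta  <- Gamma %*% Omega %*% t(Gamma) + Gamma0 %*% Omega0 %*% t(Gamma0)

# Under this structure Gamma^T Delta^{-1} = Omega^{-1} Gamma^T, so
# Gamma^T Delta^{-1} X is a full-rank transformation of Gamma^T X
stopifnot(isTRUE(all.equal(t(Gamma) %*% solve(Delta),
                           solve(Omega) %*% t(Gamma))))

# Simulate X with errors ~ N(0, Delta); mu = 0 here
beta <- matrix(c(1, 0.5, 0.3), nrow = d)
X <- fy %*% t(beta) %*% t(Gamma) + matrix(rnorm(n * p), n, p) %*% chol(Delta)

# Sample, fitted and residual covariance matrices
Xc <- scale(X, center = TRUE, scale = FALSE)
fitted <- fy %*% solve(crossprod(fy), crossprod(fy, Xc))
Sigma     <- crossprod(Xc) / n
Sigma_res <- Sigma - crossprod(fitted) / n

# Hypothetical helper: L(S_U) = -log|U' Sigma_res U| - log|V' Sigma V|
unstr2_objective <- function(U, Sigma, Sigma_res) {
  Qfull <- qr.Q(qr(U), complete = TRUE)
  k <- ncol(U)
  U <- Qfull[, seq_len(k), drop = FALSE]    # orthonormal basis of span(U)
  V <- Qfull[, -seq_len(k), drop = FALSE]   # orthogonal completion of U
  -log(det(t(U) %*% Sigma_res %*% U)) - log(det(t(V) %*% Sigma %*% V))
}

obj_true <- unstr2_objective(Gamma, Sigma, Sigma_res)
obj_orth <- unstr2_objective(Gamma0[, 1, drop = FALSE], Sigma, Sigma_res)
```

Orthonormalizing U inside the helper makes the value depend on U only through its span, matching the fact that the objective is defined on $\mathcal{G}_{(d,p)}$.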

The dimension d of the sufficient reduction must be estimated. Following Cook and Forzani (2008), pfc implements a sequential likelihood ratio test as well as the Akaike and Bayesian information criteria.
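As a rough illustration of how these three criteria choose d, the base-R sketch below takes log-likelihoods and parameter counts for d = 0, 1, ..., dmax, computes AIC = -2 loglik + 2 numpar and BIC = -2 loglik + log(n) numpar, and runs a sequential likelihood ratio test of each d against the largest model (assumed here to be d = min(r, p)), stopping at the first d that is not rejected. The helper select_dim and its chi-squared degrees of freedom (difference in parameter counts) are assumptions; the package's exact constants and test may differ. The input numbers are made up for illustration and are not output from pfc.

```r
# Hypothetical helper: loglik[k] and numpar[k] correspond to d = k - 1.
select_dim <- function(loglik, numpar, n, alpha = 0.05) {
  stopifnot(length(loglik) == length(numpar))
  d <- seq_along(loglik) - 1
  aic <- -2 * loglik + 2 * numpar
  bic <- -2 * loglik + log(n) * numpar
  full <- length(loglik)                  # index of the largest model
  # Sequential LRT: test d = 0, 1, ... against the largest model and
  # stop at the first d that is not rejected at level alpha
  d_lrt <- d[full]
  for (k in seq_len(full - 1)) {
    stat <- 2 * (loglik[full] - loglik[k])
    df <- numpar[full] - numpar[k]
    if (pchisq(stat, df, lower.tail = FALSE) > alpha) {
      d_lrt <- d[k]
      break
    }
  }
  list(aic = d[which.min(aic)], bic = d[which.min(bic)], lrt = d_lrt)
}

# Made-up inputs for d = 0, 1, 2, 3
sel <- select_dim(loglik = c(-500, -420, -415, -414),
                  numpar = c(10, 20, 28, 34), n = 100)
```

With these toy inputs all three criteria select d = 1: the jump in log-likelihood from d = 0 to d = 1 is large, while further gains do not justify the extra parameters.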

Value

This function returns a list object of class ldr. Its contents depend on numdir.test. If numdir.test=TRUE, the estimates of μ, β, Γ, Γ_0, Ω and Ω_0 are given as lists of matrices, one for each value of numdir from 1 to numdir. Otherwise, a single set of matrices is returned for the supplied numdir. Likewise, loglik, aic, bic and numpar are vectors of numdir elements if numdir.test=TRUE, and scalars otherwise. The components returned are:

R

The reduced data matrix, computed from the centered data matrix X (each column of X is centered at its sample mean).

Muhat

Estimate of μ.

Betahat

Estimate of β.

Deltahat

The estimate of the covariance Δ.

Gammahat

An estimated orthogonal basis of $\hat{\mathcal{S}}_{\Gamma}$, the estimate of the subspace spanned by $\Gamma$.

Gammahat0

An estimated orthogonal basis of $\hat{\mathcal{S}}_{\Gamma_0}$, the estimate of the subspace spanned by $\Gamma_0$.

Omegahat

The estimate of the covariance Ω if an extended model is used.

Omegahat0

The estimate of the covariance Ω_0 if an extended model is used.

loglik

The value of the log-likelihood for the model.

aic

Akaike information criterion value.

bic

Bayesian information criterion value.

numdir

The number of directions estimated.

numpar

The number of parameters in the model.

evalues

The numdir largest eigenvalues of $\hat{\Sigma}_{\mathrm{fit}}$.
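The component R can be related to the reductions in the Details section. The base-R sketch below computes reduced data from estimates of Γ and, optionally, Δ, assuming the reduction $\Gamma^T X$ for iso and unstr2 and $\Gamma^T \Delta^{-1} X$ for aniso and unstr, applied to the column-centered data matrix. The helper reduce_data is hypothetical and is not the package's code; the tiny inputs are chosen only so the result can be checked by hand.

```r
# Hypothetical helper, not the package's code: reduced data from estimates.
reduce_data <- function(X, Gammahat, Deltahat = NULL) {
  Xc <- scale(X, center = TRUE, scale = FALSE)  # center each column at its mean
  if (is.null(Deltahat)) {
    Xc %*% Gammahat                             # iso, unstr2: Gamma^T X
  } else {
    Xc %*% solve(Deltahat, Gammahat)            # aniso, unstr: Gamma^T Delta^{-1} X
  }
}

X <- matrix(c(1, 2, 3, 4, 6, 8), nrow = 3)      # n = 3, p = 2
Gammahat <- matrix(c(1, 0), nrow = 2)
reduce_data(X, Gammahat)                        # centered first column: -1, 0, 1
reduce_data(X, Gammahat, Deltahat = diag(c(2, 4)))  # -0.5, 0, 0.5
```

Using solve(Deltahat, Gammahat) avoids forming the inverse of Δ explicitly.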

Author(s)

Kofi Placid Adragni <kofi@umbc.edu>

References

Adragni, KP and Cook, RD (2009): Sufficient dimension reduction and prediction in regression. Phil. Trans. R. Soc. A 367, 4385–4405.

Cook, RD (2007): Fisher Lecture: Dimension Reduction in Regression (with discussion). Statistical Science 22, 1–26.

Cook, RD and Forzani, L (2008): Principal fitted components for dimension reduction in regression. Statistical Science 23, 485–501.

See Also

core, lad

Examples

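library(ldr)    # package providing pfc(), bf() and the bigmac data
data(bigmac)

# Response: first column of bigmac; predictors: the remaining columns
fit1 <- pfc(X = bigmac[, -1], y = bigmac[, 1],
            fy = bf(y = bigmac[, 1], case = "poly", degree = 3),
            numdir = 3, structure = "aniso")
summary(fit1)
plot(fit1)

# Same model, fitted for every dimension from 1 to numdir
fit2 <- pfc(X = bigmac[, -1], y = bigmac[, 1],
            fy = bf(y = bigmac[, 1], case = "poly", degree = 3),
            numdir = 3, structure = "aniso", numdir.test = TRUE)
summary(fit2)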