pfc	R Documentation

Principal fitted components

Description

Principal fitted components model for sufficient dimension reduction. This function estimates all parameters in the model.

Usage

pfc(X, y, fy = NULL, numdir = NULL, structure = c("iso", "aniso",
    "unstr", "unstr2"), eps_aniso = 1e-3, numdir.test = FALSE, ...)

Arguments

`X`	Design matrix with `n` rows of observations and `p` columns of predictors. The predictors are assumed to have a continuous distribution.
`y`	The response vector of `n` observations, continuous or categorical.
`fy`	Basis function to be obtained using `bf` or defined by the user. It is a function of `y` alone and has `r` independent column vectors. See `bf` for details.
`numdir`	The number of directions to be used in estimating the reduction subspace. The dimension must be less than or equal to the minimum of `r` and `p`. By default, `numdir` = min(`r`, `p`).
`structure`	Structure of `var(X|Y)`. The following options are available: `"iso"` for isotropic (predictors, conditionally on the response, are independent and on the same measurement scale); `"aniso"` for anisotropic (predictors, conditionally on the response, are independent and on different measurement scales); `"unstr"` for unstructured variance. The fourth structure, `"unstr2"`, refers to an extended PFC model with a heterogeneous error structure.
`eps_aniso`	Precision term used in estimating `var(X|Y)` for the anisotropic structure.
`numdir.test`	Boolean. If `FALSE`, `pfc` fits with the `numdir` provided only. If `TRUE`, PFC models are fit for all dimensions less than or equal to `numdir`.
`...`	Additional arguments to `Grassmannoptim`.
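
As a rough illustration of a user-defined `fy`, the sketch below builds a centered cubic polynomial basis in base R. The variable names (`fy_raw`, `fy_user`) and the decision to center the columns are assumptions made for this example; `bf` may construct its bases differently.

```r
# Hypothetical response vector (illustration only).
set.seed(1)
y <- rnorm(50)

# Raw cubic polynomial basis: columns y, y^2, y^3, so r = 3.
fy_raw <- cbind(y, y^2, y^3)

# Center each column at its sample mean (an assumption of this sketch).
fy_user <- scale(fy_raw, center = TRUE, scale = FALSE)

# The r columns should be linearly independent.
r <- qr(fy_user)$rank
```

A matrix such as `fy_user` could then be supplied as `fy`, with `numdir` no larger than min(`r`, `p`).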

Details

Let X be a column vector of p predictors and Y a univariate response variable. The principal fitted components (PFC) model is an inverse regression model for sufficient dimension reduction, given by X | (Y = y) \sim N(μ + Γ β f_y, Δ). The covariance Δ is assumed not to depend on y. Note that X \in R^{p}, while the data matrix supplied to the function is in R^{n \times p}.

The simplest structure is isotropic (iso), with Δ = δ^2 I_p: conditionally on the response, the predictors are independent and on the same measurement scale. The sufficient reduction is then Γ^T X. The anisotropic (aniso) PFC model assumes Δ = diag(δ_1^2, ..., δ_p^2): conditionally on the response, the predictors are independent but on different measurement scales. The unstructured (unstr) PFC model allows a general structure for Δ. Under the anisotropic and unstructured structures, the sufficient reduction is Γ^T Δ^{-1} X.
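
To make the isotropic model concrete, the following base R sketch simulates data from X | (Y = y) ~ N(μ + Γ β f_y, δ^2 I_p) and recovers the direction from the leading eigenvectors of the fitted covariance matrix, which is how the isotropic estimator is described in Cook (2007). All parameter values and variable names are assumptions chosen for illustration, and this is not the code used inside `pfc`.

```r
set.seed(42)
n <- 500; p <- 5; d <- 1

# Simulate from the isotropic PFC model with hypothetical parameters.
y     <- rnorm(n)
fy    <- scale(cbind(y, y^2), center = TRUE, scale = FALSE)  # n x r basis, r = 2
Gamma <- matrix(c(1, 0, 0, 0, 0), ncol = d)                  # p x d
beta  <- matrix(c(1, 0.5), nrow = d)                         # d x r
mu    <- rep(2, p)
delta <- 0.5
X <- matrix(mu, n, p, byrow = TRUE) +
     fy %*% t(beta) %*% t(Gamma) +
     delta * matrix(rnorm(n * p), n, p)

# Center the data matrix and project it onto the span of fy.
Xc        <- scale(X, center = TRUE, scale = FALSE)
X_fit     <- fy %*% solve(crossprod(fy), crossprod(fy, Xc))
Sigma_fit <- crossprod(X_fit) / n                            # fitted covariance

# Isotropic case: take the d leading eigenvectors of Sigma_fit.
eig       <- eigen(Sigma_fit, symmetric = TRUE)
Gamma_hat <- eig$vectors[, seq_len(d), drop = FALSE]

# Reduction Gamma^T X, applied row-wise to the centered n x p data matrix.
R <- Xc %*% Gamma_hat
```

Here `Gamma_hat` is determined only up to sign, so it should be compared with `Gamma` through the subspace it spans rather than entry by entry.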

The error structure of the extended model (unstr2) has the form

Δ = Γ Ω Γ^T + Γ_0 Ω_0 Γ_0^T,

where Γ_0 is the orthogonal completion of Γ, such that (Γ, Γ_0) is a p \times p orthogonal matrix. The matrices Ω \in R^{d \times d} and Ω_0 \in R^{(p-d) \times (p-d)} are assumed to be symmetric and full rank. The sufficient reduction is Γ^T X. Let \mathcal{S}_{Γ} be the subspace spanned by the columns of Γ. The parameter space of \mathcal{S}_{Γ} is the set of all d-dimensional subspaces of R^p, called the Grassmann manifold and denoted by \mathcal{G}_{(d,p)}. Let \hat{Σ} and \hat{Σ}_{\mathrm{fit}} be the sample covariance matrix of X and the fitted covariance matrix, respectively, and let \hat{Σ}_{\mathrm{res}} = \hat{Σ} - \hat{Σ}_{\mathrm{fit}}. The MLE of \mathcal{S}_{Γ} under the unstr2 setup is obtained by maximizing the log-likelihood

L(\mathcal{S}_U) = - \log|U^T \hat{Σ}_{\mathrm{res}} U| - \log|V^T \hat{Σ} V|

over \mathcal{G}_{(d,p)}, where V is an orthogonal completion of U.
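
The objective above can be evaluated for a candidate basis U with a few lines of base R. The sketch below is only an illustration: the function name `pfc_objective` and the simulated inputs are hypothetical, the orthogonal completion V is taken from a complete QR decomposition, and the actual optimization over the Grassmann manifold is not attempted here.

```r
# Evaluate L(S_U) = -log|U^T Sigma_res U| - log|V^T Sigma_hat V|,
# where V is an orthogonal completion of U. Requires ncol(U) < nrow(U).
pfc_objective <- function(U, Sigma_res, Sigma_hat) {
  d <- ncol(U)
  p <- nrow(U)
  Q  <- qr.Q(qr(U), complete = TRUE)        # p x p orthogonal matrix
  Ub <- Q[, seq_len(d), drop = FALSE]       # orthonormal basis of span(U)
  V  <- Q[, seq(d + 1, p), drop = FALSE]    # orthonormal basis of the complement
  logdet <- function(M) as.numeric(determinant(M, logarithm = TRUE)$modulus)
  -logdet(crossprod(Ub, Sigma_res %*% Ub)) - logdet(crossprod(V, Sigma_hat %*% V))
}

# Hypothetical data, used only to produce valid covariance matrices.
set.seed(7)
n <- 200; p <- 4; d <- 2
y  <- rnorm(n)
fy <- scale(cbind(y, y^2), center = TRUE, scale = FALSE)
X  <- matrix(rnorm(n * p), n, p) + fy %*% matrix(rnorm(2 * p), 2, p)
Xc <- scale(X, center = TRUE, scale = FALSE)
Sigma_hat <- crossprod(Xc) / n
X_fit     <- fy %*% solve(crossprod(fy), crossprod(fy, Xc))
Sigma_res <- Sigma_hat - crossprod(X_fit) / n

U <- qr.Q(qr(matrix(rnorm(p * d), p, d)))   # random orthonormal p x d basis
O <- qr.Q(qr(matrix(rnorm(d * d), d, d)))   # random d x d orthogonal matrix

val1 <- pfc_objective(U, Sigma_res, Sigma_hat)
val2 <- pfc_objective(U %*% O, Sigma_res, Sigma_hat)  # same subspace, rotated basis
```

Because the objective depends on U only through the subspace it spans, `val1` and `val2` agree up to rounding error, which is why the maximization is carried out over subspaces rather than over individual matrices.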

The dimension d of the sufficient reduction must be estimated. A sequential likelihood ratio test is implemented, as well as the Akaike and Bayesian information criteria, following Cook and Forzani (2008).
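
As a small sketch of selection by information criteria, the code below applies the textbook definitions AIC = -2 loglik + 2k and BIC = -2 loglik + log(n) k to a vector of log-likelihoods and picks the minimizing dimension. The helper `select_dim` is hypothetical, the numbers are made up for illustration and do not come from any fit, and the exact formulas used inside `pfc` are assumed rather than confirmed here.

```r
# Hypothetical helper: position i of each vector is assumed to correspond to d = i.
select_dim <- function(loglik, numpar, n) {
  aic <- -2 * loglik + 2 * numpar
  bic <- -2 * loglik + log(n) * numpar
  list(aic = aic, bic = bic,
       d_aic = which.min(aic), d_bic = which.min(bic))
}

# Made-up log-likelihoods and parameter counts for d = 1, 2, 3.
loglik <- c(-520, -480, -478)
numpar <- c(20, 27, 32)
res <- select_dim(loglik, numpar, n = 45)
```

With these illustrative inputs both criteria select d = 2, since the small gain in log-likelihood from d = 2 to d = 3 does not offset the additional parameters.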

Value

This command returns a list object of class ldr. The output depends on the argument numdir.test. If numdir.test=TRUE, a list of matrices is provided, one per numdir value (1 through numdir), for each of the parameters μ, β, Γ, Γ_0, Ω, and Ω_0. Otherwise, a single list of matrices is returned for the single value of numdir. The components loglik, aic, bic, and numpar are vectors of numdir elements if numdir.test=TRUE, and scalars otherwise. The following components are returned:

`R`	The reduced data matrix, computed from the centered data matrix of X, in which each column is centered at its sample mean.
`Muhat`	Estimate of μ.
`Betahat`	Estimate of β.
`Deltahat`	The estimate of the covariance Δ.
`Gammahat`	An estimated orthogonal basis representative of \hat{\mathcal{S}}_{Γ}, the subspace spanned by Γ.
`Gammahat0`	An estimated orthogonal basis representative of \hat{\mathcal{S}}_{Γ_0}, the subspace spanned by Γ_0.
`Omegahat`	The estimate of the covariance Ω if an extended model is used.
`Omegahat0`	The estimate of the covariance Ω_0 if an extended model is used.
`loglik`	The value of the log-likelihood for the model.
`aic`	Akaike information criterion value.
`bic`	Bayesian information criterion value.
`numdir`	The number of directions to estimate.
`numpar`	The number of parameters in the model.
`evalues`	The `numdir` largest eigenvalues of \hat{Σ}_{\mathrm{fit}}.

Author(s)

Kofi Placid Adragni <kofi@umbc.edu>

References

Adragni, KP and Cook, RD (2009): Sufficient dimension reduction and prediction in regression. Phil. Trans. R. Soc. A, 367, 4385–4405.

Cook, RD (2007): Fisher Lecture - Dimension Reduction in Regression (with discussion). Statistical Science, 22, 1–26.

Cook, RD and Forzani, L (2008): Principal fitted components for dimension reduction in regression. Statistical Science, 23, 485–501.

Examples

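library(ldr)   # provides pfc, bf, and the bigmac data
data(bigmac)

# Cubic polynomial basis in the response (first column of bigmac)
fy <- bf(y=bigmac[,1], case="poly", degree=3)

# Fit a single anisotropic PFC model with numdir=3
fit1 <- pfc(X=bigmac[,-1], y=bigmac[,1], fy=fy, numdir=3, structure="aniso")
summary(fit1)
plot(fit1)

# Fit PFC models for all dimensions up to numdir=3
fit2 <- pfc(X=bigmac[,-1], y=bigmac[,1], fy=fy, numdir=3, structure="aniso",
        numdir.test=TRUE)
summary(fit2)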