Performs functional principal component analysis of probability densities in order to describe a data table consisting of T groups of individuals on which p variables are observed. It returns an object of class fpcad.
data frame with (p+1) columns. The first p columns are numeric. The last column is a factor with T levels defining T groups. Each group, say t, consists of n_t individuals.
gaussiand
logical. If TRUE (default), the probability densities are assumed to be Gaussian. If FALSE, densities are estimated using the Gaussian kernel method.
kern
string. If gaussiand = FALSE (default is TRUE), this argument sets the kernel used in the estimation method. Currently, only the Gaussian kernel is available: the settings kern = "gauss" and kern = NULL are equivalent.
windowh
either a list of T bandwidths (one per density associated with a group), or a strictly positive number. If windowh=NULL (default), the bandwidths are automatically computed. See Details.
normed
logical. If TRUE (default), the densities are normed before computing the distances.
centered
logical. If TRUE (default is FALSE), the densities are centered.
data.centered
logical. If TRUE (default is FALSE), the data of each group are centered.
data.scaled
logical. If TRUE (default is FALSE), the data of each group are centered (even if data.centered=FALSE) and scaled.
common.variance
logical. If TRUE (default is FALSE), a common covariance matrix (or correlation matrix if data.scaled=TRUE), computed on the whole data, is used. If FALSE, a covariance (or correlation) matrix per group is used.
nb.factors
numeric. Number of returned principal scores (default nb.factors=3).
Warning: The plot.fpcad and interpret.fpcad functions cannot take into account more than nb.factors principal factors.
nb.values
numeric. Number of returned eigenvalues (default nb.values=10).
sub.title
string. Subtitle for the graphs (default NULL).
plot.eigen
logical. If TRUE (default), the barplot of the eigenvalues is plotted.
plot.score
logical. If TRUE, the graphs of principal scores are plotted. A new graphic device is opened for each pair of principal scores defined by the nscore argument.
nscore
numeric vector. If plot.score=TRUE, the indices of the principal scores to plot. The default is nscore=1:3. No component may be greater than nb.factors.
filename
string. Name of the file in which the results are saved. By default (filename = NULL) the results are not saved.
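The layout required for the data frame argument (p numeric columns followed by one grouping factor) can be checked before calling fpcad. The following is a minimal sketch in base R; check_group_table is a hypothetical helper introduced here for illustration and is not part of the package.

```r
# Hypothetical helper, not part of the package: checks the layout described
# for the data frame argument (p numeric columns, then one grouping factor).
check_group_table <- function(x) {
  stopifnot(is.data.frame(x), ncol(x) >= 2)
  p <- ncol(x) - 1
  numeric_ok <- all(vapply(x[seq_len(p)], is.numeric, logical(1)))
  factor_ok <- is.factor(x[[p + 1]])
  if (!numeric_ok || !factor_ok) {
    stop("expected p numeric columns followed by one factor column")
  }
  # Group sizes n_t, one per level of the grouping factor
  table(x[[p + 1]])
}

# iris (shipped with R) already has this layout:
# 4 numeric columns, then the factor Species with 3 levels
sizes <- check_group_table(iris)
```

Here sizes holds the number of individuals n_t in each of the T groups.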
Details
The T probability densities f_t corresponding to the T groups of individuals are either parametrically estimated (gaussiand=TRUE) or estimated using the Gaussian kernel method (gaussiand=FALSE). In the latter case, the windowh argument provides the list of the bandwidths to use. Notice that in the multivariate case (p>1) the bandwidths are positive-definite matrices.
If windowh is a numerical value, the matrix bandwidth is of the form h S, where S is either the square root of the covariance matrix (p>1) or the standard deviation (p=1) of the estimated density.
If windowh = NULL (default), h in the above formula is computed using the bandwidth.parameter function.
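The form h S of the bandwidth can be illustrated in base R. This is a sketch under stated assumptions, not the package's implementation: bandwidth_matrix is a hypothetical helper, S is taken to be the symmetric square root of the sample covariance matrix of one group, and the default h uses the normal reference rule, which is assumed here and may differ from what bandwidth.parameter computes.

```r
# Hypothetical helper, not part of the package: bandwidth matrix h * S for
# one group, with S the symmetric square root of the sample covariance.
bandwidth_matrix <- function(x, h = NULL) {
  x <- as.matrix(x)
  n <- nrow(x)
  p <- ncol(x)
  if (is.null(h)) {
    # Assumed rule (normal reference); the package may compute h differently.
    h <- (4 / (n * (p + 2)))^(1 / (p + 4))
  }
  e <- eigen(cov(x), symmetric = TRUE)
  # Symmetric square root: V diag(sqrt(lambda)) V'
  S <- e$vectors %*% diag(sqrt(pmax(e$values, 0)), nrow = p) %*% t(e$vectors)
  h * S
}

set.seed(1)
x <- matrix(rnorm(200), ncol = 2)   # one simulated group, n = 100, p = 2
H <- bandwidth_matrix(x, h = 0.5)
```

For p = 1 the same code reduces to h times the standard deviation of the group.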
Value
Returns an object of class fpcad, that is a list including:
inertia
data frame of the eigenvalues and percentages of inertia.
contributions
data frame of the contributions to the first nb.factors principal components.
qualities
data frame of the qualities on the first nb.factors principal factors.
scores
data frame of the first nb.factors principal scores.
norm
vector of the L^2 norms of the densities.
means
list of the means.
variances
list of the covariance matrices.
correlations
list of the correlation matrices.
skewness
list of the skewness coefficients.
kurtosis
list of the kurtosis coefficients.
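For Gaussian densities, the L^2 norm returned in norm has a closed form: the inner product of the densities of N(m1, V1) and N(m2, V2) is (2 pi)^(-p/2) det(V1 + V2)^(-1/2) exp(-(m1 - m2)' (V1 + V2)^(-1) (m1 - m2) / 2). The following base R sketch evaluates it; l2_inner_gauss and l2_norm_gauss are hypothetical helpers for illustration and are not the package's code.

```r
# Hypothetical helper: L2 inner product of two Gaussian densities
# N(m1, V1) and N(m2, V2), from the closed form
# (2*pi)^(-p/2) * det(V1 + V2)^(-1/2) * exp(-0.5 * d' (V1 + V2)^(-1) d).
l2_inner_gauss <- function(m1, V1, m2, V2) {
  V <- as.matrix(V1) + as.matrix(V2)
  d <- m1 - m2
  p <- length(d)
  quad <- drop(crossprod(d, solve(V, d)))
  (2 * pi)^(-p / 2) * det(V)^(-1 / 2) * exp(-0.5 * quad)
}

# Hypothetical helper: L2 norm of one Gaussian density
l2_norm_gauss <- function(m, V) sqrt(l2_inner_gauss(m, V, m, V))

# Univariate check against numerical integration of the squared density
nrm <- l2_norm_gauss(0, 4)   # variance 4, i.e. standard deviation 2
num <- sqrt(integrate(function(x) dnorm(x, 0, 2)^2, -Inf, Inf)$value)
```

The two values nrm and num agree up to the accuracy of the numerical integration.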
Author(s)
Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Sabine Demotes-Mainard.
References
Boumaza, R. (1998). Analyse en composantes principales de distributions gaussiennes multidimensionnelles. Revue de Statistique Appliquée, XLVI (2), 5-20.
Boumaza, R., Yousfi, S., Demotes-Mainard, S. (2015). Interpreting the principal component analysis of multivariate density functions. Communications in Statistics - Theory and Methods, 44 (16), 3321-3339.
Delicado, P. (2011). Dimensionality reduction when data are density functions. Computational Statistics & Data Analysis, 55, 401-420.
Yousfi, S., Boumaza, R., Aissani, D., Adjabi, S. (2014). Optimal bandwidth matrices in functional principal component analysis of density functions. Journal of Statistical Computation and Simulation, 85 (11), 2315-2330.
Examples
library(dad)  # package assumed to provide fpcad and the roses data set
data(roses)
# Case of a normed non-centred PCA of Gaussian densities (on 3 architectural
# characteristics of roses: shape (Sha), foliage density (Den) and symmetry (Sym))
result3 <- fpcad(roses[, c("Sha", "Den", "Sym", "Lot")])
print(result3)
plot(result3)