Last data update: 2014.03.03

R: Multidimensional scaling of probability densities
fmdsd    R Documentation

Multidimensional scaling of probability densities

Description

Applies the multidimensional scaling (MDS) method to probability densities in order to describe a three-way data table consisting of T groups of individuals on which p variables are observed. The function computes the distance matrix between the T densities, applies cmdscale to it, and returns an object of class fmdsd.
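
The last step can be illustrated with a small sketch in base R. It assumes the L^2 distance between densities, which has a closed form for univariate Gaussian densities, and then calls cmdscale on the resulting matrix. The helper names (gauss_inner, l2_dist_matrix) and the toy parameters are hypothetical and are not part of the package; the package's own computation may differ.

```r
# L^2 inner product of two univariate Gaussian densities N(m1, s1^2), N(m2, s2^2):
# integral of f1 * f2 = dnorm(m1 - m2, mean = 0, sd = sqrt(s1^2 + s2^2))
gauss_inner <- function(m1, s1, m2, s2) {
  dnorm(m1 - m2, mean = 0, sd = sqrt(s1^2 + s2^2))
}

# Matrix of L^2 distances between T Gaussian densities (hypothetical helper)
l2_dist_matrix <- function(means, sds) {
  idx <- seq_along(means)
  W <- outer(idx, idx,
             function(i, j) gauss_inner(means[i], sds[i], means[j], sds[j]))
  # ||f - g||^2 = <f, f> + <g, g> - 2 <f, g>
  D2 <- outer(diag(W), diag(W), "+") - 2 * W
  sqrt(pmax(D2, 0))  # guard against tiny negative rounding errors
}

means <- c(0, 0.5, 2, 5)    # hypothetical group means
sds   <- c(1, 1, 1.5, 0.8)  # hypothetical group standard deviations
D <- l2_dist_matrix(means, sds)

# Classical MDS on the distance matrix
mds <- cmdscale(D, k = 2, eig = TRUE)
mds$points  # principal coordinates of the 4 densities
```

Here mds$eig holds the eigenvalues, and the rows of mds$points play the role of the scores.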

Usage

fmdsd(x, gaussiand = TRUE, windowh = NULL, kern = NULL, normed=FALSE,
    data.centered = FALSE, data.scaled = FALSE, common.variance = FALSE,
    nb.factors = 3, nb.values = 10, sub.title = "", plot.eigen = TRUE,
    plot.score = FALSE, nscore = 1:3, filename = NULL)

Arguments

x

data frame with (p+1) columns. The first p columns are numeric. The last column is a factor with T levels defining T groups. Each group, say t, consists of n_t individuals.

gaussiand

logical. If TRUE (default), the probability densities are assumed to be Gaussian. If FALSE, the densities are estimated using the Gaussian kernel method.

kern

string. If gaussiand = FALSE, this argument sets the kernel used in the estimation method. Currently, only the Gaussian kernel is available: the settings kern = "gauss" and kern = NULL are equivalent.

windowh

either a list of T bandwidths (one per density, i.e. one per group), or a strictly positive number. If windowh = NULL (default), the bandwidths are computed automatically. See Details.

normed

logical. If TRUE (default is FALSE, as shown in Usage), the densities are normed before computing the distances.

data.centered

logical. If TRUE (default is FALSE), the data of each group are centered.

data.scaled

logical. If TRUE (default is FALSE), the data of each group are centered (even if data.centered=FALSE) and scaled.

common.variance

logical. If TRUE, a common covariance matrix (or correlation matrix if data.scaled=TRUE), computed on the whole data, is used. If FALSE (default), one covariance (or correlation) matrix per group is used.

nb.factors

numeric. Number of returned principal coordinates (default nb.factors=3).

Warning: The plot.fmdsd and interpret.fmdsd functions cannot take into account more than nb.factors principal factors.

nb.values

numeric. Number of returned eigenvalues (default nb.values=10).

sub.title

string. Subtitle for the graphs (default is the empty string "").

plot.eigen

logical. If TRUE (default), the barplot of the eigenvalues is plotted.

plot.score

logical. If TRUE (default is FALSE), the graphs of the new coordinates are plotted. A new graphics device is opened for each pair of coordinates defined by the nscore argument.

nscore

numeric vector. If plot.score=TRUE, the indices of the principal coordinates to plot (default nscore=1:3). Its components cannot be greater than nb.factors.

filename

string. Name of the file in which the results are saved. By default (filename = NULL) they are not saved.
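
The expected layout of x, and the effect of centering and scaling each group, can be sketched in base R. The data below are simulated and the column names (height, weight, group) are hypothetical. The per-group preprocessing uses scale(), which is an assumption about how the behaviour of data.scaled = TRUE could be reproduced, not the package's internal code.

```r
set.seed(1)
# p = 2 numeric variables followed by a factor with T = 3 levels
x <- data.frame(
  height = rnorm(30, mean = rep(c(10, 12, 15), each = 10)),
  weight = rnorm(30, mean = rep(c(3, 4, 6), each = 10)),
  group  = factor(rep(c("a", "b", "c"), each = 10))
)
p <- ncol(x) - 1

# Check the layout: first p columns numeric, last column a factor
stopifnot(all(sapply(x[, 1:p], is.numeric)), is.factor(x[, p + 1]))

# Center and scale each group separately (analogue of data.scaled = TRUE)
scaled <- lapply(split(x[, 1:p], x[, p + 1]), scale)

# Group sizes n_t, and per-group column means after centering (numerically 0)
table(x$group)
sapply(scaled, colMeans)
```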

Details

The T probability densities f_t corresponding to the T groups of individuals are either parametrically estimated (gaussiand=TRUE) or estimated using the Gaussian kernel method (gaussiand=FALSE). In the latter case, the windowh argument provides the list of the bandwidths to be used. Notice that in the multivariate case (p>1), the bandwidths are positive-definite matrices.

If windowh is a numerical value h, the bandwidth matrix is of the form h S, where S is either the square root of the covariance matrix (p>1) or the standard deviation of the estimated density (p=1).

If windowh = NULL (default), h in the above formula is computed using the bandwidth.parameter function.
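
For a single group, a bandwidth matrix of the form h S could be assembled as in the sketch below. It rests on assumptions: the value of h is an arbitrary placeholder, the data are simulated, and the symmetric square root is computed by eigendecomposition, which may differ from what the package does internally.

```r
set.seed(42)
# Simulated data for one group: n_t = 50 individuals, p = 2 variables
xt <- cbind(rnorm(50), rnorm(50, sd = 2))

# Symmetric square root S of the covariance matrix, so that S %*% S = cov(xt)
V <- cov(xt)
e <- eigen(V, symmetric = TRUE)
S <- e$vectors %*% diag(sqrt(e$values)) %*% t(e$vectors)

h <- 0.5    # placeholder value, chosen only for illustration
H <- h * S  # bandwidth matrix of the form h S

# H is symmetric positive definite whenever the covariance matrix is
eigen(H, symmetric = TRUE)$values
```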

Value

Returns an object of class fmdsd, i.e. a list including:

inertia

data frame of the eigenvalues and percentages of inertia.

scores

data frame of the coordinates of the T densities along the first nb.factors principal coordinates.

norm

vector of the L^2 norms of the densities.

means

list of the means.

variances

list of the covariance matrices.

correlations

list of the correlation matrices.

skewness

list of the skewness coefficients.

kurtosis

list of the kurtosis coefficients.
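
Several of these components are ordinary per-group summaries. The sketch below recomputes analogues of means, variances and correlations in base R, using the built-in iris data (chosen only because it has the required layout; it does not appear in the package documentation). The percentages of inertia are computed from an arbitrary, hypothetical vector of eigenvalues. The exact structure of the objects returned by fmdsd may differ.

```r
x <- iris  # 4 numeric columns, then the factor Species
p <- ncol(x) - 1
groups <- split(x[, 1:p], x[, p + 1])

means        <- lapply(groups, colMeans)  # one mean vector per group
variances    <- lapply(groups, var)       # one covariance matrix per group
correlations <- lapply(groups, cor)       # one correlation matrix per group

# Percentages of inertia from a hypothetical vector of eigenvalues
eig <- c(4.2, 1.1, 0.5, 0.2)
inertia <- data.frame(eigenvalue = eig, inertia = 100 * eig / sum(eig))
inertia
```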

Author(s)

Rachid Boumaza, Pierre Santagostini, Smail Yousfi, Sabine Demotes-Mainard.

References

Boumaza, R. (1998). Analyse en composantes principales de distributions gaussiennes multidimensionnelles. Revue de Statistique Appliquée, XLVI (2), 5-20.

Boumaza, R., Yousfi, S., Demotes-Mainard, S. (2015). Interpreting the principal component analysis of multivariate density functions. Communications in Statistics - Theory and Methods, 44 (16), 3321-3339.

Delicado, P. (2011). Dimensionality reduction when data are density functions. Computational Statistics & Data Analysis, 55, 401-420.

Yousfi, S., Boumaza, R., Aissani, D., Adjabi, S. (2014). Optimal bandwidth matrices in functional principal component analysis of density function. Journal of Statistical Computation and Simulation, 85 (11), 2315-2330.

See Also

fpcad, print.fmdsd, plot.fmdsd, interpret.fmdsd, bandwidth.parameter

Examples

# MDS on Gaussian densities (on sensory data)
library(dad)               # package providing fmdsd and the roses data
data(roses)
result1 <- fmdsd(roses)    # gaussiand = TRUE by default
print(result1)
plot(result1)
