PCAgrid
R Documentation
(Sparse) Robust Principal Components using the Grid search algorithm
Description
Computes a desired number of (sparse) (robust) principal components using
the grid search algorithm in the plane.
The global optimum of the objective function is searched for in
two-dimensional planes rather than in the full p-dimensional space, using
regular grids in these planes.
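The idea of a grid search in a plane can be sketched in base R as follows. This is a minimal illustration for two variables only, not the pcaPP implementation; `grid_direction` and its arguments `n_points` and `scale_fun` are hypothetical names chosen for this example.

```r
# Sketch only: one grid search over directions in a plane (two variables).
# `grid_direction` is a hypothetical helper, not a pcaPP function.
grid_direction <- function(x2, n_points = 25, scale_fun = mad) {
  # equally spaced angles covering the half circle [-pi/2, pi/2)
  angles <- seq(-pi / 2, pi / 2, length.out = n_points + 1)[-(n_points + 1)]
  # scale estimate of the data projected onto each candidate direction
  spread <- vapply(angles, function(a) {
    scale_fun(drop(x2 %*% c(cos(a), sin(a))))
  }, numeric(1))
  best <- which.max(spread)
  list(angle = angles[best],
       direction = c(cos(angles[best]), sin(angles[best])),
       objective = spread[best])
}

set.seed(1)
# two variables; the first has the much larger spread
x2 <- cbind(rnorm(500, sd = 5), rnorm(500, sd = 1))
res <- grid_direction(x2)
res$direction  # should point roughly along the first coordinate axis
```

Only a half circle is searched because a direction and its negative give the same projected scale.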
Arguments
x
a numerical matrix or data frame of dimension (n x p), which
provides the data for the principal components analysis.
k
the desired number of components to compute
method
the scale estimator used to detect the direction with the
largest variance. Possible values are "sd", "mad" and
"qn"; the latter may also be written "Qn". "mad" is the
default value.
lambda
the strength of the sparseness constraint (sPCAgrid only).
A single value for all components, or a vector of length k with
different values for each component can be specified.
See opt.TPO for the choice of this argument.
maxiter
the maximum number of iterations.
splitcircle
the number of directions in which the algorithm should
search for the largest variance. The search is carried out over the
directions defined by a number of equally spaced points on the unit
circle. This argument determines how many such points are used to
split the unit circle.
scores
a logical value indicating whether the scores of the
principal components should be calculated.
zero.tol
the zero tolerance used internally for checking
convergence, etc.
center
this argument indicates how the data is to be centered. It
can be a function like mean or median or a vector
of length ncol(x) containing the center value of each column.
scale
this argument indicates how the data is to be rescaled. It
can be a function like sd or mad or a vector
of length ncol(x) containing the scale value of each column.
trace
an integer value >= 0, specifying the tracing level.
store.call
a logical variable, specifying whether the function call
shall be stored in the result structure.
control
a list whose elements must be the same as (or a subset of)
the parameters above. If the control object is supplied, the parameters from
it will be used and any other given parameters are overridden.
...
further arguments passed to or from other functions.
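The two accepted forms of center and scale (a function or a vector of length ncol(x)) can be illustrated with a small base R sketch. The helper `resolve_stat` is hypothetical, and applying a supplied function column by column is an assumption made for this example; this page does not state how pcaPP resolves these arguments internally.

```r
# Sketch only: resolving a center/scale argument given as a function or a vector.
# `resolve_stat` is a hypothetical helper, not part of pcaPP.
resolve_stat <- function(x, stat) {
  if (is.function(stat)) {
    apply(x, 2, stat)   # assumed: column-wise estimate, e.g. median or mad
  } else {
    stopifnot(is.numeric(stat), length(stat) == ncol(x))
    stat                # user-supplied value for each column
  }
}

x <- cbind(a = c(1, 2, 3, 100), b = c(10, 20, 30, 40))
ctr <- resolve_stat(x, median)    # centre given as a function
scl <- resolve_stat(x, c(1, 10))  # scale given as a vector of length ncol(x)

# subtract the centre and divide by the scale, column by column
z <- sweep(sweep(x, 2, ctr, "-"), 2, scl, "/")
```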
Details
In contrast to PCAgrid, the function sPCAgrid computes sparse
principal components. The strength of the applied sparseness constraint is
specified by argument lambda.
Similar to the function princomp, there is a print method
for these objects that prints the results in a readable format, and the
plot method produces a scree plot (screeplot). There is
also a biplot method.
Angle halving is an extension of the original algorithm. In the original
algorithm, the search directions are determined by a number of points on the
unit circle in the interval [-pi/2 ; pi/2). Angle halving means that this
interval is halved in each iteration, e.g. the first approximation uses the
interval given above, the second approximation uses the halved interval
[-pi/4 ; pi/4), and so on. This usually gives better results with fewer
iterations.
NOTE: in previous implementations, angle halving could be suppressed with the
former argument "anglehalving". This can still be done by setting
argument maxiter = 0.
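Angle halving as described above can be sketched in base R for a single plane. This is an illustration under simplifying assumptions, not the pcaPP code: `halving_search` and its arguments are hypothetical names, the search window is taken relative to the current best angle, and the current best is kept whenever no grid point improves on it.

```r
# Sketch only: grid search in a plane with the angular range halved per iteration.
# `halving_search` is a hypothetical helper, not the pcaPP implementation.
halving_search <- function(x2, n_points = 25, maxiter = 10, scale_fun = mad) {
  objective <- function(a) scale_fun(drop(x2 %*% c(cos(a), sin(a))))
  best_angle <- 0
  best_obj <- objective(best_angle)
  half_width <- pi / 2  # the first pass covers [-pi/2, pi/2)
  for (i in seq_len(maxiter)) {
    # equally spaced offsets in [-half_width, half_width) around the current best
    offsets <- seq(-half_width, half_width,
                   length.out = n_points + 1)[-(n_points + 1)]
    angles <- best_angle + offsets
    spread <- vapply(angles, objective, numeric(1))
    if (max(spread) > best_obj) {
      best_angle <- angles[which.max(spread)]
      best_obj <- max(spread)
    }
    half_width <- half_width / 2  # [-pi/4, pi/4) in the second pass, and so on
  }
  c(cos(best_angle), sin(best_angle))
}

set.seed(2)
z <- matrix(rnorm(400), ncol = 2)
x2 <- cbind(z[, 1], z[, 1] + 0.5 * z[, 2])  # two correlated variables

# With sd as the scale, the optimum is the leading eigenvector of cov(x2),
# which gives a reference direction to compare against.
dir_sd <- halving_search(x2, scale_fun = sd)
ev <- eigen(cov(x2))$vectors[, 1]
abs(sum(dir_sd * ev))  # close to 1 when both directions agree up to sign
```

Because the grid spacing shrinks with the window, each iteration refines the direction without increasing the number of evaluated points.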
Value
The function returns an object of class "princomp", i.e. a list
similar to the output of the function princomp.
sdev
the (robust) standard deviations of the principal components.
loadings
the matrix of variable loadings (i.e., a matrix whose columns
contain the eigenvectors). This is of class "loadings":
see loadings for its print method.
center
the means that were subtracted.
scale
the scalings applied to each variable.
n.obs
the number of observations.
scores
if scores = TRUE, the scores of the supplied data on the
principal components.
call
the matched call.
obj
a vector containing the objective function values. For function
PCAgrid this is the same as sdev.
lambda
the value of lambda with which each component has been calculated
(sPCAgrid only).
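How the components center, scale, loadings and scores of such a list relate to each other can be checked on a classical fit. The sketch below uses stats::princomp as a stand-in, since the returned object is described as similar to its output; that the same relation holds for objects returned by PCAgrid and sPCAgrid is an assumption here, not something this page states.

```r
# Sketch only: stats::princomp stands in for a fitted PCAgrid object.
set.seed(42)
x <- matrix(rnorm(60), ncol = 3)
fit <- princomp(x)

# centre and scale the data with the stored values, then project on the loadings
z <- sweep(sweep(x, 2, fit$center, "-"), 2, fit$scale, "/")
manual_scores <- z %*% unclass(fit$loadings)

# compare with the scores stored in the fitted object
all.equal(unname(manual_scores), unname(fit$scores))
```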
Note
See the vignette "Compiling pcaPP for Matlab" which comes with this package to compile and use these functions in Matlab.
References
C. Croux, P. Filzmoser, M. Oliveira (2007).
Algorithms for Projection-Pursuit Robust Principal Component Analysis,
Chemometrics and Intelligent Laboratory Systems, Vol. 87, pp. 218-225.
C. Croux, P. Filzmoser, H. Fritz (2011).
Robust Sparse Principal Component Analysis Based on Projection-Pursuit,
to appear.
See Also
PCAproj, princomp
Examples
library(pcaPP)    # provides PCAgrid, sPCAgrid and data.Zou
library(mvtnorm)  # provides rmvnorm

# multivariate data with outliers:
# 200 regular observations plus 15 outliers shifted in variables 2 to 6
x <- rbind(rmvnorm(200, rep(0, 6), diag(c(5, rep(1, 5)))),
           rmvnorm( 15, c(0, rep(20, 5)), diag(rep(1, 6))))

# calculate the robust principal components with PCAgrid
pc <- PCAgrid(x)
# draw a biplot of the robust fit
biplot(pc)

# compare the results with the non-robust principal components
pc <- princomp(x)
# again, a biplot for comparison
biplot(pc)

## Sparse loadings
set.seed(0)
x <- data.Zou()

## applying classical PCA
pc <- princomp(x)
## the corresponding non-sparse loadings and standard deviations
unclass(pc$loadings[, 1:3])
pc$sdev[1:3]

## lambda as calculated in the opt.TPO example
lambda <- c(0.23, 0.34, 0.005)
## applying sparse PCA
spc <- sPCAgrid(x, k = 3, lambda = lambda, method = "sd")
unclass(spc$loadings)
spc$sdev[1:3]

## comparing the non-sparse and sparse biplots side by side
par(mfrow = c(1, 2))
biplot(pc, main = "non-sparse PCs")
biplot(spc, main = "sparse PCs")