a numeric matrix or data frame which provides the data for the
principal components analysis.
k
desired number of components to compute
method
scale estimator used to detect the direction with the largest
variance. Possible values are "sd", "mad" and "qn" (also
accepted as "Qn"). "mad" is the default value.
CalcMethod
the variant of the algorithm to be used. Possible values are
"eachobs", "lincomb" and "sphere", with "eachobs" being
the default.
nmax
maximum number of directions to search in each step (only when
using "sphere" or "lincomb" as the CalcMethod).
update
a logical value indicating whether an update algorithm should be
used.
scores
a logical value indicating whether the scores of the
principal components should be calculated.
maxit
maximum number of iterations.
maxhalf
maximum number of steps for angle halving.
scale
this argument indicates how the data is to be rescaled. It
can be a function like sd or mad or a vector
of length ncol(x) containing the scale value of each column.
center
this argument indicates how the data is to be centered. It
can be a function like mean or median or a vector
of length ncol(x) containing the center value of each column.
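The two forms accepted by center and scale (a function, or a vector of length ncol(x)) can be illustrated with a small base R sketch. The helper resolve_stat below is hypothetical and not part of the package; it only shows how either form could be turned into one value per column before the data are centered and rescaled.

```r
# Hypothetical helper (not part of the package): turn a center/scale
# argument into one value per column of x.
resolve_stat <- function(x, stat) {
  if (is.function(stat)) {
    apply(x, 2, stat)                 # function form: apply it column-wise
  } else {
    stopifnot(is.numeric(stat), length(stat) == ncol(x))
    stat                              # vector form: use the values as given
  }
}

set.seed(1)
x <- matrix(rnorm(50 * 3, mean = 10, sd = 2), ncol = 3)

ctr <- resolve_stat(x, median)        # center given as a function
scl <- resolve_stat(x, mad)           # scale given as a function

# Subtract the column centers, then divide by the column scales.
x_std <- sweep(sweep(x, 2, ctr, "-"), 2, scl, "/")

# Passing the precomputed vectors instead of the functions gives the same data.
x_std2 <- sweep(sweep(x, 2, resolve_stat(x, ctr), "-"),
                2, resolve_stat(x, scl), "/")
```

After this step every column of x_std has median 0 and MAD 1, which is the kind of robust standardization that median and mad would request.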
zero.tol
the zero tolerance used internally for checking
convergence, etc.
control
a list whose elements must be the same as (or a subset of)
the parameters above. If the control object is supplied, the parameters from
it are used and override any values given for the same parameters directly.
Details
Basically, this algorithm considers the directions of each observation
through the origin of the centered data as possible projection directions.
As this algorithm has some drawbacks, especially if ncol(x) > nrow(x)
in the data matrix, there are several improvements that can be used with this
algorithm.
update
An updating step based on the algorithm for finding the
eigenvectors is added to the algorithm. This can be used with any
CalcMethod.
sphere
Additional search directions are added using random directions.
The random directions are determined using random data points generated from
a p-dimensional multivariate standard normal distribution. These new data
points are projected to the unit sphere, giving the new search directions.
lincomb
Additional search directions are added using linear
combinations of the observations. It is similar to the
"sphere"-algorithm, but the new data points are generated using linear
combinations of the original data b_1*x_1 + ... + b_n*x_n where the
coefficients b_i come from a uniform distribution in the interval
[0, 1].
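The three ways of generating candidate directions described above can be sketched in base R. This is only an illustration of the search for the first component, not the package's implementation: the helper names normalize_rows and best_direction are hypothetical, the data are assumed to be centered with column medians, and the linear combinations are assumed to be formed from the centered observations.

```r
set.seed(42)
n <- 40; p <- 3
x  <- matrix(rnorm(n * p), ncol = p) %*% diag(c(4, 1, 1))
xc <- sweep(x, 2, apply(x, 2, median), "-")   # assumption: center with column medians

# Hypothetical helper: scale every row to unit length, dropping near-zero rows.
normalize_rows <- function(m, tol = 1e-12) {
  len <- sqrt(rowSums(m^2))
  m[len > tol, , drop = FALSE] / len[len > tol]
}

# "eachobs": one direction through each centered observation
dir_obs <- normalize_rows(xc)

# "sphere": standard normal points projected onto the unit sphere
nmax <- 100
dir_sphere <- normalize_rows(matrix(rnorm(nmax * p), ncol = p))

# "lincomb": b_1*x_1 + ... + b_n*x_n with b_i drawn from Uniform[0, 1]
b <- matrix(runif(nmax * n), nrow = nmax)
dir_lincomb <- normalize_rows(b %*% xc)

# Hypothetical helper: pick the candidate whose projection has the largest
# robust scale (here the MAD, matching method = "mad").
best_direction <- function(dirs, data, scale_fun = mad) {
  spread <- apply(data %*% t(dirs), 2, scale_fun)
  list(direction = dirs[which.max(spread), ], scale = max(spread))
}

first_pc <- best_direction(rbind(dir_obs, dir_sphere, dir_lincomb), xc)
```

Adding the "sphere" or "lincomb" candidates can only increase (or leave unchanged) the best scale found among the "eachobs" directions, which is why they help when there are few observations relative to the number of variables.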
Similar to the function princomp, there is a print method
for these objects that prints the results in a nice format, and the plot
method produces a scree plot (screeplot). There is also a
biplot method.
Value
The function returns a list of class "princomp", i.e. a list similar to the
output of the function princomp.
sdev
the (robust) standard deviations of the principal components.
loadings
the matrix of variable loadings (i.e., a matrix whose columns
contain the eigenvectors). This is of class "loadings":
see loadings for its print method.
center
the means that were subtracted.
scale
the scalings applied to each variable.
n.obs
the number of observations.
scores
if scores = TRUE, the scores of the supplied data on the
principal components.
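Because the result has class "princomp", its components can be read like those of any other "princomp" object. The sketch below uses stats::princomp on the built-in USArrests data purely as a stand-in, so that it runs without extra packages; whether the robust scores satisfy the same identity exactly is an assumption here, not something stated above.

```r
# Stand-in object: the classical princomp, used only because it has the same class.
pc <- princomp(USArrests)

sdev <- pc$sdev                   # standard deviations of the components
L    <- unclass(pc$loadings)      # loadings as a plain numeric matrix

# Share of the variance attributed to each computed component.
var_share <- sdev^2 / sum(sdev^2)

# Recompute the scores from center, scale and loadings.
xs <- scale(as.matrix(USArrests), center = pc$center, scale = pc$scale)
scores_manual <- xs %*% L
```

When fewer than ncol(x) components are computed, var_share is relative to the computed components only, not to the total variability of the data.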
References
C. Croux, P. Filzmoser, M. Oliveira, (2007).
Algorithms for Projection-Pursuit Robust Principal Component Analysis,
Chemometrics and Intelligent Laboratory Systems, Vol. 87, pp. 218-225.
See Also
PCAgrid, ScaleAdv, princomp
Examples
# multivariate data with outliers
library(mvtnorm)
x <- rbind(rmvnorm(200, rep(0, 6), diag(c(5, rep(1,5)))),
rmvnorm( 15, c(0, rep(20, 5)), diag(rep(1, 6))))
# Here we calculate the principal components with PCAproj
pc <- PCAproj(x, 6)
# we could draw a biplot too:
biplot(pc)
# we could use another calculation method and another objective function, and
# maybe only calculate the first three principal components:
pc <- PCAproj(x, 3, "qn", "sphere")
biplot(pc)
# now we want to compare the results with the non-robust principal components
pc <- princomp(x)
# again, a biplot for comparison:
biplot(pc)