a numeric value between 0.5 and 1, used only for the robust
estimate (rlga), specifying the proportion of points in the best subset.
biter
an integer for the number of different starting
hyperplanes to try.
niter
an integer for the number of iterations to attempt for
convergence.
showall
logical. If TRUE then display all the outcomes, not just
the best one.
scale
logical. Allows you to scale the data, dividing each
column by its standard deviation, before fitting.
nnode
an integer giving the number of CPUs to use for parallel
processing. Defaults to NULL, i.e. no parallel processing.
silent
logical. If TRUE, produces no text output during
processing.
...
any other arguments passed from the generic function.
Details
This code tries to find k clusters using the LGA algorithm described
in Van Aelst et al (2006). For each attempt, it takes up to
niter steps to reach convergence, and it does this from each of
biter different starting hyperplanes. It then selects the
clustering with the smallest Residual Orthogonal Sum of Squares (ROSS).
If biter is left as NULL, then it is selected via the equation
given in Van Aelst et al (2006).
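As a sketch of how the tuning arguments interact (a hypothetical call, assuming a small synthetic dataset like the one constructed in the Examples below):

```r
## Sketch only: explicit control of the search effort.
## biter and niter have the meanings described above.
library(MASS)
set.seed(42)
X <- rbind(mvrnorm(n = 50, mu = c(1, -1), Sigma = diag(0.1, 2) + 0.9),
           mvrnorm(n = 50, mu = c(1,  1), Sigma = diag(0.1, 2) + 0.9))
## Try 50 starting hyperplanes, each allowed up to 20 iterations;
## the clustering with the smallest ROSS is returned.
out <- lga(X, k = 2, biter = 50, niter = 20)
```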
The function rlga is the robust equivalent to LGA, and is
introduced in Garcia-Escudero et al (2008).
Both functions are parallel-computing aware via the nnode
argument, and work with the package snow. In order to use
parallel computing, one of MPI (e.g. lamboot) or PVM is necessary.
For further details, see the documentation for snow.
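One defensive pattern (a sketch, not part of the package itself) is to guard the parallel path on snow being installed before passing nnode:

```r
## Sketch only: fall back to serial fitting when snow is unavailable.
if (requireNamespace("snow", quietly = TRUE)) {
  out <- lga(X, k = 2, nnode = 4)   # spread the biter starts over 4 CPUs
} else {
  out <- lga(X, k = 2)              # serial fallback
}
```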
Associated with the lga and rlga functions are a print method and a
plot method (see the examples). In the plot method, the fitted
hyperplanes are also shown as dashed-lines when there are only two
dimensions.
Value
An object of class "lga". The list contains:
cluster
a vector containing the cluster memberships.
ROSS
the Residual Orthogonal Sum of Squares for the solution.
converged
a logical. TRUE if at least one solution has converged.
nconverg
the number of converged solutions (out of biter starts).
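The components listed above can be inspected directly from the fitted object; for instance (a sketch, assuming any numeric data matrix X):

```r
## Sketch only: inspecting the components of a fitted "lga" object.
set.seed(1)
X <- cbind(rnorm(100), rnorm(100))  # any numeric matrix
out <- lga(X, k = 2)
out$cluster    # vector of cluster memberships
out$ROSS       # Residual Orthogonal Sum of Squares for the solution
out$converged  # TRUE if at least one start converged
out$nconverg   # number of converged solutions (out of biter starts)
```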
References
Van Aelst, S., Wang, X., Zamar, R. and Zhu, R. (2006)
‘Linear Grouping Using Orthogonal Regression’,
Computational Statistics & Data Analysis, 50, 1287–1312.
Garcia-Escudero, L.A., Gordaliza, A., San Martin, R., Van Aelst, S. and
Zamar, R.H. (2008) ‘Robust linear clustering’. To appear in
Journal of the Royal Statistical Society, Series B (accepted
June, 2008).
See Also
gap
Examples
## Synthetic Data
## Make a dataset with 2 clusters in 2 dimensions
library(MASS)
set.seed(1234)
X <- rbind(mvrnorm(n=100, mu=c(1,-1), Sigma=diag(0.1,2)+0.9),
mvrnorm(n=100, mu=c(1,1), Sigma=diag(0.1,2)+0.9))
lgaout <- lga(X, 2)
plot(lgaout)
print(lgaout)
## Robust equivalent
rlgaout <- rlga(X, 2, alpha=0.75)
plot(rlgaout)
print(rlgaout)
## nhl94 data set
data(nhl94)
plot(lga(nhl94, k=3, niter=30))
## Allometry data set
data(brain)
plot(lga(log(brain, base=10), k=3))
## Second Allometry data set
data(ob)
plot(lga(log(ob[,2:3]), k=3), pch=as.character(ob[,1]))
## Corridor Walls data set
## To obtain the results reported in Garcia-Escudero et al. (2008):
data(corridorWalls)
rlgaout <- rlga(corridorWalls, k=3, biter = 100, niter = 30, alpha=0.85)
pairs(corridorWalls, col=rlgaout$cluster+1)
plot(rlgaout)
## Parallel processing case
## In this example, running using 4 nodes.
## Not run:
set.seed(1234)
X <- rbind(mvrnorm(n=1e6, mu=c(1,-1), Sigma=diag(0.1,2)+0.9),
mvrnorm(n=1e6, mu=c(1,1), Sigma=diag(0.1,2)+0.9))
abc <- lga(X, k=2, nnode=4)
## End(Not run)