Rc
R Documentation
Measuring goodness-of-fit for principal objects.
Description
These functions compute the ‘coverage coefficient’ R_c
for local principal curves, local principal points
(i.e., local modes of a kernel density estimate, found through iterated mean shift), and other principal objects.
Usage
Rc(x,...)
## S3 method for class 'lpc'
Rc(x,...)
## S3 method for class 'lpc.spline'
Rc(x,...)
## S3 method for class 'ms'
Rc(x,...)
base.Rc(data, closest.coords, type="curve")
Arguments
x
an object used to select a method.
...
Further arguments passed to or from other methods (currently
not used).
data
A data matrix.
closest.coords
A matrix of the coordinates of the projected data, with one row per observation, in the same order as data.
type
For principal curves, keep the default "curve". For principal points,
set type="points".
Details
Rc computes the coverage coefficient R_c, a quantity which
estimates the goodness-of-fit of a fitted principal object. This
quantity can be interpreted similarly to the coefficient of determination in
regression analysis: values close to 1 indicate a good fit, while values
close to 0 indicate a ‘bad’ fit (corresponding to linear PCA).
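As a rough illustration of this interpretation, the following R sketch computes an R_c-style quantity as the relative reduction in mean absolute residual length of a fitted object compared with a reference fit. It is not LPCM's implementation: the helpers rc_sketch and pca_projection are hypothetical, and using the projection onto the first principal component as the reference for curves is an assumption based on the remark that values close to 0 correspond to linear PCA. Under these assumptions, the reference fit itself scores 0 and a fit passing through every data point scores 1.

```r
# Hypothetical sketch, NOT LPCM's implementation: an R_c-style coefficient
# defined as the relative reduction in mean absolute residual length of a
# fitted object compared with a reference fit (assumed: linear PCA).
rc_sketch <- function(data, closest.coords, reference.coords) {
  data <- as.matrix(data)
  # Euclidean length of each residual (observation minus its projection)
  r.fit <- sqrt(rowSums((data - as.matrix(closest.coords))^2))
  r.ref <- sqrt(rowSums((data - as.matrix(reference.coords))^2))
  1 - mean(r.fit) / mean(r.ref)
}

# Assumed reference for curves: orthogonal projection onto the first
# principal component line (stats::prcomp centers but does not scale).
pca_projection <- function(data) {
  data <- as.matrix(data)
  pc <- prcomp(data)
  v <- pc$rotation[, 1, drop = FALSE]      # unit loading vector, d x 1
  centered <- sweep(data, 2, pc$center)    # subtract column means
  sweep(centered %*% v %*% t(v), 2, pc$center, "+")
}

# Toy data: a noisy parabola
set.seed(1)
x <- seq(-2, 2, length.out = 200)
dat <- cbind(x, y = x^2 + rnorm(200, sd = 0.1))
ref <- pca_projection(dat)

rc_sketch(dat, ref, ref)               # reference fit itself: 0
rc_sketch(dat, dat, ref)               # fit through every point: 1
rc_sketch(dat, cbind(x, x^2), ref)     # the true curve: between 0 and 1
```

Making the reference an explicit argument keeps the sketch honest about the one choice it cannot confirm from this page.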
For objects of class lpc, lpc.spline, and ms, S3 methods of the generic function
Rc are available. These, in turn, call the base function base.Rc, which
can also be called directly if the fitted object is of another class.
In principle, base.Rc can be used to assess the
goodness-of-fit of any principal object, provided that
the coordinates (closest.coords) of the projected data are
available. For instance, for Hastie-Stuetzle (HS) principal curves fitted via
package princurve, this information is contained in component $s,
and for a k-means object, say fitk, it can be
obtained via fitk$centers[fitk$cluster,]. Set type="points" in
the latter case.
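A minimal base R sketch of the k-means case, using stats::kmeans on simulated data (the data and the number of clusters are illustrative assumptions). Indexing the center matrix with the cluster labels represents every observation by the center of its cluster, which yields a closest.coords matrix with the same dimensions as the data; base.Rc is only called if LPCM is installed.

```r
# Simulated data: two well-separated groups in two dimensions
set.seed(42)
dat <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
             matrix(rnorm(100, mean = 4), ncol = 2))
fitk <- kmeans(dat, centers = 2)

# One row per observation: the center of the cluster it was assigned to
closest.coords <- fitk$centers[fitk$cluster, ]
stopifnot(identical(dim(closest.coords), dim(dat)))

# Principal points, hence type = "points"
if (requireNamespace("LPCM", quietly = TRUE)) {
  print(LPCM::base.Rc(dat, closest.coords, type = "points"))
}
```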
The function Rc attempts to compute all missing information, so
computation takes longer the less information the given
object x contains. Note also that Rc looks up the option scaled in the fitted
object and accounts for the scaling automatically. Important: if the data
were scaled, do NOT unscale the results by hand in order to feed
the unscaled version into base.Rc; this will give a wrong result.
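The general effect behind this warning can be illustrated without LPCM. Ratios of mean Euclidean residual lengths are not invariant when coordinates are rescaled by different factors, so a coefficient computed on the scaled data need not match one computed after unscaling by hand. This sketch uses a hypothetical helper mean_res_ratio together with a toy fit, a toy reference, and assumed scale factors. It does not reproduce how LPCM stores or undoes its scaling.

```r
# Hypothetical helper: mean residual length of a fit divided by that of a reference
mean_res_ratio <- function(data, fit, ref) {
  res <- function(p) mean(sqrt(rowSums((data - p)^2)))
  res(fit) / res(ref)
}

set.seed(7)
z <- matrix(rnorm(200), ncol = 2)                     # data on the scaled scale
fit <- cbind(z[, 1], 0)                               # toy fit: projection onto axis 1
ref <- matrix(colMeans(z), nrow(z), 2, byrow = TRUE)  # toy reference: the mean
s <- c(10, 1)                                         # assumed per-column scale factors
unscale <- function(m) sweep(m, 2, s, "*")

r.scaled   <- mean_res_ratio(z, fit, ref)
r.unscaled <- mean_res_ratio(unscale(z), unscale(fit), unscale(ref))
c(scaled = r.scaled, unscaled = r.unscaled)           # the two ratios differ
```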
In terms of methodology, these functions compute R_c directly through the mean
reduction of absolute residual length, rather than through the
area above the coverage curve.
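The two routes are closely related. Let r_i ≥ 0 be the residual lengths and C(τ) = #{i : r_i ≤ τ}/n the empirical coverage. Then the area above the coverage curve, the integral of 1 − C(τ) over τ ≥ 0, equals the mean residual length. The R sketch below checks this identity numerically for hypothetical residuals. It does not call LPCM's coverage function, so it says nothing about how that function normalizes or truncates the τ axis.

```r
# Empirical coverage: C(tau) = fraction of residual lengths r_i <= tau.
# Area above the coverage curve: integral over tau >= 0 of 1 - C(tau).
area_above_coverage <- function(r) {
  rs <- sort(r)
  n <- length(rs)
  # On the interval [rs[k-1], rs[k]) (with rs[0] = 0), k - 1 residuals are
  # covered, so 1 - C(tau) = (n - k + 1) / n; tied values give zero width.
  widths <- diff(c(0, rs))
  sum(widths * (n - seq_len(n) + 1) / n)
}

set.seed(3)
r <- abs(rnorm(50))   # hypothetical residual lengths
all.equal(area_above_coverage(r), mean(r))   # TRUE
```

Because C(τ) is a step function, the integral reduces to a finite sum over the sorted residuals. Summing by parts leaves exactly (1/n) Σ r_i.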
These functions currently do not account for observation
weights, i.e. R_c is computed through the unweighted mean
reduction in absolute residual length (even if weights were used for
the curve fitting).
Acknowledgements
Contributions (in the form of pieces of code or useful suggestions for
improvements) by Jo Dwyer, Mohammad Zayed, and
Ben Oakley are gratefully acknowledged.
Author(s)
J. Einbeck and L. Evers.
References
Einbeck, Tutz, and Evers (2005). Local principal curves. Statistics and
Computing 15, 301-313.
Einbeck (2011). Bandwidth selection for nonparametric unsupervised
learning techniques – a unified approach via self-coverage. Journal of
Pattern Recognition Research 6, 175-192.
See Also
lpc.spline, ms, coverage.
Examples
data(calspeedflow)
lpc1 <- lpc.spline(lpc(calspeedflow[,3:4]), project=TRUE)
Rc(lpc1)
# is the same as:
base.Rc(lpc1$lpcobject$data, lpc1$closest.coords)
ms1 <- ms(calspeedflow[,3:4],plotms=0)
Rc(ms1)
# is the same as:
base.Rc(ms1$data, ms1$cluster.center[ms1$closest.label,], type="points")
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Platform: x86_64-pc-linux-gnu (64-bit)
> library(LPCM)
> ### Name: Rc
> ### Title: Measuring goodness-of-fit for principal objects.
> ### Aliases: Rc Rc.lpc Rc.lpc.spline Rc.ms base.Rc
> ### Keywords: multivariate
>
> ### ** Examples
>
> data(calspeedflow)
> lpc1 <- lpc.spline(lpc(calspeedflow[,3:4]), project=TRUE)
> Rc(lpc1)
[1] 0.6125074
> # is the same as:
> base.Rc(lpc1$lpcobject$data, lpc1$closest.coords)
[1] 0.6125074
>
> ms1 <- ms(calspeedflow[,3:4],plotms=0)
> Rc(ms1)
[1] 0.5794134
> # is the same as:
> base.Rc(ms1$data, ms1$cluster.center[ms1$closest.label,], type="points")
[1] 0.5794134