matrix – Preprocessed data with the
variables in columns and observations in rows. The data may
contain missing values, denoted as NA.
nPcs
numeric – Number of components to
estimate. The precision of the missing value estimation depends
on the number of components, which should resemble the internal
structure of the data.
maxSteps
numeric – Number of estimation
steps. Default is based on a generous rule of thumb.
unitsPerLayer
The network units, for example c(2,4,6) for two
input units (feature units, i.e. principal components), one hidden
layer of four units for non-linearity, and six output units (the
original number of variables).
functionsPerLayer
The function to apply at each layer,
e.g. c("linr", "tanh", "linr").
weightDecay
Value between 0 and 1.
weights
Starting weights for the network. Defaults to
uniform random values but can be set explicitly to make the
algorithm deterministic.
verbose
boolean – nlpca prints the number of steps
and warning messages if set to TRUE. Default is interactive().
...
Reserved for future use. Not passed on anywhere.
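To make the roles of unitsPerLayer and functionsPerLayer concrete, the following base R sketch pushes component scores through a network of shape c(2,4,6). It is an illustration only and not the package's internal code: the helper forwardPass, the bias handling and the random weights are assumptions made for this example, and "linr" is assumed to mean the identity function.

```r
## Illustrative only: a forward pass through a network described by
## unitsPerLayer and functionsPerLayer. Not pcaMethods internals.
unitsPerLayer <- c(2, 4, 6)
functionsPerLayer <- c("linr", "tanh", "linr")

## Assumed meaning of the function names: "linr" = identity, "tanh" = tanh
activations <- list(linr = identity, tanh = tanh)

## One weight matrix per pair of consecutive layers, each with an
## extra column for a bias term (an assumption of this sketch)
set.seed(1)
weights <- lapply(seq_len(length(unitsPerLayer) - 1), function(i) {
  matrix(runif(unitsPerLayer[i + 1] * (unitsPerLayer[i] + 1), -0.1, 0.1),
         nrow = unitsPerLayer[i + 1])
})

## Hypothetical helper: map scores (observations in rows) to the
## reconstructed variables (observations in rows)
forwardPass <- function(scores, weights, functionsPerLayer) {
  a <- activations[[functionsPerLayer[1]]](t(scores))
  for (i in seq_along(weights)) {
    ## append a row of ones so the last weight column acts as the bias
    a <- activations[[functionsPerLayer[i + 1]]](weights[[i]] %*% rbind(a, 1))
  }
  t(a)
}

scores <- matrix(rnorm(10 * unitsPerLayer[1]), ncol = unitsPerLayer[1])
reconstructed <- forwardPass(scores, weights, functionsPerLayer)
dim(reconstructed)
```

In this sketch the first entry of unitsPerLayer matches the number of components and the last entry matches the number of variables, so ten observations with two scores each are mapped to a 10 x 6 matrix.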
Details
Artificial neural network (MLP) for performing non-linear
PCA. Non-linear PCA is conceptually similar to classical PCA but
theoretically quite different. Instead of simply decomposing our
matrix (X) into scores (T), loadings (P) and an error (E), we train a
neural network (our loadings) to find a curve through the
multidimensional space of X that describes as much variance as
possible. Classical ways of interpreting PCA results are thus not
applicable to NLPCA, since the loadings are hidden in the network.
However, the scores of components that lead to low
cross-validation errors can still be interpreted via the score
plot. Unfortunately, this method depends on slow iterations, which
are currently implemented in R only, making the method extremely
slow. Furthermore, the algorithm does not decide by itself when it
has converged but simply performs 'maxSteps' iterations.
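For contrast with the network approach, the classical decomposition of X into scores, loadings and an error matrix can be sketched in base R. This is a minimal illustration on random, complete data using svd(). The variable names are chosen for this example, and it is not how pcaMethods computes its results.

```r
## Classical (linear) PCA decomposition X = T P' + E via SVD, base R only
set.seed(42)
X <- matrix(rnorm(50 * 3), ncol = 3)
Xc <- scale(X, center = TRUE, scale = FALSE)    # mean-centre the variables

nPcs <- 2
s <- svd(Xc)
loadings <- s$v[, seq_len(nPcs), drop = FALSE]  # P: one column per component
scores <- Xc %*% loadings                       # T: one row per observation
residual <- Xc - scores %*% t(loadings)         # E: what nPcs components miss

## Fraction of the total variance captured by the first nPcs components
explained <- 1 - sum(residual^2) / sum(Xc^2)
explained
```

Here the loadings are an explicit matrix that can be inspected directly, which is the kind of interpretation that is lost when the mapping is hidden in network weights.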
Value
Standard PCA result object used by all PCA-based methods of
this package. Contains scores, loadings, data mean and more. See
pcaRes for details.
Author(s)
Based on a MATLAB script by Matthias Scholz and ported to
R by Henning Redestig.
References
Matthias Scholz, Fatma Kaplan, Charles L Guy, Joachim
Kopka and Joachim Selbig. Non-linear PCA: a missing
data approach. Bioinformatics, 21(20):3887-3895, Oct 2005.
Examples
## Data set with three variables where data points constitute a helix
data(helix)
helixNA <- helix
## not a single complete observation
helixNA <- t(apply(helix, 1, function(x) { x[sample(1:3, 1)] <- NA; x}))
## 50 steps is not enough, for good estimation use 1000
helixNlPca <- pca(helixNA, nPcs=1, method="nlpca", maxSteps=50)
fittedData <- fitted(helixNlPca, helixNA)
plot(fittedData[which(is.na(helixNA))], helix[which(is.na(helixNA))])
## compared to solution by Nipals PCA which cannot extract non-linear patterns
helixNipPca <- pca(helixNA, nPcs=2)
fittedData <- fitted(helixNipPca)
plot(fittedData[which(is.na(helixNA))], helix[which(is.na(helixNA))])
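The plots above compare estimated and true values at the masked positions. The same comparison can be summarised as a single number. The base R sketch below shows the masking step and a root mean squared error restricted to the masked entries. The synthetic helix, the column-mean imputation and the helper rmseMissing are stand-ins invented for this example so that it runs without the package.

```r
## Base R sketch: mask one value per row and score an imputation by RMSE.
## The helix here is synthetic and only stands in for data(helix).
set.seed(7)
tt <- seq(0, 4 * pi, length.out = 100)
toyHelix <- cbind(sin(tt), cos(tt), tt / (4 * pi))

## Set exactly one randomly chosen variable to NA in every observation
toyNA <- t(apply(toyHelix, 1, function(x) { x[sample(1:3, 1)] <- NA; x }))
mask <- is.na(toyNA)

## Stand-in imputation: column means of the observed values
colMeansObs <- colMeans(toyNA, na.rm = TRUE)
imputed <- toyNA
imputed[mask] <- colMeansObs[col(toyNA)[mask]]

## Hypothetical helper: RMSE restricted to the masked entries
rmseMissing <- function(estimate, truth, mask) {
  sqrt(mean((estimate[mask] - truth[mask])^2))
}
rmseMissing(imputed, toyHelix, mask)
```

With the objects from the example above, the estimate would be fitted(helixNlPca, helixNA), the truth would be helix, and the mask would be is.na(helixNA).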