kEstimateFast
R Documentation
Estimate the best number of components for missing value estimation
Description
This is a simple estimator for the optimal number of components
when applying PCA or llsImpute for missing value estimation. No
cross-validation is performed. Instead, the estimation quality is
measured on the observed values only, as Matrix[!missing] -
Estimate[!missing]. This gives a relatively rough estimate, but it
is fast: the number of iterations equals the length of the
parameter evalPcs. Note that this function does not work with
llsImpute.
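The procedure above amounts to one imputation per entry of evalPcs, each scored against the observed values, with no cross-validation folds. The base-R sketch below illustrates that loop under stated assumptions. k_estimate_fast_sketch and svd_fill_sketch are hypothetical names, not pcaMethods functions. The imputer is a simple mean-fill followed by a rank-k SVD reconstruction, swapped in for the pca() methods. The score is a plain RMSD rather than the NRMSEP or Q2 distance that kEstimateFast uses. The returned list mirrors the elements documented under Value.

```r
# Hypothetical sketch of the selection loop; NOT the pcaMethods source.
# 'impute_fn(Matrix, k)' stands in for pca(..., nPcs = k) and must return
# a completed matrix with the same dimensions as 'Matrix'.
k_estimate_fast_sketch <- function(Matrix, evalPcs, impute_fn) {
  observed <- !is.na(Matrix)
  eError <- numeric(length(evalPcs))
  # One iteration per candidate value; no cross-validation.
  for (i in seq_along(evalPcs)) {
    estimate <- impute_fn(Matrix, evalPcs[i])
    # Compare estimate and data only where the data are observed.
    residual <- Matrix[observed] - estimate[observed]
    eError[i] <- sqrt(mean(residual^2))  # plain RMSD, not NRMSEP/Q2
  }
  list(minNPcs = evalPcs[which.min(eError)], eError = eError,
       evalPcs = evalPcs)
}

# Hypothetical stand-in imputer: fill gaps with column means, then keep
# a rank-k SVD reconstruction. This is not one of the pca() methods.
svd_fill_sketch <- function(Matrix, k) {
  filled <- Matrix
  gaps <- which(is.na(filled), arr.ind = TRUE)
  filled[gaps] <- colMeans(Matrix, na.rm = TRUE)[gaps[, 2]]
  s <- svd(filled, nu = k, nv = k)
  s$u %*% diag(s$d[seq_len(k)], nrow = k) %*% t(s$v)
}

# Usage on simulated data with 15 missing entries.
set.seed(1)
X <- matrix(rnorm(200), nrow = 20, ncol = 10)
X[sample(length(X), 15)] <- NA
res <- k_estimate_fast_sketch(X, evalPcs = 2:4, impute_fn = svd_fill_sketch)
print(res$minNPcs)
```

Passing the imputer as an argument keeps the selection logic separate from the estimation method, which is roughly the role the method argument plays for kEstimateFast.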
The error measure is either the NRMSEP (see Feten et al., 2005) or
the Q2 distance. The NRMSEP normalises the RMSD between the
original data and the estimate by the variable-wise variance,
because a higher variance generally leads to a higher estimation
error. If the number of samples is small, the gene-wise variance
may become an unstable criterion, and the Q2 distance should be
used instead. The same applies if variance normalisation was
applied previously.
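The two error measures can be made concrete with a minimal base-R sketch, assuming NA marks a missing entry. nrmsep_sketch and q2_distance_sketch are hypothetical helpers, not pcaMethods functions, and the exact formulas inside kEstimateFast may differ. Here the NRMSEP is taken as the square root of the mean, over variables, of each variable's mean squared error divided by its variance. The Q2 distance is taken as 1 - Q2, which equals the residual sum of squares divided by the total sum of squares of the column-centred observed values, so that smaller is better for both measures.

```r
# Hypothetical helpers; the exact definitions in pcaMethods may differ.
# Both use observed entries only (NA marks a missing value).
nrmsep_sketch <- function(Matrix, Estimate) {
  observed <- !is.na(Matrix)
  per_variable <- vapply(seq_len(ncol(Matrix)), function(j) {
    rows <- observed[, j]
    x <- Matrix[rows, j]
    # Mean squared error of variable j, scaled by that variable's variance.
    mean((x - Estimate[rows, j])^2) / var(x)
  }, numeric(1))
  sqrt(mean(per_variable))
}

# Returns 1 - Q2 = residual SS / total SS, so smaller is better.
q2_distance_sketch <- function(Matrix, Estimate) {
  observed <- !is.na(Matrix)
  col_means <- colMeans(Matrix, na.rm = TRUE)
  centre <- matrix(col_means, nrow(Matrix), ncol(Matrix), byrow = TRUE)
  press <- sum((Matrix[observed] - Estimate[observed])^2)
  total <- sum((Matrix[observed] - centre[observed])^2)
  press / total
}

# Usage: a perfect estimate of the observed values scores 0 on both.
M <- matrix(c(1, 2, NA, 4, 5, 7, 9, NA, 2), nrow = 3)
E <- M
E[is.na(E)] <- 0  # values at missing positions are ignored
print(nrmsep_sketch(M, E))
print(q2_distance_sketch(M, E))
```

The per-variable division by var(x) is what makes the NRMSEP sensitive to small sample sizes: with only a few observed values per variable, that variance estimate is itself noisy.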
Arguments
Matrix
matrix – numeric matrix containing
observations in rows and variables in columns.
method
character – a valid pca method (see
pca).
evalPcs
numeric – The numbers of principal components to evaluate,
or cluster sizes if used with llsImpute. Should be a vector
containing integer values, e.g. evalPcs = 1:10 or
evalPcs = c(2, 5, 8). The error is calculated for each value.
em
character – The error measure. This can be "nrmsep"
or "q2".
allVariables
boolean – If TRUE, the error is calculated for all
variables; if FALSE, only the incomplete ones are included. You may
want to set this to TRUE to compare several methods on a complete
data set.
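The effect of allVariables can be illustrated with a small helper that mirrors the behaviour described above. variables_to_score is a hypothetical name and is not taken from the pcaMethods source. It shows why TRUE is needed on a complete data set: with FALSE, a matrix without missing values has no incomplete variables, so nothing would be scored.

```r
# Hypothetical helper mirroring the documented meaning of allVariables;
# not taken from the pcaMethods source.
variables_to_score <- function(Matrix, allVariables = FALSE) {
  if (allVariables) {
    seq_len(ncol(Matrix))                # every column (variable)
  } else {
    which(colSums(is.na(Matrix)) > 0)    # only columns with missing values
  }
}

# Usage on a complete 4 x 3 matrix.
complete <- matrix(rnorm(12), nrow = 4)
print(variables_to_score(complete, allVariables = FALSE))  # no columns
print(variables_to_score(complete, allVariables = TRUE))   # columns 1 to 3
```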
verbose
boolean – If TRUE, the NRMSEP and the
variance are printed to the console each iteration.
...
Further arguments passed to pca.
Value
list
Returns a list with the elements:
minNPcs - number of PCs for which the minimal average error
(NRMSEP or Q2 distance) was obtained
eError - an array of size length(evalPcs), containing the
estimation error for each number of components
evalPcs - the evaluated numbers of components or cluster sizes
(the same as the evalPcs input parameter)
Author(s)
Wolfram Stacklies
See Also
kEstimate.
Examples
library(pcaMethods)
data(metaboliteData)
# Estimate the best number of PCs with ppca, evaluating 2 to 4 components
esti <- kEstimateFast(t(metaboliteData), method = "ppca", evalPcs = 2:4, em = "nrmsep")
barplot(drop(esti$eError), xlab = "Components", ylab = "NRMSEP (1 iteration)")
# The best k value is:
print(esti$minNPcs)
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
> library(pcaMethods)
Attaching package: 'pcaMethods'
The following object is masked from 'package:stats':
loadings
> ### Name: kEstimateFast
> ### Title: Estimate best number of Components for missing value estimation
> ### Aliases: kEstimateFast
> ### Keywords: multivariate
>
> ### ** Examples
>
> data(metaboliteData)
> # Estimate best number of PCs with ppca for component 2:4
> esti <- kEstimateFast(t(metaboliteData), method = "ppca", evalPcs = 2:4, em="nrmsep")
> barplot(drop(esti$eError), xlab = "Components",ylab = "NRMSEP (1 iterations)")
> # The best k value is:
> print(esti$minNPcs)
NULL