R Graphical Manual

Browse All

Last data update: 2014.03.03

R: PCA implementation based on robustSvd

robustPca

R Documentation

PCA implementation based on robustSvd

Description

This is a PCA implementation robust to outliers in a data set. It can also handle missing values, it is however NOT intended to be used for missing value estimation. As it is based on robustSVD we will get an accurate estimation for the loadings also for incomplete data or for data with outliers. The returned scores are, however, affected by the outliers as they are calculated inputData X loadings. This also implies that you should look at the returned R2/R2cum values with caution. If the data show missing values, scores are caluclated by just setting all NA - values to zero. This is not expected to produce accurate results. Please have also a look at the manual page for robustSvd. Thus this method should mainly be seen as an attempt to integrate robustSvd() into the framework of this package. Use one of the other methods coming with this package (like PPCA or BPCA) if you want to do missing value estimation. It is not recommended to use this function directely but rather to use the pca() wrapper function.

Usage

robustPca(Matrix, nPcs = 2, verbose = interactive(), ...)

Arguments

`Matrix`	`matrix` – Data containing the variables in columns and observations in rows. The data may contain missing values, denoted as `NA`.
`nPcs`	`numeric` – Number of components to estimate. The preciseness of the missing value estimation depends on the number of components, which should resemble the internal structure of the data.
`verbose`	`boolean` Print some output to the command line if TRUE
`...`	Reserved for future use. Currently no further parameters are used

Details

The method is very similar to the standard prcomp() function. The main difference is that robustSvd() is used instead of the conventional svd() method.

Value

Standard PCA result object used by all PCA-based methods of this package. Contains scores, loadings, data mean and more. See pcaRes for details. are used.

Author(s)

Wolfram Stacklies

Examples

## Load a complete sample metabolite data set and mean center the data
data(metaboliteDataComplete)
mdc <- scale(metaboliteDataComplete, center=TRUE, scale=FALSE)
## Now create 5% of outliers.
cond   <- runif(length(mdc)) < 0.05;
mdcOut <- mdc
mdcOut[cond] <- 10
## Now we do a conventional PCA and robustPca on the original and the data
## with outliers.
## We use center=FALSE here because the large artificial outliers would
## affect the means and not allow to objectively compare the results.
resSvd    <- pca(mdc, method="svd", nPcs=10, center=FALSE)
resSvdOut <- pca(mdcOut, method="svd", nPcs=10, center=FALSE)
resRobPca <- pca(mdcOut, method="robustPca", nPcs=10, center=FALSE)
## Now we plot the results for the original data against those with outliers
## We can see that robustPca is hardly effected by the outliers.
plot(loadings(resSvd)[,1], loadings(resSvdOut)[,1])
plot(loadings(resSvd)[,1], loadings(resRobPca)[,1])

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(pcaMethods)
Loading required package: Biobase
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.


Attaching package: 'pcaMethods'

The following object is masked from 'package:stats':

    loadings

> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/pcaMethods/robustPca.Rd_%03d_medium.png", width=480, height=480)
> ### Name: robustPca
> ### Title: PCA implementation based on robustSvd
> ### Aliases: robustPca
> ### Keywords: multivariate
> 
> ### ** Examples
> 
> ## Load a complete sample metabolite data set and mean center the data
> data(metaboliteDataComplete)
> mdc <- scale(metaboliteDataComplete, center=TRUE, scale=FALSE)
> ## Now create 5% of outliers.
> cond   <- runif(length(mdc)) < 0.05;
> mdcOut <- mdc
> mdcOut[cond] <- 10
> ## Now we do a conventional PCA and robustPca on the original and the data
> ## with outliers.
> ## We use center=FALSE here because the large artificial outliers would
> ## affect the means and not allow to objectively compare the results.
> resSvd    <- pca(mdc, method="svd", nPcs=10, center=FALSE)
> resSvdOut <- pca(mdcOut, method="svd", nPcs=10, center=FALSE)
> resRobPca <- pca(mdcOut, method="robustPca", nPcs=10, center=FALSE)
> ## Now we plot the results for the original data against those with outliers
> ## We can see that robustPca is hardly effected by the outliers.
> plot(loadings(resSvd)[,1], loadings(resSvdOut)[,1])
> plot(loadings(resSvd)[,1], loadings(resRobPca)[,1])
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>