Last data update: 2014.03.03

R: Extension of PCA to analyze multiple data sets
mbpcaR Documentation

Extension of PCA to analyze multiple data sets

Description

Three approaches are supplied in this function, consensus PCA (CPCA), generalized CCA (GCCA) and multiple co-inertia analsyis (MCIA).

Usage

mbpca(x, ncomp, method, k = "all", center = TRUE, scale = FALSE, 
  option = "uniform", maxiter = 1000, moa = TRUE, verbose = TRUE, 
  svd.solver = c("svd", "fast.svd", "propack"))

Arguments

x

A list of matrix or data.frame, where rows are variables and columns are samples. The columns among the matrices need to be match but the variables do not need to be.

ncomp

An integer; the number of components to calculate. To calculate more components requires longer computational time.

method

A character string could be one of c("globalScore", "blockScore", "blockLoading"). The "globalScore" approach equals consensus PCA; The "blockScore" approach equals generalized canonical correlation analysis (GCCA); The "blockLoading" approach equals multiple co-inertia anaysis (MCIA);

k

The absolute number (if k >= 1) or the proportion (if 0<k<1) of non-zero coefficients for the variable loading vectors. It could be a single value or a vector has the same length as x so the sparsity of individual matrix could be different.

center

Logical; if the variables should be centered

scale

Logical; if the variables should be scaled

option

A charater string could be one of c("lambda1", "inertia", "uniform") to indicate how the different matrices should be normalized. If "lambda1", the matrix is divided by its the first singular value, if "inertia", the matrix is divided by its total inertia (sum of square), if "uniform", none of them would be done.

maxiter

Integer; Maximum number of iterations in the algorithm

moa

Logical; whether the output should be converted to an object of class moa-class

verbose

Logical; whether the process (# of PC) should be printed

svd.solver

A charater string could be one of c("svd", "fast.svd", "propack"). The default "fast.svd " has a good compromise between the robustness and speed. "propack" is the fastest but may failed to converge in practice.

Details

details need to update

Value

An object of class moa-class (if moa=TRUE) or an list object contains the following elements:

tb - the block scores

pb - the block loadings

t - the global scores

w - the wegihts of block scores to construct the global scor

Note

no note now

Author(s)

Chen Meng

References

reference need to be updated

See Also

see moa for non-iterative algorithms for multi-block PCA.

Examples


data("NCI60_4arrays")
tumorType <- sapply(strsplit(colnames(NCI60_4arrays$agilent), split="\."), "[", 1)
colcode <- as.factor(tumorType)
levels(colcode) <- c("red", "green", "blue", "cyan", "orange", 
                     "gray25", "brown", "gray75", "pink")
colcode <- as.character(colcode)



moa <- mbpca(NCI60_4arrays, ncomp = 10, k = "all", method = "globalScore", option = "lambda1", 
             center=TRUE, scale=FALSE)
plot(moa, value="eig", type=2)
r <- bootMbpca(moa, mc.cores = 1, B=6, replace = FALSE, resample = "sample")

moas <- mbpca(NCI60_4arrays, ncomp = 3, k = 0.1, method = "globalScore", option = "lambda1", 
              center=TRUE, scale=FALSE)


scr <- moaScore(moa)
scrs <- moaScore(moas)
diag(cor(scr[, 1:3], scrs))

layout(matrix(1:2, 1, 2))
plot(scrs[, 1:2], col=colcode, pch=20)
legend("topright", legend = unique(tumorType), col=unique(colcode), pch=20)
plot(scrs[, 2:3], col=colcode, pch=20)

gap <- moGap(moas, K.max = 12, cluster = "hcl")
gap$nClust


hcl <- hclust(dist(scrs))
cls <- cutree(hcl, k=4)
clsColor <- as.factor(cls)
levels(clsColor) <- c("red", "blue", "orange", "pink")
clsColor <- as.character((clsColor))

heatmap(t(scrs[hcl$order, ]), ColSideColors = colcode[hcl$order], Rowv = NA, Colv=NA)
heatmap(t(scrs[hcl$order, ]), ColSideColors = clsColor[hcl$order], Rowv = NA, Colv=NA)

genes <- moaCoef(moas)
genes$nonZeroCoef$agilent.V1.neg


Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(mogsa)
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/mogsa/mbpca.Rd_%03d_medium.png", width=480, height=480)
> ### Name: mbpca
> ### Title: Extension of PCA to analyze multiple data sets
> ### Aliases: mbpca
> ### Keywords: multi-blcok PCA GCCA CPCA MCIA
> 
> ### ** Examples
> 
> 
> data("NCI60_4arrays")
> tumorType <- sapply(strsplit(colnames(NCI60_4arrays$agilent), split="\."), "[", 1)
> colcode <- as.factor(tumorType)
> levels(colcode) <- c("red", "green", "blue", "cyan", "orange", 
+                      "gray25", "brown", "gray75", "pink")
> colcode <- as.character(colcode)
> 
> 
> 
> moa <- mbpca(NCI60_4arrays, ncomp = 10, k = "all", method = "globalScore", option = "lambda1", 
+              center=TRUE, scale=FALSE)
calculating component 1 ...
calculating component 2 ...
calculating component 3 ...
calculating component 4 ...
calculating component 5 ...
calculating component 6 ...
calculating component 7 ...
calculating component 8 ...
calculating component 9 ...
calculating component 10 ...
> plot(moa, value="eig", type=2)
> r <- bootMbpca(moa, mc.cores = 1, B=6, replace = FALSE, resample = "sample")
method is set to 'globalScore'.
> 
> moas <- mbpca(NCI60_4arrays, ncomp = 3, k = 0.1, method = "globalScore", option = "lambda1", 
+               center=TRUE, scale=FALSE)
calculating component 1 ...
calculating component 2 ...
calculating component 3 ...
> 
> 
> scr <- moaScore(moa)
> scrs <- moaScore(moas)
> diag(cor(scr[, 1:3], scrs))
      PC1       PC2       PC3 
0.9741884 0.9889647 0.9546203 
> 
> layout(matrix(1:2, 1, 2))
> plot(scrs[, 1:2], col=colcode, pch=20)
> legend("topright", legend = unique(tumorType), col=unique(colcode), pch=20)
> plot(scrs[, 2:3], col=colcode, pch=20)
> 
> gap <- moGap(moas, K.max = 12, cluster = "hcl")
> gap$nClust
   firstSEmax Tibs2001SEmax   globalSEmax      firstmax     globalmax 
            4             4             7             4             7 
> 
> 
> hcl <- hclust(dist(scrs))
> cls <- cutree(hcl, k=4)
> clsColor <- as.factor(cls)
> levels(clsColor) <- c("red", "blue", "orange", "pink")
> clsColor <- as.character((clsColor))
> 
> heatmap(t(scrs[hcl$order, ]), ColSideColors = colcode[hcl$order], Rowv = NA, Colv=NA)
> heatmap(t(scrs[hcl$order, ]), ColSideColors = clsColor[hcl$order], Rowv = NA, Colv=NA)
> 
> genes <- moaCoef(moas)
> genes$nonZeroCoef$agilent.V1.neg
                           id         coef
FGD3_agilent     FGD3_agilent -0.105242702
TMC6_agilent     TMC6_agilent -0.045236957
GMFG_agilent     GMFG_agilent -0.042502839
IQGAP2_agilent IQGAP2_agilent -0.001185483
> 
> 
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>