Last data update: 2014.03.03

compGapStats    R Documentation

Computing Gap statistics to identify the optimal number of subtypes

Description

Compute Gap statistics to identify the optimal number of subtypes

Usage

compGapStats(ge.CRC, ntops=c(2, 4, 8, 12, 16, 20)*1000, K.max=6, nboot=100)
figGAP(gapsmat, gapsSE)

Arguments

ge.CRC

a numeric matrix of expression data of genes expressed in at least one sample.

ntops

an integer vector giving the numbers of top variable genes to use, with variability measured by MAD (median absolute deviation).

K.max

an integer value specifying the maximal number of clusters for which GAP statistics are computed.

nboot

an integer value specifying the number of bootstraps, passed as argument B to the function clusGap.

gapsmat

a numeric matrix of GAP statistics.

gapsSE

standard errors of means of the GAP statistics.
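
As an illustration of what ntops refers to, the following is a minimal sketch of ranking genes by MAD and keeping the most variable ones. It uses a small simulated matrix, not ge.CRC, and the object names (expr, mads, ntop, top) are hypothetical. How compGapStats performs this step internally is not shown in this page, so treat this as an assumption about the general idea only.

```r
set.seed(1)

# Simulated expression matrix (hypothetical): 50 genes in rows, 8 samples in columns
expr <- matrix(rnorm(50 * 8), nrow = 50,
               dimnames = list(paste0("gene", 1:50), NULL))

# Median absolute deviation of each gene across samples
mads <- apply(expr, 1, mad)

# Keep the ntop genes with the largest MAD
ntop <- 10
top <- expr[order(mads, decreasing = TRUE)[seq_len(ntop)], , drop = FALSE]
```

Each value in ntops would correspond to one such subset, so that the stability of the estimated number of clusters can be compared across gene set sizes.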

Details

The GAP statistic is a popular method for estimating the number of clusters in a data set. It compares the observed within-cluster dispersion with the dispersion expected under a reference null distribution. To identify the optimal number of clusters, the GAP statistic is computed for k=1 to K.max, with nboot bootstraps, for each of the ntops numbers of top variable genes in the AMC data set.
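
The computation described above can be sketched by calling clusGap from the cluster package directly on toy data. This is only an illustration under stated assumptions: the data are simulated rather than taken from the AMC data set, kmeans is used as the clustering function, and the selection rule passed to maxSE is one of several available. None of these choices is confirmed to match what compGapStats does internally.

```r
library(cluster)  # provides clusGap() and maxSE()

set.seed(1)

# Toy data (hypothetical): three well-separated groups of 20 points in 2-D
x <- rbind(matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(40, mean = 3, sd = 0.3), ncol = 2),
           matrix(rnorm(40, mean = 6, sd = 0.3), ncol = 2))

# GAP statistic for k = 1..K.max with B bootstrap reference data sets;
# nstart is forwarded to kmeans
gap <- clusGap(x, FUNcluster = kmeans, K.max = 6, B = 10,
               nstart = 10, verbose = FALSE)

# One row per k, with columns logW, E.logW, gap and SE.sim
tab <- gap$Tab

# Pick k using the rule proposed by Tibshirani et al. (2001)
k.hat <- maxSE(tab[, "gap"], tab[, "SE.sim"], method = "Tibs2001SEmax")
```

Here the gap column is the difference between the expected and observed log dispersion, and SE.sim is its simulation standard error. These play the same roles as the gapsmat and gapsSE values documented above, although their exact layout in this package may differ.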

The function figGAP is designed to visualize GAP curves.

Value

compGapStats returns a list containing gapsmat (a numeric matrix of GAP statistics) and gapsSE (standard errors of means of the GAP statistics).

Author(s)

Xin Wang xw264@cam.ac.uk

References

De Sousa E Melo, F. and Wang, X. and Jansen, M. et al. Poor prognosis colon cancer is defined by a molecularly distinct subtype and precursor lesion. accepted

Tibshirani, Robert and Walther, Guenther and Hastie, Trevor (2001). Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63(2), 411-423.

Examples

data(ge.CRC, package="DeSousa2013")
ge.CRC <- ge.all[selPbs, ]
gaps <- compGapStats(ge.CRC, ntops=c(2, 4)*1000, K.max=6, nboot=10)
figGAP(gaps$gapsmat, gaps$gapsSE)

Results


> library(DeSousa2013)

> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/DeSousa2013/compGapStats.Rd_%03d_medium.png", width=480, height=480)
> ### Name: compGapStats
> ### Title: Computing Gap statistics to identify the optimal number of
> ###   subtypes
> ### Aliases: compGapStats figGAP
> 
> ### ** Examples
> 
> data(ge.CRC, package="DeSousa2013")
> ge.CRC <- ge.all[selPbs, ]
> gaps <- compGapStats(ge.CRC, ntops=c(2, 4)*1000, K.max=6, nboot=10)
> figGAP(gaps$gapsmat, gaps$gapsSE)
> dev.off()
null device 
          1 
>