Last data update: 2014.03.03

ClusProc

R Documentation

CNV clustering Procedure

Description

This function chooses the optimal number of clusters and returns the cluster assignment of each individual under that optimal number.

Usage

  ClusProc(signal, N = 2:6,
    varSelection = c("PC1", "RAW", "PC.9", "MEAN"),
    threshold = 1e-05, itermax = 8, adjust = TRUE,
    thresMAF = 0.01, scale = FALSE, thresSil = 0.01)

Arguments

`signal`	The matrix of intensity measurements. The row names must match the individual IDs in the fam file.
`N`	The candidate numbers of clusters to fit to the data. Each value must be larger than 1; a value of 1 returns an error. If missing, the default 2, 3, ..., 6 is used.
`varSelection`	Character string specifying how to handle the intensity values. It must be one of 'RAW', 'PC.9', 'PC1' and 'MEAN'. If the value is 'RAW', the raw intensity values are used. If it is 'PC.9', the leading principal component scores that together account for 90% of the total variance are used. If it is 'PC1', the first principal component scores are used. If it is 'MEAN', the mean over all probes is used. The default is 'PC1'.
`threshold`	Optional convergence threshold. The iteration stops when the absolute difference in log-likelihood between successive iterations is less than this value. If missing, the default 1e-05 is used.
`itermax`	Optional maximum number of iterations. The iteration stops once the number of iterations exceeds this value. If missing, the default 8 is used.
`adjust`	Logical. If TRUE (default), the result is adjusted using the silhouette score. See Details.
`thresMAF`	The minor allele frequency threshold.
`thresSil`	The exclusion threshold. Individuals whose silhouette score is smaller than this value are discarded.
`scale`	Logical. If TRUE, each column of the signal is scaled using its sample mean and sample variance before further processing.
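
The varSelection and scale options can be illustrated with base R alone. The sketch below is not the package's code: it assumes that 'PC.9' keeps the smallest number of leading principal components whose cumulative variance share reaches 90%, and it uses a simulated matrix (signal_demo, a hypothetical name) in place of real intensity data.

```r
# Illustrative only: simulated intensity matrix, 50 individuals x 8 probes.
set.seed(1)
signal_demo <- matrix(rnorm(50 * 8), nrow = 50, ncol = 8)
rownames(signal_demo) <- paste0("ind", seq_len(50))

# Column-wise standardisation, analogous to scale = TRUE.
signal_scaled <- scale(signal_demo, center = TRUE, scale = TRUE)

# Principal component analysis of the probe intensities.
pca <- prcomp(signal_demo)
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Assumption: 'PC.9' keeps the smallest number of leading components
# whose cumulative share of the variance reaches 90%.
n_pc <- which(cumsum(var_explained) >= 0.9)[1]

pc9_scores  <- pca$x[, seq_len(n_pc), drop = FALSE]  # analogue of 'PC.9'
pc1_scores  <- pca$x[, 1, drop = FALSE]              # analogue of 'PC1'
mean_signal <- rowMeans(signal_demo)                 # analogue of 'MEAN'
```

Here n_pc plays the role of the count reported in messages such as "The first 5 principal components are used."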

Details

`adjust`	If adjust is TRUE, the result is adjusted by the silhouette score according to the following criterion. For each individual, the silhouette score is calculated with respect to each group. The individual is then forcibly assigned to the group that maximizes its silhouette score.
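
The criterion above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: it uses the standard silhouette definition (b - a) / max(a, b) with Euclidean distances, and the helpers sil_if_in_group and reassign_by_silhouette are hypothetical names introduced only for this example.

```r
# Hypothetical helper: silhouette score individual i would have in group g.
# d is a full distance matrix; labels holds the current group of each individual.
sil_if_in_group <- function(d, labels, i, g) {
  others <- setdiff(seq_along(labels), i)
  # Mean distance from i to the members of each group, excluding i itself.
  mean_d <- tapply(d[i, others], labels[others], mean)
  a <- mean_d[[as.character(g)]]                      # cohesion with group g
  b <- min(mean_d[names(mean_d) != as.character(g)])  # nearest other group
  (b - a) / max(a, b)
}

# Hypothetical helper: assign every individual to the group that
# maximizes its silhouette score (single pass over the current labels).
reassign_by_silhouette <- function(x, labels) {
  d <- as.matrix(dist(x))
  groups <- sort(unique(labels))
  # Rows are individuals, columns are candidate groups.
  sil <- sapply(groups, function(g)
    sapply(seq_along(labels), function(i) sil_if_in_group(d, labels, i, g)))
  list(labels = groups[max.col(sil, ties.method = "first")],
       silWidth = apply(sil, 1, max))
}

# Toy data: two well-separated groups, with the 4th individual mislabelled.
x <- matrix(c(0, 0.1, 0.2, 0.3, 5, 5.1, 5.2, 5.3), ncol = 1)
labels <- c(1, 1, 1, 2, 2, 2, 2, 2)
res <- reassign_by_silhouette(x, labels)

# Analogue of thresSil: keep individuals whose score reaches the threshold.
keep <- res$silWidth >= 0.01
```

In this toy case the mislabelled individual is moved back to group 1, and no individual falls below the threshold.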

Value

Returns an object of class 'clust', which is a list containing the following components:

`clusNum`	The optimal number of clusters among the values given in N.
`silWidth`	Silhouette related results.

Author(s)

Meiling Liu

Examples

# Fit the data under the given clustering numbers
clus.fit <- ClusProc(signal=signal,N=2:6,varSelection='PC.9')

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(PedCNV)
Loading required package: Rcpp
Loading required package: RcppArmadillo
Loading required package: ggplot2
> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/PedCNV/ClusProc.Rd_%03d_medium.png", width=480, height=480)
> ### Name: ClusProc
> ### Title: CNV clustering Procedure
> ### Aliases: ClusProc
> 
> ### ** Examples
> 
> # Fit the data under the given clustering numbers
> clus.fit <- ClusProc(signal=signal,N=2:6,varSelection='PC.9')
The first 5 principal components are used.
The logliklihood for signal model is -1663.629 when clustering number is 2.
The logliklihood for signal model is -1477.954 when clustering number is 3.
The logliklihood for signal model is -1394.682 when clustering number is 4.
The logliklihood for signal model is -1338.013 when clustering number is 5.
The logliklihood for signal model is -1283.297 when clustering number is 6.
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>