Last data update: 2014.03.03

cell.process {immunoClust}    R Documentation

Clustering of Cell-events in the immunoClust-pipeline

Description

This function performs iterative model-based clustering on cell-event data. It takes the observed cell-event data as its main input and returns an object of class immunoClust, which contains the fitted mixture model parameters and cluster membership information. The additional arguments control data preprocessing, the major loop and EMt-iteration, model refinement, and transformation estimation.

Usage

cell.process(fcs, parameters=NULL, 
    apply.compensation=FALSE, classify.all=FALSE, 
    N=NULL, min.count=10, max.count=10, min=NULL, max=NULL,  
    I.buildup=6, I.final=4, I.trans=I.buildup, 
    modelName="mvt", tol=1e-5, bias=0.3,
    sub.tol= 1e-4, sub.bias=bias, sub.thres=bias, sub.samples=1500, 
    sub.extract=0.8, sub.weights=1, sub.standardize=TRUE,
    trans.estimate=TRUE, trans.minclust=10, 
    trans.a=0.01, trans.b=0.0, trans.parameters=NULL)

cell.MajorIterationLoop(dat, x=NULL, parameters=NULL, 
    I.buildup=6, I.final=4, 
    modelName="mvt", tol=1e-5, bias=0.3,
    sub.bias=bias, sub.thres=0.0, sub.tol=1e-4, sub.samples=1500, 
    sub.extract=0.8, sub.weights=1, sub.EM="MEt", sub.standardize=TRUE, seed=1)

cell.MajorIterationTrans(fcs, x=NULL, parameters=NULL, 
    I.buildup=6, I.final=4, I.trans=I.buildup, 
    modelName="mvt", tol=1e-5, bias=0.3,
    sub.bias=bias, sub.thres=0.0, sub.tol=1e-4, sub.samples=1500, 
    sub.extract=0.8, sub.weights=1, sub.EM="MEt", sub.standardize=TRUE, seed=1, 
    trans.minclust=5, trans.a=0.01, trans.decade=-1, trans.scale=1.0, 
    trans.proc="vsHtransAw")

cell.InitialModel(dat, parameters=NULL, trans.a = 0.01, trans.b = 0.0, 
    trans.decade=-1, trans.scale=1.0)

cell.classifyAll(fcs, x, apply.compensation=FALSE)                         

Arguments

fcs

An object of class flowFrame. Rows correspond to observations and columns correspond to measured parameters.

dat

A numeric matrix, data frame of observations, or object of class flowFrame. Rows correspond to observations and columns correspond to measured parameters.

x

An object of class immunoClust, used as the initial model in the major iteration loop. When left unspecified, the simplest model containing 1 cluster is used as the initial model.

Arguments for data pre- and post-processing:

parameters

A character vector specifying the parameters (columns) to be included in clustering. When it is left unspecified, all the parameters will be used.

apply.compensation

A logical value indicating whether the compensation matrix in the flowFrame should be applied.

classify.all

A logical value indicating whether the removed over- and underexposed observations should also be classified after the clustering process.

N

Maximum number of observations used for clustering. If unspecified or greater than the number of observations (i.e. rows) in dat, all observations are used; otherwise only the first N observations are used.

min.count

An integer specifying the threshold count for filtering data points from below. The default is 10, meaning that if 10 or more data points are smaller than or equal to min, they are excluded from the analysis. If min is NULL, the minimum value of each parameter is used. To suppress filtering, set it to -1.

max.count

An integer specifying the threshold count for filtering data points from above. Interpretation is similar to that of min.count.

min

The lower limit set for data filtering. Note that it is a vector of length equal to the number of parameters (columns), implying that a different value can be set for each parameter.

max

The upper limit set for data filtering. Interpretation is similar to that of min.
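The filtering rule described by min.count/min and max.count/max can be sketched in base R as follows. This is a minimal illustration of the documented semantics (adopted from flowClust), not immunoClust's own implementation; the helper name filter_exposed is hypothetical, and treating values greater than or equal to max as "above" is an assumption mirroring the description of min.

```r
# Hypothetical helper illustrating the documented over-/underexposure filter;
# not the immunoClust implementation.
filter_exposed <- function(y, min.count = 10, max.count = 10,
                           min = NULL, max = NULL) {
  y <- as.matrix(y)
  p <- ncol(y)
  # Default limits: observed minimum / maximum of each parameter (column)
  if (is.null(min)) min <- apply(y, 2, base::min)
  if (is.null(max)) max <- apply(y, 2, base::max)
  min <- rep_len(min, p)
  max <- rep_len(max, p)
  keep <- rep(TRUE, nrow(y))
  for (j in seq_len(p)) {
    below <- y[, j] <= min[j]
    above <- y[, j] >= max[j]
    # Drop a side only if at least min.count / max.count events pile up there;
    # a count of -1 suppresses filtering on that side
    if (min.count != -1 && sum(below) >= min.count) keep <- keep & !below
    if (max.count != -1 && sum(above) >= max.count) keep <- keep & !above
  }
  y[keep, , drop = FALSE]
}

y <- cbind(a = c(rep(0, 12), 1:88), b = 1:100)
nrow(filter_exposed(y))                  # the 12 zeros in column a are removed
nrow(filter_exposed(y, min.count = -1))  # no filtering from below
```

Because the limits default to each column's own extremes, only values that accumulate at the boundary (as saturated events do) are removed; an isolated minimum or maximum is kept.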

Arguments for the major loop and EMt-iteration:

I.buildup

The number of build-up major iterations, in which the number of observations used is doubled at each iteration.

I.final

The number of major iterations with all observations.
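As a rough sketch of the I.buildup/I.final schedule, assuming each earlier build-up iteration halves the sample size (rounding down), the hypothetical helper below reproduces the N values printed for the six build-up fits in the example run (9682 observations remain after filtering). How the subsample is drawn is not stated here and is not modeled.

```r
# Hypothetical helper: observations used in each major iteration, assuming
# each earlier build-up step halves the sample (rounding down).
major_iteration_sizes <- function(n, I.buildup = 6, I.final = 4) {
  halvings <- rev(seq_len(I.buildup) - 1)   # I.buildup-1, ..., 1, 0
  buildup <- floor(n / 2^halvings)
  c(buildup, rep(n, I.final))               # final iterations use all n
}

major_iteration_sizes(9682)
```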

I.trans

The number of major iterations in which transformation estimation is applied.

modelName

The mixture model to use: either "mvt" for a multivariate t-mixture model or "mvn" for a Gaussian mixture model.

tol

The tolerance used to assess the convergence of the major EM(t) algorithms on all observations.

bias

The ICL-bias used in the major EMt algorithms on all observations.

Arguments for model refinement (sub-clustering):

sub.tol

The tolerance used to assess the convergence of the EM-algorithms in the sub-clustering.

sub.bias

The ICL-bias used in the sub-clustering EMt algorithms; by default the same as bias.

sub.thres

The threshold below which an ICL increase is considered meaningless. It is given as a multiple (or fraction) of the cost of a single cluster.

sub.samples

The number of samples used for initial hierarchical clustering.

sub.extract

The threshold used for cluster data extraction.

sub.weights

The power applied to the weights in hierarchical clustering, where the weights are the cluster membership probabilities.

sub.EM

The EM algorithm to use: either "MEt" for EMt iteration or "ME" for EM iteration without the test step.

sub.standardize

A logical value indicating whether the samples for hierarchical clustering are standardized (mean=0, SD=1).
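A minimal sketch of what sub.standardize=TRUE describes: a sample of at most sub.samples rows is rescaled column-wise to mean 0 and SD 1 with base R's scale(). Drawing that sample at random with the given seed is an assumption made for illustration, and standardize_sample is a hypothetical name, not a package function.

```r
# Hypothetical illustration: draw up to sub.samples rows (random draw assumed)
# and standardize each column to mean 0, SD 1.
standardize_sample <- function(y, sub.samples = 1500, seed = 1) {
  y <- as.matrix(y)
  set.seed(seed)  # note: resets the global RNG state
  idx <- sample.int(nrow(y), min(sub.samples, nrow(y)))
  scale(y[idx, , drop = FALSE], center = TRUE, scale = TRUE)
}

y <- cbind(1:2000, (1:2000)^2)
z <- standardize_sample(y)
colMeans(z)
apply(z, 2, sd)
```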

seed

The seed integer for the random number generator.

Arguments for transformation optimization:

trans.estimate

A logical value indicating whether transformation estimation should be applied.

trans.minclust

The minimum number of clusters required to start transformation estimation.

trans.a

A numeric vector giving the (initial) scaling a for the asinh transformation h(y) = asinh(a · y + b). A scaling factor of a=0 indicates that a parameter is not transformed.

trans.b

A numeric vector, giving the (initial) translation b for the asinh-transformation.
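The transformation h(y) = asinh(a · y + b) with a separate a and b per parameter can be sketched as below. htrans is a hypothetical helper, not the package's transformation routine. The a-vector in the usage mirrors the htrans-A values reported in the example summary: scatter parameters untransformed (a=0), fluorescence parameters with a=0.01.

```r
# Sketch of h(y) = asinh(a * y + b), applied column by column;
# a = 0 leaves that parameter on its linear scale.
htrans <- function(y, a, b = 0) {
  y <- as.matrix(y)
  p <- ncol(y)
  a <- rep_len(a, p)
  b <- rep_len(b, p)
  for (j in seq_len(p)) {
    if (a[j] != 0) y[, j] <- asinh(a[j] * y[, j] + b[j])
  }
  y
}

# Same scaling as "htrans-A" in the example summary
a <- c(0, 0, 0.01, 0.01, 0.01, 0.01, 0.01)
y <- matrix(c(50000, 20000, 0, 100, 1000, 10000, -50), nrow = 1)
htrans(y, a)
```

Unlike a logarithm, asinh is defined for zero and negative values (common after compensation), while behaving roughly logarithmically for large a · y.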

trans.parameters

A character vector specifying the parameters (columns) to be transformed. When it is left unspecified, the parameters to be transformed are determined from the PxDISPLAY information in the flowFrame's parameter description: all parameters with a LOG display value are transformed.

trans.decade

A numeric scale value for the theoretical maximum of a transformed observation value. If below 0, no scaling of the transformed values is applied, which is the default in the immunoClust-pipeline.

trans.scale

A numeric scaling factor for the linear (i.e. not transformed) parameters. By default the linear parameters (normally the scatter parameters) are not scaled.

trans.proc

An experimental switch for alternative procedures; should be "vsHtransAw".

Details

The cell.process function performs data preprocessing and calls the major iteration loop either with or without integrated transformation optimization. When transformation optimization is applied, the transformation parameters give the initial transformation; otherwise, they define a fixed transformation.

The major iteration loop with integrated transformation optimization relies on the flowFrame structure from the flowCore package for storing the observed data.

The cell.InitialModel builds up an initial immunoClust-object with one cluster and the given transformation parameters.

The cell.classifyAll function calculates the cluster membership of the removed cell events. The assignment of cluster membership is critical for over- and underexposed observations, and its interpretation is problematic.

Value

The fitted model information in an object of class immunoClust.

Note

a) The data preprocessing arguments (min.count, max.count, min and max) for removing over- and underexposed observations are adopted from the flowClust package with the same meaning.

b) The sub.thres value is given here relative to the single-cluster cost 1/2 × P × (P+1) × log(N). An absolute increase of the log-likelihood above is reported as reasonable in the literature. In our experience, a higher value is required for this increase in FC data. Identical values were chosen for the ICL-bias and sub.thres. For the CyTOF dataset, this value was adjusted to 0.05, since the absolute increase of the log-likelihood became too high due to the large number of parameters.
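The single-cluster cost in note b, and the absolute threshold implied by a given sub.thres, can be computed directly. This sketch assumes log is the natural logarithm; the helper names are hypothetical, and the comparison (an ICL increase counts only if it exceeds sub.thres times the single-cluster cost) illustrates the sub.thres description rather than reproducing the package's code.

```r
# Single-cluster cost as written in note b: 1/2 * P * (P + 1) * log(N)
# (natural logarithm assumed)
single_cluster_cost <- function(P, N) 0.5 * P * (P + 1) * log(N)

# Hypothetical helper: is an ICL increase above the sub.thres threshold?
icl_increase_meaningful <- function(delta_icl, P, N, sub.thres = 0.3) {
  delta_icl > sub.thres * single_cluster_cost(P, N)
}

# 7 parameters and 9682 observations, as in the example run
single_cluster_cost(7, 9682)
icl_increase_meaningful(c(10, 500), P = 7, N = 9682)
```

Because the cost grows with P × (P+1), a fixed fraction such as 0.3 becomes a much larger absolute threshold for high-dimensional data, which is consistent with the lower value chosen for CyTOF.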

c) The sub.extract value controls the smooth data extraction for a cluster. A higher value includes more events for a cluster in the sub-clustering routine.

d) The default value of trans.a=0.01 for the initial transformation is optimized for fluorescence cytometry. For CyTOF data, an initial scaling value of trans.a=1.0 was used.

Author(s)

Till Sörensen till-antoni.soerensen@charite.de

References

Sörensen, T., Baumgart, S., Durek, P., Grützkau, A. and Häupl, T. immunoClust - an automated analysis pipeline for the identification of immunophenotypic signatures in high-dimensional cytometric datasets. Cytometry A (accepted).

See Also

immunoClust-object, plot, splom, cell.FitModel, cell.SubClustering, trans.FitToData

Examples

data(dat.fcs)
res <- cell.process(dat.fcs)
summary(res)

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)


> library(immunoClust)
Loading required package: grid
Loading required package: lattice
Loading required package: flowCore
> ### Name: cell.process
> ### Title: Clustering of Cell-events in the immunoClust-pipeline
> ### Aliases: cell.process cell.MajorIterationLoop cell.MajorIterationTrans
> ###   cell.InitialModel cell.classifyAll
> ### Keywords: cluster
> 
> ### ** Examples
> 
> data(dat.fcs)
> res <- cell.process(dat.fcs)
filtered from above:318
filtered from below:0

Test cluster 1 for sub-clustering
EM takes 0.017 mins minutes
cluster 1 has 2 sub-cluster at 2, ICL=149
cluster 1 has 2 sub-cluster at 2, ICL=0
split cluster 1 into 2 sub-cluster
Model Refinement takes 0.017 mins minutes

Fit Model 1 of (6/10) K=1->2 N=302
EM takes 0 mins minutes

Test cluster 1 for sub-clustering
EM takes 0 mins minutes
Test cluster 2 for sub-clustering
EM takes 0.017 mins minutes
cluster 1 has 2 sub-cluster at 2, ICL=133
cluster 2 has 4 sub-cluster at 4, ICL=41
cluster 1 has 2 sub-cluster at 2, ICL=0
split cluster 1 into 2 sub-cluster
split cluster 2 into 4 sub-cluster
Model Refinement takes 0.017 mins minutes

Fit Model 2 of (6/10) K=2->6 N=605
EM takes 0 mins minutes

Test cluster 1 for sub-clustering
EM takes 0 mins minutes
Test cluster 2 for sub-clustering
EM takes 0 mins minutes
Test cluster 3 for sub-clustering
EM takes 0 mins minutes
Test cluster 4 for sub-clustering
EM takes 0 mins minutes
Test cluster 5 for sub-clustering
EM takes 0 mins minutes
Test cluster 6 for sub-clustering
EM takes 0.017 mins minutes
cluster 2 has 2 sub-cluster at 2, ICL=108
cluster 3 has 2 sub-cluster at 2, ICL=21
cluster 6 has 7 sub-cluster at 8, ICL=4
split cluster 2 into 2 sub-cluster
Model Refinement takes 0.017 mins minutes

Fit Model 3 of (6/10) K=6->7 N=1210
EM takes 0 mins minutes

Test cluster 1 for sub-clustering
EM takes 0 mins minutes
Test cluster 2 for sub-clustering
EM takes 0.017 mins minutes
Test cluster 3 for sub-clustering
EM takes 0 mins minutes
Test cluster 4 for sub-clustering
EM takes 0 mins minutes
Test cluster 5 for sub-clustering
EM takes 0.017 mins minutes
Test cluster 6 for sub-clustering
EM takes 0.017 mins minutes
cluster 4 has 3 sub-cluster at 3, ICL=54
cluster 2 has 2 sub-cluster at 2, ICL=30
cluster 5 has 5 sub-cluster at 6, ICL=15
cluster 3 has 6 sub-cluster at 7, ICL=10
cluster 1 has 3 sub-cluster at 3, ICL=9.7
cluster 1 has 3 sub-cluster at 3, ICL=0
split cluster 4 into 3 sub-cluster
Model Refinement takes 0.05 mins minutes

Fit Model 4 of (6/10) K=6->8 N=2420
EM takes 0 mins minutes

Test cluster 1 for sub-clustering
EM takes 0 mins minutes
Test cluster 2 for sub-clustering
EM takes 0.017 mins minutes
Test cluster 3 for sub-clustering
EM takes 0 mins minutes
Test cluster 4 for sub-clustering
EM takes 0.017 mins minutes
Test cluster 5 for sub-clustering
EM takes 0 mins minutes
Test cluster 6 for sub-clustering
EM takes 0 mins minutes
Test cluster 7 for sub-clustering
EM takes 0 mins minutes
Test cluster 8 for sub-clustering
EM takes 0.033 mins minutes
cluster 1 has 2 sub-cluster at 2, ICL=52
cluster 2 has 5 sub-cluster at 6, ICL=20
cluster 7 has 7 sub-cluster at 7, ICL=18
cluster 3 has 8 sub-cluster at 8, ICL=13
cluster 4 has 4 sub-cluster at 6, ICL=11
cluster 5 has 2 sub-cluster at 2, ICL=6.5
cluster 1 has 2 sub-cluster at 2, ICL=0
split cluster 1 into 2 sub-cluster
Model Refinement takes 0.067 mins minutes

Fit Model 5 of (6/10) K=8->9 N=4841
EM takes 0 mins minutes

Test cluster 1 for sub-clustering
EM takes 0.017 mins minutes
Test cluster 2 for sub-clustering
EM takes 0 mins minutes
Test cluster 3 for sub-clustering
EM takes 0 mins minutes
Test cluster 4 for sub-clustering
EM takes 0.017 mins minutes
Test cluster 5 for sub-clustering
EM takes 0 mins minutes
Test cluster 6 for sub-clustering
EM takes 0.017 mins minutes
Test cluster 7 for sub-clustering
EM takes 0 mins minutes
Test cluster 8 for sub-clustering
EM takes 0 mins minutes
Test cluster 9 for sub-clustering
EM takes 0.033 mins minutes
cluster 8 has 4 sub-cluster at 4, ICL=33
cluster 6 has 4 sub-cluster at 4, ICL=26
cluster 3 has 2 sub-cluster at 2, ICL=24
cluster 5 has 5 sub-cluster at 6, ICL=13
cluster 1 has 3 sub-cluster at 3, ICL=11
cluster 4 has 2 sub-cluster at 2, ICL=8.8
cluster 1 has 3 sub-cluster at 3, ICL=0
Model Refinement takes 0.083 mins minutes

Fit Model 6 of (6/10) K=9->9 N=9682
EM takes 0 mins minutes

Test cluster 1 for sub-clustering
EM takes 0.017 mins minutes
Test cluster 2 for sub-clustering
EM takes 0 mins minutes
Test cluster 3 for sub-clustering
EM takes 0.017 mins minutes
Test cluster 4 for sub-clustering
EM takes 0.017 mins minutes
Test cluster 5 for sub-clustering
EM takes 0.017 mins minutes
Test cluster 6 for sub-clustering
EM takes 0 mins minutes
Test cluster 7 for sub-clustering
EM takes 0 mins minutes
Test cluster 8 for sub-clustering
EM takes 0.017 mins minutes
Test cluster 9 for sub-clustering
EM takes 0.033 mins minutes
cluster 8 has 3 sub-cluster at 3, ICL=65
cluster 4 has 2 sub-cluster at 2, ICL=51
cluster 6 has 4 sub-cluster at 4, ICL=43
cluster 3 has 3 sub-cluster at 3, ICL=38
cluster 2 has 2 sub-cluster at 2, ICL=30
cluster 1 has 3 sub-cluster at 3, ICL=29
cluster 5 has 5 sub-cluster at 6, ICL=6.5
cluster 7 has 3 sub-cluster at 4, ICL=6.3
cluster 1 has 3 sub-cluster at 3, ICL=0
split cluster 4 into 2 sub-cluster
split cluster 8 into 3 sub-cluster
Model Refinement takes 0.12 mins minutes

Fit Model 7 of (6/10) K=9->12 N=9682
EM takes 0.017 mins minutes

Test cluster 1 for sub-clustering
EM takes 0.017 mins minutes
Test cluster 2 for sub-clustering
EM takes 0 mins minutes
Test cluster 3 for sub-clustering
EM takes 0 mins minutes
Test cluster 4 for sub-clustering
EM takes 0.017 mins minutes
Test cluster 5 for sub-clustering
EM takes 0.017 mins minutes
Test cluster 6 for sub-clustering
EM takes 0 mins minutes
Test cluster 7 for sub-clustering
EM takes 0.017 mins minutes
Test cluster 8 for sub-clustering
EM takes 0 mins minutes
Test cluster 9 for sub-clustering
EM takes 0.017 mins minutes
Test cluster 10 for sub-clustering
EM takes 0 mins minutes
Test cluster 11 for sub-clustering
EM takes 0.017 mins minutes
Test cluster 12 for sub-clustering
EM takes 0.033 mins minutes
cluster 3 has 2 sub-cluster at 2, ICL=37
cluster 2 has 2 sub-cluster at 2, ICL=30
cluster 1 has 3 sub-cluster at 3, ICL=29
cluster 7 has 3 sub-cluster at 3, ICL=26
cluster 8 has 5 sub-cluster at 6, ICL=19
cluster 11 has 2 sub-cluster at 2, ICL=18
cluster 5 has 2 sub-cluster at 3, ICL=15
cluster 9 has 2 sub-cluster at 2, ICL=12
cluster 10 has 6 sub-cluster at 7, ICL=5.6
cluster 6 has 5 sub-cluster at 5, ICL=0.99
cluster 1 has 3 sub-cluster at 3, ICL=0
Model Refinement takes 0.13 mins minutes

Fit Model 8 of (6/10) K=12->12 N=9682
EM takes 0 mins minutes

Process completed at8

Major Iteration (Trans) (6/4) takes 0.52 mins minutes
> summary(res)
** Experiment Information ** 
Experiment name: immunoClust Experiment 
Data Filename:   fcs/12443.fcs 
Parameters:   FSC-A SSC-A FITC-A PE-A APC-A APC-Cy7-A Pacific Blue-A 
Description:  NA NA CD14 CD19 CD15 CD4 CD3 

** Data Information ** 
Number of observations: 10000 
Number of parameters:   7 
Removed from above:    318 (3.18%)
Removed from below:    0 (0%)

** Transformation Information ** 
htrans-A:   0.000000 0.000000 0.010000 0.010000 0.010000 0.010000 0.010000 
htrans-B:   0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 
htrans-decade:   -1 

** Clustering Summary ** 
Number of clusters: 12 
Cluster     Proportion  Observations
       1      0.091851           890
       2      0.005914            57
       3      0.040373           391
       4      0.084059           827
       5      0.033547           312
       6      0.028571           281
       7      0.012627           122
       8      0.007323            70
       9      0.034162           333
      10      0.015835           154
      11      0.008024            79
      12      0.637713          6166

    Min.      0.005914            57
    Max.      0.637713          6166

** Information Criteria ** 
Log likelihood: -254005 -254201.8 -173041.3 
BIC: -254005 
ICL: -254201.8 