Last data update: 2014.03.03

cell.process

R Documentation

Clustering of Cell-events in the immunoClust-pipeline

Description

This function performs iterative model-based clustering on cell-event data. It takes the observed cell-event data as its main input and returns an object of class immunoClust, which contains the fitted mixture model parameters and cluster membership information. The additional arguments control the routines for data preprocessing, the major loop and EMt-iteration, the model refinement routine, and transformation estimation.

Usage

cell.process(fcs, parameters=NULL, 
    apply.compensation=FALSE, classify.all=FALSE, 
    N=NULL, min.count=10, max.count=10, min=NULL, max=NULL,  
    I.buildup=6, I.final=4, I.trans=I.buildup, 
    modelName="mvt", tol=1e-5, bias=0.3,
    sub.tol= 1e-4, sub.bias=bias, sub.thres=bias, sub.samples=1500, 
    sub.extract=0.8, sub.weights=1, sub.standardize=TRUE,
    trans.estimate=TRUE, trans.minclust=10, 
    trans.a=0.01, trans.b=0.0, trans.parameters=NULL)

cell.MajorIterationLoop(dat, x=NULL, parameters=NULL, 
    I.buildup=6, I.final=4, 
    modelName="mvt", tol=1e-5, bias=0.3,
    sub.bias=bias, sub.thres=0.0, sub.tol=1e-4, sub.samples=1500, 
    sub.extract=0.8, sub.weights=1, sub.EM="MEt", sub.standardize=TRUE, seed=1)

cell.MajorIterationTrans(fcs, x=NULL, parameters=NULL, 
    I.buildup=6, I.final=4, I.trans=I.buildup, 
    modelName="mvt", tol=1e-5, bias=0.3,
    sub.bias=bias, sub.thres=0.0, sub.tol=1e-4, sub.samples=1500, 
    sub.extract=0.8, sub.weights=1, sub.EM="MEt", sub.standardize=TRUE, seed=1, 
    trans.minclust=5, trans.a=0.01, trans.decade=-1, trans.scale=1.0, 
    trans.proc="vsHtransAw")

cell.InitialModel(dat, parameters=NULL, trans.a = 0.01, trans.b = 0.0, 
    trans.decade=-1, trans.scale=1.0)

cell.classifyAll(fcs, x, apply.compensation=FALSE)

Arguments

`fcs`	An object of class flowFrame. Rows correspond to observations and columns correspond to measured parameters.
`dat`	A numeric matrix, data frame of observations, or object of class flowFrame. Rows correspond to observations and columns correspond to measured parameters.
`x`	An object of class `immunoClust`. Used as the initial model in the major iteration loop. When left unspecified, the simplest model containing 1 cluster is used as the initial model.

Arguments for data pre and post processing:

`parameters`	A character vector specifying the parameters (columns) to be included in clustering. When it is left unspecified, all the parameters will be used.
`apply.compensation`	A numeric indicator whether the compensation matrix in the flowFrame should be applied.
`classify.all`	A numeric indicator whether the removed over- and underexposed observations should also be classified after the clustering process.
`N`	Maximum number of observations used for clustering. When unspecified or higher than the number of observations (i.e. rows) in dat, all observations are used for clustering, otherwise only the first `N` observations.
`min.count`	An integer specifying the threshold count for filtering data points from below. The default is 10, meaning that if 10 or more data points are smaller than or equal to `min`, they will be excluded from the analysis. If `min` is `NULL`, then the minimum value of each parameter will be used. To suppress filtering, set it to -1.
`max.count`	An integer specifying the threshold count for filtering data points from above. Interpretation is similar to that of `min.count`.
`min`	The lower limit set for data filtering. Note that it is a vector of length equal to the number of parameters (columns), implying that a different value can be set for each parameter.
`max`	The upper limit set for data filtering. Interpretation is similar to that of `min`.
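
The filtering rule described by `min`, `max`, `min.count` and `max.count` can be sketched in base R. This is only an illustration of the description above, not the package's code: the helper `filter_exposed` and its arguments `lower` and `upper` are hypothetical names, and flagging an event when any one of its columns reaches a limit is an assumption.

```r
# Hypothetical helper (not part of immunoClust): flags over- and underexposed
# events following the textual description of min, max, min.count, max.count.
filter_exposed <- function(y, lower = NULL, upper = NULL,
                           min.count = 10, max.count = 10) {
  y <- as.matrix(y)
  # Without explicit limits, fall back to the per-column extremes.
  if (is.null(lower)) lower <- apply(y, 2, min)
  if (is.null(upper)) upper <- apply(y, 2, max)
  # Assumption: an event is flagged when any column is at or beyond a limit.
  below <- rowSums(sweep(y, 2, lower, "<=")) > 0
  above <- rowSums(sweep(y, 2, upper, ">=")) > 0
  # Flagged events are dropped only if at least *.count of them exist;
  # a count of -1 suppresses the respective filter.
  drop_below <- min.count > -1 && sum(below) >= min.count
  drop_above <- max.count > -1 && sum(above) >= max.count
  keep <- rep(TRUE, nrow(y))
  if (drop_below) keep <- keep & !below
  if (drop_above) keep <- keep & !above
  list(keep = keep,
       removed.below = if (drop_below) sum(below) else 0L,
       removed.above = if (drop_above) sum(above) else 0L)
}

# Toy data: 200 events, 15 of them saturated at 1023 in column "a".
set.seed(1)
y <- cbind(a = runif(200, 1, 100), b = runif(200, 1, 100))
y[1:15, "a"] <- 1023
res <- filter_exposed(y, lower = c(0, 0), upper = c(1023, 1023))
unfiltered <- filter_exposed(y, lower = c(0, 0), upper = c(1023, 1023),
                             max.count = -1)
```

In this toy run the saturated events are removed because their number reaches `max.count`, while setting `max.count = -1` keeps every event.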

Arguments for the major loop and EMt-iteration:

`I.buildup`	The number of major iterations, where the number of used observations is doubled successively.
`I.final`	The number of major iterations with all observations.
`I.trans`	The number of iterations where transformation estimation is applied.
`modelName`	Used mixture model; either `"mvt"` for a t-mixture model or `"mvn"` for a Gaussian Mixture model.
`tol`	The tolerance used to assess the convergence of the major EM(t)-algorithms of all observations.
`bias`	The ICL-bias used in the major EMt-algorithms of all observations.
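
As a rough illustration of how `I.buildup` and `I.final` shape the major loop, the sketch below computes the number of observations used in each major iteration. The helper `major_schedule` is hypothetical, and the halving rule with integer division is an assumption; it was chosen because it reproduces the `N=` values printed in the example log in the Results section.

```r
# Hypothetical helper (not part of immunoClust): number of observations used
# in each major iteration, doubling during build-up and then using all of them.
major_schedule <- function(N, I.buildup = 6, I.final = 4) {
  # Exponents I.buildup-1, ..., 1, 0; empty when I.buildup is 0.
  halvings <- rev(seq_len(I.buildup)) - 1
  # Assumption: the subset size is N divided by a power of two, rounded down.
  buildup <- N %/% 2^halvings
  c(buildup, rep(N, I.final))
}

# 9682 events remain after filtering in the documented example run.
sched <- major_schedule(9682)
```

With the defaults the schedule has ten entries, six for the build-up and four with all observations, which matches the `(6/10)` counter shown in the log.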

Arguments for model refinement (sub-clustering):

`sub.tol`	The tolerance used to assess the convergence of the EM-algorithms in the sub-clustering.
`sub.bias`	The ICL-bias used in the sub-clustering EMt-algorithms, in general the same as the ICL-bias.
`sub.thres`	Defines the threshold, below which an ICL-increase is meaningless. The threshold is given as the multiple (or fraction) of the costs for a single cluster.
`sub.samples`	The number of samples used for initial hierarchical clustering.
`sub.extract`	The threshold used for cluster data extraction.
`sub.weights`	Power of weights applied to hierarchical clustering, where the used weights are the probabilities of cluster membership.
`sub.EM`	Used EM-algorithm; either `"MEt"` for EMt-iteration or `"ME"` for EM-iteration without test step.
`sub.standardize`	A numeric indicating whether the samples for hierarchical clustering are standardized (mean=0, SD=1).
`seed`	The seed integer for the random number generator.
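
The roles of `sub.samples`, `sub.standardize`, `sub.weights` and `seed` can be illustrated with a small base R sketch. The helper `prepare_subsample` is hypothetical, and drawing the samples uniformly at random is an assumption; the description above does not state how the package selects them.

```r
# Hypothetical helper (not part of immunoClust): prepares the samples and
# weights that would feed an initial hierarchical clustering.
prepare_subsample <- function(y, w, sub.samples = 1500, sub.weights = 1,
                              sub.standardize = TRUE, seed = 1) {
  set.seed(seed)                       # reproducible sample selection
  # Assumption: samples are drawn uniformly without replacement.
  idx <- sample(nrow(y), min(sub.samples, nrow(y)))
  s <- y[idx, , drop = FALSE]
  # Standardize each column to mean 0 and SD 1 when requested.
  if (sub.standardize) s <- scale(s)
  # Membership probabilities raised to the power sub.weights.
  list(samples = s, weights = w[idx]^sub.weights)
}

set.seed(42)
y <- matrix(rnorm(5000 * 2, mean = 50, sd = 10), ncol = 2)
w <- runif(5000)                       # stand-in membership probabilities
sub <- prepare_subsample(y, w, sub.samples = 1500, sub.weights = 2)
```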

Arguments for transformation optimization:

`trans.estimate`	A numeric indicator whether transformation estimation should be applied.
`trans.minclust`	The minimum number of clusters required to start transformation estimation.
`trans.a`	A numeric vector giving the (initial) scaling a for the asinh-transformation h(y) = asinh(a · y + b). A scaling factor of a=0 indicates that a parameter is not transformed.
`trans.b`	A numeric vector, giving the (initial) translation b for the asinh-transformation.
`trans.parameters`	A character vector, specifying the parameters (columns) to be applied for transformation. When it is left unspecified, the parameters to be transformed are obtained by the `PxDISPLAY` information of the flowFrame description parameters. All parameters with LOG display values are transformed.
`trans.decade`	A numeric scale value for the theoretical maximum of the transformed observation values. If below 0, no scaling of the transformed values is applied, which is the default in the immunoClust-pipeline.
`trans.scale`	A numeric scaling factor for the linear (i.e. not transformed) parameters. By default the linear parameters (normally the scatter parameters) are not scaled.
`trans.proc`	An experimental switch for alternative procedures; should be "vsHtransAw".
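
The asinh-transformation controlled by `trans.a` and `trans.b` can be written out in a few lines of base R. The helper `htrans` below is a hypothetical name used for illustration only; it covers just the per-parameter formula and the a=0 convention, and does not model `trans.decade` or `trans.scale`.

```r
# Hypothetical helper (not part of immunoClust): applies
# h(y) = asinh(a * y + b) column by column.
htrans <- function(y, a, b = 0) {
  y <- as.matrix(y)
  a <- rep_len(a, ncol(y))             # one scaling per parameter
  b <- rep_len(b, ncol(y))             # one translation per parameter
  for (k in seq_len(ncol(y))) {
    # a == 0 marks a parameter that is left untransformed.
    if (a[k] != 0) y[, k] <- asinh(a[k] * y[, k] + b[k])
  }
  y
}

# A scatter parameter kept linear and a fluorescence parameter with a = 0.01.
y <- cbind("FSC-A" = c(100, 200), "FITC-A" = c(0, 1000))
out <- htrans(y, a = c(0, 0.01))
```

Here the `FSC-A` column is returned unchanged, while `FITC-A` is mapped through asinh after scaling by 0.01.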

Details

The cell.process function performs the data preprocessing and calls the major iteration loop either with or without integrated transformation optimization. When transformation optimization is applied, the transformation parameters give the initial transformation; otherwise they define the fixed transformation.

The major iteration loop with included transformation optimization relies on flowFrames structure from the flowCore-package for the storage of the observed data.

The cell.InitialModel builds up an initial immunoClust-object with one cluster and the given transformation parameters.

The cell.classifyAll function calculates the cluster membership for the removed cell events. The assignment of cluster membership is critical for over- and underexposed observations, and its interpretation is problematic.

Value

The fitted model information in an object of class immunoClust.

Note

a) The data preprocessing arguments (min.count, max.count, min and max) for removing over- and underexposed observations are adopted from the flowClust-package with the same meaning.

b) The sub.thres value is given here relative to the single-cluster costs 1/2 x P x (P+1) x log(N). An absolute increase of the log-likelihood above is reported as reasonable from the literature. In our experience, a higher value is required for this increase in FC data. Identical values were chosen for the ICL-bias and sub.thres. For the CyTOF dataset this value was adjusted to 0.05, since the absolute increase of the log-likelihood became too high due to the high number of parameters.

c) The sub.extract value controls the smooth data extraction for a cluster. A higher value includes more events for a cluster in the sub-clustering routine.

d) The default value of trans.a=0.01 for the initial transformation is optimized for Fluorescence Cytometry. For CyTOF data the initial scaling value was trans.a=1.0.
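
The single-cluster costs from note b) can be evaluated directly. The sketch below is a worked version of that formula only; the helper names are hypothetical, and treating `sub.thres` times the costs as the absolute ICL threshold is a reading of the argument description, not a statement about the package internals.

```r
# Hypothetical helpers (not part of immunoClust).
# Single-cluster costs 1/2 * P * (P + 1) * log(N) for P parameters, N events.
single_cluster_cost <- function(P, N) 0.5 * P * (P + 1) * log(N)

# Assumption: sub.thres scales these costs into an absolute ICL threshold.
icl_threshold <- function(P, N, sub.thres = 0.3) {
  sub.thres * single_cluster_cost(P, N)
}

# Seven parameters and 9682 events, as in the documented example run.
cost_fc <- single_cluster_cost(P = 7, N = 9682)
thres_fc <- icl_threshold(P = 7, N = 9682, sub.thres = 0.3)

# The costs grow quadratically in P, which is why a smaller sub.thres
# (0.05 in note b) is used for data with many parameters.
thres_many <- icl_threshold(P = 30, N = 9682, sub.thres = 0.05)
```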

Author(s)

Till Sörensen till-antoni.soerensen@charite.de

References

Sörensen, T., Baumgart, S., Durek, P., Grützkau, A. and Häupl, T. immunoClust - an automated analysis pipeline for the identification of immunophenotypic signatures in high-dimensional cytometric datasets. Cytometry A (accepted).

Examples

data(dat.fcs)
res <- cell.process(dat.fcs)
summary(res)

Results


> library(immunoClust)
Loading required package: grid
Loading required package: lattice
Loading required package: flowCore
> data(dat.fcs)
> res <- cell.process(dat.fcs)
filtered from above:318
filtered from below:0

Test cluster 1 for sub-clustering
EM takes 0.017 mins minutes
cluster 1 has 2 sub-cluster at 2, ICL=149
cluster 1 has 2 sub-cluster at 2, ICL=0
split cluster 1 into 2 sub-cluster
Model Refinement takes 0.017 mins minutes

Fit Model 1 of (6/10) K=1->2 N=302
EM takes 0 mins minutes

Test cluster 1 for sub-clustering
EM takes 0 mins minutes
Test cluster 2 for sub-clustering
EM takes 0.017 mins minutes
cluster 1 has 2 sub-cluster at 2, ICL=133
cluster 2 has 4 sub-cluster at 4, ICL=41
cluster 1 has 2 sub-cluster at 2, ICL=0
split cluster 1 into 2 sub-cluster
split cluster 2 into 4 sub-cluster
Model Refinement takes 0.017 mins minutes

Fit Model 2 of (6/10) K=2->6 N=605
EM takes 0 mins minutes

Test cluster 1 for sub-clustering
EM takes 0 mins minutes
Test cluster 2 for sub-clustering
EM takes 0 mins minutes
Test cluster 3 for sub-clustering
EM takes 0 mins minutes
Test cluster 4 for sub-clustering
EM takes 0 mins minutes
Test cluster 5 for sub-clustering
EM takes 0 mins minutes
Test cluster 6 for sub-clustering
EM takes 0.017 mins minutes
cluster 2 has 2 sub-cluster at 2, ICL=108
cluster 3 has 2 sub-cluster at 2, ICL=21
cluster 6 has 7 sub-cluster at 8, ICL=4
split cluster 2 into 2 sub-cluster
Model Refinement takes 0.017 mins minutes

Fit Model 3 of (6/10) K=6->7 N=1210
EM takes 0 mins minutes

Test cluster 1 for sub-clustering
EM takes 0 mins minutes
Test cluster 2 for sub-clustering
EM takes 0.017 mins minutes
Test cluster 3 for sub-clustering
EM takes 0 mins minutes
Test cluster 4 for sub-clustering
EM takes 0 mins minutes
Test cluster 5 for sub-clustering
EM takes 0.017 mins minutes
Test cluster 6 for sub-clustering
EM takes 0.017 mins minutes
cluster 4 has 3 sub-cluster at 3, ICL=54
cluster 2 has 2 sub-cluster at 2, ICL=30
cluster 5 has 5 sub-cluster at 6, ICL=15
cluster 3 has 6 sub-cluster at 7, ICL=10
cluster 1 has 3 sub-cluster at 3, ICL=9.7
cluster 1 has 3 sub-cluster at 3, ICL=0
split cluster 4 into 3 sub-cluster
Model Refinement takes 0.05 mins minutes

Fit Model 4 of (6/10) K=6->8 N=2420
EM takes 0 mins minutes

Test cluster 1 for sub-clustering
EM takes 0 mins minutes
Test cluster 2 for sub-clustering
EM takes 0.017 mins minutes
Test cluster 3 for sub-clustering
EM takes 0 mins minutes
Test cluster 4 for sub-clustering
EM takes 0.017 mins minutes
Test cluster 5 for sub-clustering
EM takes 0 mins minutes
Test cluster 6 for sub-clustering
EM takes 0 mins minutes
Test cluster 7 for sub-clustering
EM takes 0 mins minutes
Test cluster 8 for sub-clustering
EM takes 0.033 mins minutes
cluster 1 has 2 sub-cluster at 2, ICL=52
cluster 2 has 5 sub-cluster at 6, ICL=20
cluster 7 has 7 sub-cluster at 7, ICL=18
cluster 3 has 8 sub-cluster at 8, ICL=13
cluster 4 has 4 sub-cluster at 6, ICL=11
cluster 5 has 2 sub-cluster at 2, ICL=6.5
cluster 1 has 2 sub-cluster at 2, ICL=0
split cluster 1 into 2 sub-cluster
Model Refinement takes 0.067 mins minutes

Fit Model 5 of (6/10) K=8->9 N=4841
EM takes 0 mins minutes

Test cluster 1 for sub-clustering
EM takes 0.017 mins minutes
Test cluster 2 for sub-clustering
EM takes 0 mins minutes
Test cluster 3 for sub-clustering
EM takes 0 mins minutes
Test cluster 4 for sub-clustering
EM takes 0.017 mins minutes
Test cluster 5 for sub-clustering
EM takes 0 mins minutes
Test cluster 6 for sub-clustering
EM takes 0.017 mins minutes
Test cluster 7 for sub-clustering
EM takes 0 mins minutes
Test cluster 8 for sub-clustering
EM takes 0 mins minutes
Test cluster 9 for sub-clustering
EM takes 0.033 mins minutes
cluster 8 has 4 sub-cluster at 4, ICL=33
cluster 6 has 4 sub-cluster at 4, ICL=26
cluster 3 has 2 sub-cluster at 2, ICL=24
cluster 5 has 5 sub-cluster at 6, ICL=13
cluster 1 has 3 sub-cluster at 3, ICL=11
cluster 4 has 2 sub-cluster at 2, ICL=8.8
cluster 1 has 3 sub-cluster at 3, ICL=0
Model Refinement takes 0.083 mins minutes

Fit Model 6 of (6/10) K=9->9 N=9682
EM takes 0 mins minutes

Test cluster 1 for sub-clustering
EM takes 0.017 mins minutes
Test cluster 2 for sub-clustering
EM takes 0 mins minutes
Test cluster 3 for sub-clustering
EM takes 0.017 mins minutes
Test cluster 4 for sub-clustering
EM takes 0.017 mins minutes
Test cluster 5 for sub-clustering
EM takes 0.017 mins minutes
Test cluster 6 for sub-clustering
EM takes 0 mins minutes
Test cluster 7 for sub-clustering
EM takes 0 mins minutes
Test cluster 8 for sub-clustering
EM takes 0.017 mins minutes
Test cluster 9 for sub-clustering
EM takes 0.033 mins minutes
cluster 8 has 3 sub-cluster at 3, ICL=65
cluster 4 has 2 sub-cluster at 2, ICL=51
cluster 6 has 4 sub-cluster at 4, ICL=43
cluster 3 has 3 sub-cluster at 3, ICL=38
cluster 2 has 2 sub-cluster at 2, ICL=30
cluster 1 has 3 sub-cluster at 3, ICL=29
cluster 5 has 5 sub-cluster at 6, ICL=6.5
cluster 7 has 3 sub-cluster at 4, ICL=6.3
cluster 1 has 3 sub-cluster at 3, ICL=0
split cluster 4 into 2 sub-cluster
split cluster 8 into 3 sub-cluster
Model Refinement takes 0.12 mins minutes

Fit Model 7 of (6/10) K=9->12 N=9682
EM takes 0.017 mins minutes

Test cluster 1 for sub-clustering
EM takes 0.017 mins minutes
Test cluster 2 for sub-clustering
EM takes 0 mins minutes
Test cluster 3 for sub-clustering
EM takes 0 mins minutes
Test cluster 4 for sub-clustering
EM takes 0.017 mins minutes
Test cluster 5 for sub-clustering
EM takes 0.017 mins minutes
Test cluster 6 for sub-clustering
EM takes 0 mins minutes
Test cluster 7 for sub-clustering
EM takes 0.017 mins minutes
Test cluster 8 for sub-clustering
EM takes 0 mins minutes
Test cluster 9 for sub-clustering
EM takes 0.017 mins minutes
Test cluster 10 for sub-clustering
EM takes 0 mins minutes
Test cluster 11 for sub-clustering
EM takes 0.017 mins minutes
Test cluster 12 for sub-clustering
EM takes 0.033 mins minutes
cluster 3 has 2 sub-cluster at 2, ICL=37
cluster 2 has 2 sub-cluster at 2, ICL=30
cluster 1 has 3 sub-cluster at 3, ICL=29
cluster 7 has 3 sub-cluster at 3, ICL=26
cluster 8 has 5 sub-cluster at 6, ICL=19
cluster 11 has 2 sub-cluster at 2, ICL=18
cluster 5 has 2 sub-cluster at 3, ICL=15
cluster 9 has 2 sub-cluster at 2, ICL=12
cluster 10 has 6 sub-cluster at 7, ICL=5.6
cluster 6 has 5 sub-cluster at 5, ICL=0.99
cluster 1 has 3 sub-cluster at 3, ICL=0
Model Refinement takes 0.13 mins minutes

Fit Model 8 of (6/10) K=12->12 N=9682
EM takes 0 mins minutes

Process completed at8

Major Iteration (Trans) (6/4) takes 0.52 mins minutes
> summary(res)
** Experiment Information ** 
Experiment name: immunoClust Experiment 
Data Filename:   fcs/12443.fcs 
Parameters:   FSC-A SSC-A FITC-A PE-A APC-A APC-Cy7-A Pacific Blue-A 
Description:  NA NA CD14 CD19 CD15 CD4 CD3 

** Data Information ** 
Number of observations: 10000 
Number of parameters:   7 
Removed from above:    318 (3.18%)
Removed from below:    0 (0%)

** Transformation Information ** 
htrans-A:   0.000000 0.000000 0.010000 0.010000 0.010000 0.010000 0.010000 
htrans-B:   0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 
htrans-decade:   -1 

** Clustering Summary ** 
Number of clusters: 12 
Cluster     Proportion  Observations
       1      0.091851           890
       2      0.005914            57
       3      0.040373           391
       4      0.084059           827
       5      0.033547           312
       6      0.028571           281
       7      0.012627           122
       8      0.007323            70
       9      0.034162           333
      10      0.015835           154
      11      0.008024            79
      12      0.637713          6166

    Min.      0.005914            57
    Max.      0.637713          6166

** Information Criteria ** 
Log likelihood: -254005 -254201.8 -173041.3 
BIC: -254005 
ICL: -254201.8 