R: Clustering of Cell-events in the immunoClust-pipeline
cell.process
R Documentation
Description
This function performs iterative model-based clustering on cell-event data. It
takes the observed cell-event data as its main input and returns an object of class
immunoClust, which contains the fitted mixture model parameters and
cluster membership information. The additional arguments control the routines
for data preprocessing, the major loop and EMt-iteration, the model refinement
routine, and transformation estimation.
Arguments
An object of class flowFrame. Rows correspond to observations and
columns correspond to measured parameters.
dat
A numeric matrix, data frame of observations, or object of class
flowFrame. Rows correspond to observations and columns correspond to measured
parameters.
x
An object of class immunoClust. Used as the initial model in the
major iteration loop. When left unspecified, the simplest model containing 1
cluster is used as the initial model.
Arguments for data pre- and post-processing:
parameters
A character vector specifying the parameters (columns) to be
included in clustering. When it is left unspecified, all the parameters will be
used.
apply.compensation
A numeric indicator of whether the compensation matrix
in the flowFrame should be applied.
classify.all
A numeric indicator of whether the removed over- and
underexposed observations should also be classified after the clustering
process.
N
Maximum number of observations used for clustering. When unspecified
or higher than the number of observations (i.e. rows) in dat, all observations
are used for clustering, otherwise only the first N observations.
min.count
An integer specifying the threshold count for filtering data
points from below. The default is 10, meaning that if 10 or more data points
are smaller than or equal to min, they will be excluded from the
analysis. If min is NULL, then the minimum value of each
parameter will be used. To suppress filtering, set min.count to -1.
max.count
An integer specifying the threshold count for filtering
data points from above. Interpretation is similar to that of min.count.
min
The lower limit set for data filtering. Note that it is a vector of
length equal to the number of parameters (columns), implying that a different
value can be set for each parameter.
max
The upper limit set for data filtering. Interpretation is similar to
that of min.
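The interplay of min, max, min.count and max.count described above can be sketched in base R. This is only an illustration of the documented rule, not the package's implementation: the helper name filter_exposed and the toy matrix are hypothetical, and the treatment of the upper limit simply mirrors the lower one.

```r
# Hypothetical helper, not part of immunoClust or flowClust.
filter_exposed <- function(dat, min = NULL, max = NULL,
                           min.count = 10, max.count = 10) {
  dat <- as.matrix(dat)
  # When a limit is NULL, fall back to the observed per-column extreme.
  if (is.null(min)) min <- apply(dat, 2, base::min)
  if (is.null(max)) max <- apply(dat, 2, base::max)
  keep <- rep(TRUE, nrow(dat))
  for (j in seq_len(ncol(dat))) {
    below <- dat[, j] <= min[j]
    above <- dat[, j] >= max[j]
    # A negative count suppresses filtering; otherwise rows are dropped
    # only when at least `count` of them sit at or beyond the limit.
    if (min.count >= 0 && sum(below) >= min.count) keep <- keep & !below
    if (max.count >= 0 && sum(above) >= max.count) keep <- keep & !above
  }
  dat[keep, , drop = FALSE]
}

# Toy data: column a has 12 events piled up at its minimum.
toy <- cbind(a = c(rep(0, 12), 1:20), b = c(1:31, 100))
filtered   <- filter_exposed(toy)
unfiltered <- filter_exposed(toy, min.count = -1, max.count = -1)
```

With the default counts, only the pile-up in column a is removed, because the single extreme values in the other positions stay below the count threshold.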
Arguments for the major loop and EMt-iteration:
I.buildup
The number of major iterations, where the number of used
observations is doubled successively.
I.final
The number of major iterations with all observations.
I.trans
The number of iterations where transformation estimation is
applied.
modelName
Used mixture model; either "mvt" for a t-mixture
model or "mvn" for a Gaussian Mixture model.
tol
The tolerance used to assess the convergence of the major
EM(t)-algorithms of all observations.
bias
The ICL-bias used in the major EMt-algorithms of all observations.
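The successive doubling controlled by I.buildup can be illustrated with a short base R sketch. The helper buildup_sizes is hypothetical, and the rounding rule is an assumption inferred from the N values printed in the example log (302, 605, ..., 9682 for 9682 retained observations and 6 build-up iterations); the package may compute the subset sizes differently.

```r
# Hypothetical helper: number of observations used in each build-up
# iteration, assuming the last iteration uses all observations and each
# earlier one uses half as many as the next (rounded down).
buildup_sizes <- function(n.obs, I.buildup = 6) {
  floor(n.obs / 2^((I.buildup - 1):0))
}

sizes <- buildup_sizes(9682, I.buildup = 6)
sizes
```

After the build-up phase, the I.final iterations would all run on the full set of n.obs observations.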
Arguments for model refinement (sub-clustering):
sub.tol
The tolerance used to assess the convergence of the
EM-algorithms in the sub-clustering.
sub.bias
The ICL-bias used in the sub-clustering EMt-algorithms, in
general the same as the ICL-bias.
sub.thres
Defines the threshold, below which an ICL-increase is
meaningless. The threshold is given as the multiple (or fraction) of the costs
for a single cluster.
sub.samples
The number of samples used for initial hierarchical
clustering.
sub.extract
The threshold used for cluster data extraction.
sub.weights
Power of weights applied to hierarchical clustering, where
the used weights are the probabilities of cluster membership.
sub.EM
Used EM-algorithm; either "MEt" for EMt-iteration or
"ME" for EM-iteration without test step.
sub.standardize
A numeric indicating whether the samples for
hierarchical clustering are standardized (mean=0, SD=1).
seed
The seed integer for the random number generator.
Arguments for transformation optimization:
trans.estimate
A numeric indicator whether transformation estimation
should be applied.
trans.minclust
The minimum number of clusters required to start
transformation estimation.
trans.a
A numeric vector, giving the (initial) scaling a for the
asinh-transformation h(y) = asinh(a · y + b). A scaling factor of
a=0 indicates that a parameter is not transformed.
trans.b
A numeric vector, giving the (initial) translation b for
the asinh-transformation.
trans.parameters
A character vector, specifying the parameters (columns)
to be applied for transformation. When it is left unspecified, the parameters
to be transformed are obtained by the PxDISPLAY information of the
flowFrame description parameters. All parameters with LOG display values are
transformed.
trans.decade
A numeric scale value for the theoretical maximum of the
transformed observation value. If below 0, no scaling of the transformed values
is applied, which is the default in the immunoClust-pipeline.
trans.scale
A numeric scaling factor for the linear (i.e. not
transformed) parameters. By default the linear parameters (normally the scatter
parameters) are not scaled.
trans.proc
An experimental switch for alternative procedures; should
be "vsHtransAw".
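The role of trans.a and trans.b in the asinh-transformation can be sketched with base R. The helper asinh_transform is hypothetical and only mirrors the formula h(y) = asinh(a · y + b) and the a=0 convention from the argument descriptions; it ignores trans.decade and trans.scale.

```r
# Hypothetical helper, not an immunoClust function.
asinh_transform <- function(y, a = 0.01, b = 0) {
  # a = 0 marks a parameter that is left untransformed.
  if (a == 0) return(y)
  asinh(a * y + b)
}

y <- c(0, 100, 1000, 10000)
transformed <- asinh_transform(y, a = 0.01, b = 0)
untouched   <- asinh_transform(y, a = 0)
```

For small a · y the transformation is close to linear, while large values are compressed roughly logarithmically, which is why the scaling a matters for the measurement range of the instrument.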
Details
The cell.process function does data preprocessing and calls the major
iteration loop either with or without integrated transformation optimization.
When transformation optimization is applied, the transformation parameters give
the initial transformation; otherwise, they define the fixed
transformation.
The major iteration loop with included transformation optimization relies on
the flowFrame structure from the flowCore package for the storage of
the observed data.
The cell.InitialModel function builds up an initial immunoClust object
with one cluster and the given transformation parameters.
The cell.classifyAll function calculates the cluster membership for the removed
cell events. The assignment of cluster membership is critical for over- and
underexposed observations, and its interpretation is problematic.
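The difference between an initial and a fixed transformation can be sketched as a call with transformation estimation switched off. This is a hedged sketch: it uses only argument names documented above, the values are illustrative, and the call is guarded so that it only runs when the immunoClust package is installed.

```r
# Illustrative settings: with trans.estimate = 0 the given trans.a and
# trans.b are expected to act as the fixed transformation.
fixed_args <- list(trans.estimate = 0, trans.a = 0.01, trans.b = 0)

if (requireNamespace("immunoClust", quietly = TRUE)) {
  library(immunoClust)
  data(dat.fcs)
  # Pass the flowFrame first, followed by the transformation settings.
  res_fixed <- do.call(cell.process, c(list(dat.fcs), fixed_args))
  summary(res_fixed)
}
```

Leaving trans.estimate at a non-zero value would instead treat trans.a and trans.b as starting values for the optimization.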
Value
The fitted model information in an object of class
immunoClust.
Note
a) The data preprocessing arguments (min.count, max.count,
min and max) for removing over- and underexposed observations are
adopted from the flowClust package with the same meaning.
b) The sub.thres value is given here in relation to the single
cluster costs
1/2 · P · (P+1) · log(N).
An absolute increase of the log-likelihood above is reported as
reasonable from the literature. From our experience, a higher value is required
for this increase in FC data. For the ICL-bias and the sub.thres, identical
values were chosen. For the CyTOF dataset, this value had been adjusted to 0.05
since the absolute increase of the log-likelihood became too high due to the
high number of parameters.
c) The sub.extract value controls the smooth data extraction for a
cluster. A higher value includes more events for a cluster in the
sub-clustering routine.
d) The default value of trans.a=0.01 for the initial transformation is
optimized for Fluorescence Cytometry. For CyTOF data the initial scaling value
was trans.a=1.0.
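The single cluster costs from note b) can be evaluated directly in base R. This sketch assumes that P is the number of clustered parameters and N the number of observations, which the note does not state explicitly. The helper name single_cluster_cost is hypothetical, and the sub.thres value is purely illustrative (it is the value quoted for the CyTOF dataset, not a documented default).

```r
# Hypothetical helper for the expression 1/2 * P * (P + 1) * log(N).
single_cluster_cost <- function(P, N) {
  0.5 * P * (P + 1) * log(N)
}

# Illustrative numbers: 7 parameters and 9682 retained observations,
# as in the example dataset.
cost <- single_cluster_cost(P = 7, N = 9682)

# Illustrative threshold: ICL increases below sub.thres * cost would be
# regarded as meaningless under the reading given above.
sub.thres <- 0.05
min_icl_gain <- sub.thres * cost
```

Because the cost grows quadratically in P, a fixed sub.thres translates into a much larger absolute threshold for high-dimensional data, which is consistent with the smaller value reported for CyTOF.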
References
Sörensen, T., Baumgart, S., Durek, P., Grützkau, A. and Häupl, T.
immunoClust - an automated analysis pipeline for the identification of
immunophenotypic signatures in high-dimensional cytometric datasets.
Cytometry A (accepted).
Examples
data(dat.fcs)
res <- cell.process(dat.fcs)
summary(res)
Results
> library(immunoClust)
Loading required package: grid
Loading required package: lattice
Loading required package: flowCore
> ### Name: cell.process
> ### Title: Clustering of Cell-events in the immunoClust-pipeline
> ### Aliases: cell.process cell.MajorIterationLoop cell.MajorIterationTrans
> ### cell.InitialModel cell.classifyAll
> ### Keywords: cluster
>
> ### ** Examples
>
> data(dat.fcs)
> res <- cell.process(dat.fcs)
filtered from above:318
filtered from below:0
Test cluster 1 for sub-clustering
EM takes 0.017 mins minutes
cluster 1 has 2 sub-cluster at 2, ICL=149
cluster 1 has 2 sub-cluster at 2, ICL=0
split cluster 1 into 2 sub-cluster
Model Refinement takes 0.017 mins minutes
Fit Model 1 of (6/10) K=1->2 N=302
EM takes 0 mins minutes
Test cluster 1 for sub-clustering
EM takes 0 mins minutes
Test cluster 2 for sub-clustering
EM takes 0.017 mins minutes
cluster 1 has 2 sub-cluster at 2, ICL=133
cluster 2 has 4 sub-cluster at 4, ICL=41
cluster 1 has 2 sub-cluster at 2, ICL=0
split cluster 1 into 2 sub-cluster
split cluster 2 into 4 sub-cluster
Model Refinement takes 0.017 mins minutes
Fit Model 2 of (6/10) K=2->6 N=605
EM takes 0 mins minutes
Test cluster 1 for sub-clustering
EM takes 0 mins minutes
Test cluster 2 for sub-clustering
EM takes 0 mins minutes
Test cluster 3 for sub-clustering
EM takes 0 mins minutes
Test cluster 4 for sub-clustering
EM takes 0 mins minutes
Test cluster 5 for sub-clustering
EM takes 0 mins minutes
Test cluster 6 for sub-clustering
EM takes 0.017 mins minutes
cluster 2 has 2 sub-cluster at 2, ICL=108
cluster 3 has 2 sub-cluster at 2, ICL=21
cluster 6 has 7 sub-cluster at 8, ICL=4
split cluster 2 into 2 sub-cluster
Model Refinement takes 0.017 mins minutes
Fit Model 3 of (6/10) K=6->7 N=1210
EM takes 0 mins minutes
Test cluster 1 for sub-clustering
EM takes 0 mins minutes
Test cluster 2 for sub-clustering
EM takes 0.017 mins minutes
Test cluster 3 for sub-clustering
EM takes 0 mins minutes
Test cluster 4 for sub-clustering
EM takes 0 mins minutes
Test cluster 5 for sub-clustering
EM takes 0.017 mins minutes
Test cluster 6 for sub-clustering
EM takes 0.017 mins minutes
cluster 4 has 3 sub-cluster at 3, ICL=54
cluster 2 has 2 sub-cluster at 2, ICL=30
cluster 5 has 5 sub-cluster at 6, ICL=15
cluster 3 has 6 sub-cluster at 7, ICL=10
cluster 1 has 3 sub-cluster at 3, ICL=9.7
cluster 1 has 3 sub-cluster at 3, ICL=0
split cluster 4 into 3 sub-cluster
Model Refinement takes 0.05 mins minutes
Fit Model 4 of (6/10) K=6->8 N=2420
EM takes 0 mins minutes
Test cluster 1 for sub-clustering
EM takes 0 mins minutes
Test cluster 2 for sub-clustering
EM takes 0.017 mins minutes
Test cluster 3 for sub-clustering
EM takes 0 mins minutes
Test cluster 4 for sub-clustering
EM takes 0.017 mins minutes
Test cluster 5 for sub-clustering
EM takes 0 mins minutes
Test cluster 6 for sub-clustering
EM takes 0 mins minutes
Test cluster 7 for sub-clustering
EM takes 0 mins minutes
Test cluster 8 for sub-clustering
EM takes 0.033 mins minutes
cluster 1 has 2 sub-cluster at 2, ICL=52
cluster 2 has 5 sub-cluster at 6, ICL=20
cluster 7 has 7 sub-cluster at 7, ICL=18
cluster 3 has 8 sub-cluster at 8, ICL=13
cluster 4 has 4 sub-cluster at 6, ICL=11
cluster 5 has 2 sub-cluster at 2, ICL=6.5
cluster 1 has 2 sub-cluster at 2, ICL=0
split cluster 1 into 2 sub-cluster
Model Refinement takes 0.067 mins minutes
Fit Model 5 of (6/10) K=8->9 N=4841
EM takes 0 mins minutes
Test cluster 1 for sub-clustering
EM takes 0.017 mins minutes
Test cluster 2 for sub-clustering
EM takes 0 mins minutes
Test cluster 3 for sub-clustering
EM takes 0 mins minutes
Test cluster 4 for sub-clustering
EM takes 0.017 mins minutes
Test cluster 5 for sub-clustering
EM takes 0 mins minutes
Test cluster 6 for sub-clustering
EM takes 0.017 mins minutes
Test cluster 7 for sub-clustering
EM takes 0 mins minutes
Test cluster 8 for sub-clustering
EM takes 0 mins minutes
Test cluster 9 for sub-clustering
EM takes 0.033 mins minutes
cluster 8 has 4 sub-cluster at 4, ICL=33
cluster 6 has 4 sub-cluster at 4, ICL=26
cluster 3 has 2 sub-cluster at 2, ICL=24
cluster 5 has 5 sub-cluster at 6, ICL=13
cluster 1 has 3 sub-cluster at 3, ICL=11
cluster 4 has 2 sub-cluster at 2, ICL=8.8
cluster 1 has 3 sub-cluster at 3, ICL=0
Model Refinement takes 0.083 mins minutes
Fit Model 6 of (6/10) K=9->9 N=9682
EM takes 0 mins minutes
Test cluster 1 for sub-clustering
EM takes 0.017 mins minutes
Test cluster 2 for sub-clustering
EM takes 0 mins minutes
Test cluster 3 for sub-clustering
EM takes 0.017 mins minutes
Test cluster 4 for sub-clustering
EM takes 0.017 mins minutes
Test cluster 5 for sub-clustering
EM takes 0.017 mins minutes
Test cluster 6 for sub-clustering
EM takes 0 mins minutes
Test cluster 7 for sub-clustering
EM takes 0 mins minutes
Test cluster 8 for sub-clustering
EM takes 0.017 mins minutes
Test cluster 9 for sub-clustering
EM takes 0.033 mins minutes
cluster 8 has 3 sub-cluster at 3, ICL=65
cluster 4 has 2 sub-cluster at 2, ICL=51
cluster 6 has 4 sub-cluster at 4, ICL=43
cluster 3 has 3 sub-cluster at 3, ICL=38
cluster 2 has 2 sub-cluster at 2, ICL=30
cluster 1 has 3 sub-cluster at 3, ICL=29
cluster 5 has 5 sub-cluster at 6, ICL=6.5
cluster 7 has 3 sub-cluster at 4, ICL=6.3
cluster 1 has 3 sub-cluster at 3, ICL=0
split cluster 4 into 2 sub-cluster
split cluster 8 into 3 sub-cluster
Model Refinement takes 0.12 mins minutes
Fit Model 7 of (6/10) K=9->12 N=9682
EM takes 0.017 mins minutes
Test cluster 1 for sub-clustering
EM takes 0.017 mins minutes
Test cluster 2 for sub-clustering
EM takes 0 mins minutes
Test cluster 3 for sub-clustering
EM takes 0 mins minutes
Test cluster 4 for sub-clustering
EM takes 0.017 mins minutes
Test cluster 5 for sub-clustering
EM takes 0.017 mins minutes
Test cluster 6 for sub-clustering
EM takes 0 mins minutes
Test cluster 7 for sub-clustering
EM takes 0.017 mins minutes
Test cluster 8 for sub-clustering
EM takes 0 mins minutes
Test cluster 9 for sub-clustering
EM takes 0.017 mins minutes
Test cluster 10 for sub-clustering
EM takes 0 mins minutes
Test cluster 11 for sub-clustering
EM takes 0.017 mins minutes
Test cluster 12 for sub-clustering
EM takes 0.033 mins minutes
cluster 3 has 2 sub-cluster at 2, ICL=37
cluster 2 has 2 sub-cluster at 2, ICL=30
cluster 1 has 3 sub-cluster at 3, ICL=29
cluster 7 has 3 sub-cluster at 3, ICL=26
cluster 8 has 5 sub-cluster at 6, ICL=19
cluster 11 has 2 sub-cluster at 2, ICL=18
cluster 5 has 2 sub-cluster at 3, ICL=15
cluster 9 has 2 sub-cluster at 2, ICL=12
cluster 10 has 6 sub-cluster at 7, ICL=5.6
cluster 6 has 5 sub-cluster at 5, ICL=0.99
cluster 1 has 3 sub-cluster at 3, ICL=0
Model Refinement takes 0.13 mins minutes
Fit Model 8 of (6/10) K=12->12 N=9682
EM takes 0 mins minutes
Process completed at8
Major Iteration (Trans) (6/4) takes 0.52 mins minutes
> summary(res)
** Experiment Information **
Experiment name: immunoClust Experiment
Data Filename: fcs/12443.fcs
Parameters: FSC-A SSC-A FITC-A PE-A APC-A APC-Cy7-A Pacific Blue-A
Description: NA NA CD14 CD19 CD15 CD4 CD3
** Data Information **
Number of observations: 10000
Number of parameters: 7
Removed from above: 318 (3.18%)
Removed from below: 0 (0%)
** Transformation Information **
htrans-A: 0.000000 0.000000 0.010000 0.010000 0.010000 0.010000 0.010000
htrans-B: 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
htrans-decade: -1
** Clustering Summary **
Number of clusters: 12
Cluster Proportion Observations
1 0.091851 890
2 0.005914 57
3 0.040373 391
4 0.084059 827
5 0.033547 312
6 0.028571 281
7 0.012627 122
8 0.007323 70
9 0.034162 333
10 0.015835 154
11 0.008024 79
12 0.637713 6166
Min. 0.005914 57
Max. 0.637713 6166
** Information Criteria **
Log likelihood: -254005 -254201.8 -173041.3
BIC: -254005
ICL: -254201.8