Last data update: 2014.03.03

Quick clustering {scran}                R Documentation

Quick clustering of cells

Description

Cluster similar cells based on rank correlations in their gene expression profiles.

Usage

## S4 method for signature 'matrix'
quickCluster(x, min.size=200, ...)
## S4 method for signature 'SCESet'
quickCluster(x, ..., assay="counts", get.spikes=FALSE)

Arguments

x

A numeric count matrix where rows are genes and columns are cells. Alternatively, a SCESet object containing such a matrix.

min.size

An integer scalar specifying the minimum size of each cluster.

...

For quickCluster,matrix-method, additional arguments to be passed to cutreeDynamic. For quickCluster,SCESet-method, additional arguments to pass to quickCluster,matrix-method.

assay

A string specifying which assay values to use, e.g., counts or exprs.

get.spikes

A logical specifying whether spike-in transcripts should be used.

Details

This function provides a correlation-based approach to quickly define clusters of a minimum size min.size. A distance matrix is constructed using Spearman's correlation on the counts between cells. Hierarchical clustering is performed and a dynamic tree cut is used to define clusters of cells. A correlation-based approach is preferred here as it is invariant to scaling normalization. This avoids circularity between normalization and clustering.
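
The steps above, and the claim of invariance to scaling, can be sketched in base R. This is only an outline under stated assumptions, not the package's implementation: sketchCluster is a hypothetical helper, the distance transform (1 - rho) and the average linkage are assumed choices, and stats::cutree with a fixed number of clusters stands in for cutreeDynamic so that no extra package is needed.

```r
# Hypothetical helper: NOT the scran implementation, only an outline of the idea.
sketchCluster <- function(x, k = 2) {
    # Spearman's rank correlation between cells (columns of x).
    rho <- cor(x, method = "spearman")
    # Turn correlations into distances (assumed transform: 1 - rho).
    d <- as.dist(1 - rho)
    # Hierarchical clustering (assumed linkage: average).
    tree <- hclust(d, method = "average")
    # Stand-in for the dynamic tree cut: cut into a fixed number of clusters.
    cutree(tree, k = k)
}

set.seed(100)
ncells <- 60
ngenes <- 500

# Two simulated subpopulations, each with its own mean expression per gene.
mu1 <- 2^rnorm(ngenes, mean = 2)
mu2 <- 2^rnorm(ngenes, mean = 2)
mu <- cbind(matrix(mu1, ngenes, ncells / 2), matrix(mu2, ngenes, ncells / 2))
counts <- matrix(rnbinom(ngenes * ncells, mu = mu, size = 10), nrow = ngenes)

# Scale each cell by an arbitrary positive factor, as a size factor would.
sf <- 2^rnorm(ncells, sd = 0.5)
scaled <- t(t(counts) * sf)

raw.clusters <- sketchCluster(counts)
scaled.clusters <- sketchCluster(scaled)

# Ranks within a cell are unchanged by positive scaling,
# so both inputs give the same clustering.
identical(raw.clusters, scaled.clusters)
```

Because the correlations are computed on ranks within each cell, any per-cell scaling factor drops out before clustering, so the clusters can be defined before size factors are estimated.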

Note that some cells may not be assigned to any cluster, in which case the function issues a warning. In most cases, this is because those cells belong to a separate cluster with fewer than min.size cells, which the function cannot call as a cluster because it falls below the minimum size threshold. Users are advised to check that the unassigned cells do indeed form their own cluster. If so, it is generally safe to ignore the warning and to treat all unassigned cells as a single cluster. Otherwise, it may be necessary to use a custom clustering algorithm.
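
One possible way to check whether unassigned cells form their own cluster is to compare their correlations with each other against their correlations with the assigned cells. The sketch below uses base R only. The cluster labels and the simulated counts are hypothetical stand-ins for real quickCluster output and data, and the comparison of mean correlations is an assumed heuristic rather than a procedure prescribed by the package.

```r
set.seed(1)
ngenes <- 200

# Hypothetical labels in the style returned by quickCluster(); "0" = unassigned.
clusters <- factor(rep(c("1", "2", "0"), times = c(20, 20, 5)))

# Simulated counts in which each label has its own expression profile
# (an assumption made purely for illustration).
mus <- sapply(levels(clusters), function(i) 2^rnorm(ngenes, mean = 3))
counts <- matrix(
    rnbinom(ngenes * length(clusters), mu = mus[, as.character(clusters)], size = 10),
    nrow = ngenes
)

# Number of cells per label, including the unassigned ones.
table(clusters)

rho <- cor(counts, method = "spearman")
unassigned <- clusters == "0"

# Mean correlation among unassigned cells, excluding self-correlations.
within <- rho[unassigned, unassigned]
mean.within <- mean(within[upper.tri(within)])

# Mean correlation between unassigned cells and all assigned cells.
mean.between <- mean(rho[unassigned, !unassigned])

# TRUE suggests the unassigned cells resemble each other more than the rest.
mean.within > mean.between
```

If the unassigned cells are clearly more similar to each other than to the other cells, keeping the "0" label as a cluster of its own is consistent with the advice above.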

By default, spike-in transcripts are not used as they provide little information on the biological similarities between cells. This may not be the case if subpopulations differ by total RNA content, in which case setting get.spikes=TRUE may provide more discriminative power.

Value

A vector of cluster identities for each cell in x. Values of "0" are used to indicate cells that are not assigned to any cluster.

Author(s)

Aaron Lun and Karsten Bach

See Also

cutreeDynamic, computeSumFactors

Examples

set.seed(100)
popsize <- 200
ngenes <- 10000
all.facs <- 2^rnorm(popsize, sd=0.5)
counts <- matrix(rnbinom(ngenes*popsize, mu=all.facs, size=1), ncol=popsize, byrow=TRUE)
clusters <- quickCluster(counts, min.size=20)

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> library(scran)
Loading required package: BiocParallel
Loading required package: scater
Loading required package: Biobase
Loading required package: BiocGenerics
Loading required package: parallel

Loading required package: ggplot2

> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/scran/quickCluster.Rd_%03d_medium.png", width=480, height=480)
> ### Name: Quick clustering
> ### Title: Quick clustering of cells
> ### Aliases: quickCluster quickCluster,matrix-method
> ###   quickCluster,SCESet-method
> ### Keywords: normalization
> 
> ### ** Examples
> 
> set.seed(100)
> popsize <- 200
> ngenes <- 10000
> all.facs <- 2^rnorm(popsize, sd=0.5)
> counts <- matrix(rnbinom(ngenes*popsize, mu=all.facs, size=1), ncol=popsize, byrow=TRUE)
> clusters <- quickCluster(counts, min.size=20)
> 
> dev.off()
null device 
          1 
>