Last data update: 2014.03.03

R: Cell cycle phase classification
cyclone    R Documentation

Cell cycle phase classification

Description

Classify single cells into their cell cycle phases based on gene expression data.

Usage

## S4 method for signature 'matrix'
cyclone(x, pairs, gene.names=rownames(x), iter=1000, min.iter=100, min.pairs=50, 
    BPPARAM=bpparam(), verbose=FALSE)
## S4 method for signature 'SCESet'
cyclone(x, ..., assay="counts", get.spikes=FALSE)

Arguments

x

A numeric matrix of gene expression values where rows are genes and columns are cells. Alternatively, a SCESet object containing such a matrix.

pairs

A list of data.frames produced by sandbag, containing pairs of marker genes.

gene.names

A character vector of gene names.

iter

An integer scalar specifying the number of iterations for random sampling to obtain a cycle score.

min.iter

An integer scalar specifying the minimum number of iterations for score estimation.

min.pairs

An integer scalar specifying the minimum number of pairs for cycle estimation.

BPPARAM

A BiocParallelParam object to use in bplapply for parallel processing.

verbose

A logical scalar specifying whether diagnostics should be printed to screen.

...

Additional arguments to pass to cyclone,matrix-method.

assay

A string specifying which assay values to use, e.g., counts or exprs.

get.spikes

A logical scalar specifying whether spike-in transcripts should be used.

Details

This function implements the classification step of the pair-based prediction method described by Scialdone et al. (2015). Pairs of marker genes are identified with sandbag, where the sign of the relative expression between the genes in each pair changes across phases. For each phase and each cell, the function calculates the proportion of all marker pairs where the expression of the first gene is greater than that of the second (pairs with equal expression are ignored). A null distribution of proportions is constructed by shuffling the expression values within the cell and recalculating the proportion at each iteration. The phase score for that cell is then defined as the lower tail probability of the observed proportion in this distribution.
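The scoring idea described above can be illustrated with a minimal base R sketch for a single cell and a single phase. The helper phase_score_sketch and its arguments are hypothetical and are not part of scran. The tail probability is taken here as the fraction of shuffled proportions strictly below the observed one, which is an assumption; the package may treat ties differently.

```r
# Hypothetical helper (not the scran implementation): score one cell for one phase.
# expr:   named numeric vector of expression values for a single cell
# first:  names of the first gene in each marker pair
# second: names of the second gene in each marker pair
phase_score_sketch <- function(expr, first, second, iter = 1000) {
    prop_first_greater <- function(values) {
        a <- values[first]
        b <- values[second]
        keep <- a != b                      # pairs with equal expression are ignored
        if (!any(keep)) return(NA_real_)
        mean(a[keep] > b[keep])
    }

    observed <- prop_first_greater(expr)

    # Null distribution: shuffle expression values across genes within the cell.
    null.props <- replicate(iter, {
        shuffled <- setNames(sample(expr), names(expr))
        prop_first_greater(shuffled)
    })
    null.props <- null.props[!is.na(null.props)]

    # Assumed definition of the lower tail probability (strict inequality).
    mean(null.props < observed)
}

set.seed(100)
expr <- setNames(c(10, 9, 8, 7, 1, 2, 3, 4), paste0("X", 1:8))
first <- paste0("X", 1:4)    # toy marker pairs: X1 > X5, X2 > X6, ...
second <- paste0("X", 5:8)
score <- phase_score_sketch(expr, first, second, iter = 200)
```

In this toy cell the first gene exceeds the second in every pair, so most shuffled proportions fall below the observed value and the score is close to 1.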

By default, shuffling is performed iter times to obtain the distribution from which the score is estimated. However, some iterations may not be used if there are fewer than min.pairs pairs with different expression, such that the proportion cannot be calculated precisely. Also, a score is only returned if the distribution is large enough for stable calculation of the tail probability, i.e., consists of results from at least min.iter iterations.
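As a rough illustration of how min.pairs and min.iter interact, the sketch below applies both thresholds to precomputed shuffling results. The helper tail_score_sketch and its inputs are hypothetical. Returning NA when too few iterations remain is an assumption, as the text only states that no score is returned in that case.

```r
# Hypothetical helper (not part of scran): apply the min.pairs / min.iter rules.
# observed:      proportion computed from the unshuffled cell
# props:         proportion obtained in each shuffling iteration
# n.informative: number of pairs with different expression in each iteration
tail_score_sketch <- function(observed, props, n.informative,
                              min.iter = 100, min.pairs = 50) {
    usable <- n.informative >= min.pairs       # drop imprecise iterations
    if (sum(usable) < min.iter) {
        return(NA_real_)                       # assumption: NA marks "no score"
    }
    mean(props[usable] < observed)
}

props <- c(0.2, 0.4, 0.6, 0.8)
n.informative <- c(60, 60, 10, 60)             # third iteration has too few pairs

# Three usable iterations remain, which is enough when min.iter = 3.
kept <- tail_score_sketch(0.7, props, n.informative, min.iter = 3, min.pairs = 50)

# With min.iter = 4 the distribution is too small and no score is produced.
dropped <- tail_score_sketch(0.7, props, n.informative, min.iter = 4, min.pairs = 50)
```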

Cells with G1 and G2M scores above 0.5 should be assigned to the G1 and G2M phases, respectively. This is based on the interpretation of the score as 1 minus the p-value for the null distribution of proportions. The null hypothesis here is that expression of the marker genes is independent within each cell, i.e., with no cycle-induced correlations between marker pairs. Cells can be assigned to S phase based on the S phase score, but a more reliable approach is to define S phase cells as those with both G1 and G2M scores below 0.5.
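The assignment rule can be sketched in base R as follows. The helper assign_phase_sketch is hypothetical and is not provided by scran. Two choices in it are assumptions that the text does not specify: cells with both scores above 0.5 are given the phase with the higher score, and a score of exactly 0.5 is treated as not exceeding the threshold.

```r
# Hypothetical helper: assign phases from a data frame with G1 and G2M score columns.
assign_phase_sketch <- function(scores) {
    g1 <- scores$G1 > 0.5
    g2m <- scores$G2M > 0.5

    phase <- rep("S", nrow(scores))    # default: neither score above 0.5
    phase[g1] <- "G1"
    phase[g2m] <- "G2M"

    # Assumption: when both scores exceed 0.5, use the higher one.
    both <- g1 & g2m
    phase[both & scores$G1 >= scores$G2M] <- "G1"

    # Cells without a score for either phase are left unassigned.
    phase[is.na(scores$G1) | is.na(scores$G2M)] <- NA_character_
    phase
}

scores <- data.frame(G1 = c(0.9, 0.1, 0.2, 0.8),
                     G2M = c(0.1, 0.95, 0.3, 0.6))
phases <- assign_phase_sketch(scores)
```

For these toy scores the four cells are assigned to G1, G2M, S and G1, respectively.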

For cyclone,SCESet-method, the matrix of counts is used by default but can be replaced with expression values by setting assay (e.g., assay="exprs"). By default, get.spikes=FALSE, which means that any rows corresponding to spike-in transcripts will not be considered for score calculation. This is for the same reasons as described in ?sandbag.

Value

A list of two data frames is returned – scores, containing the phase scores for each phase and cell (i.e., each row is a cell); and normalized.scores, containing the row-normalized scores (i.e., where the row sum for each cell is equal to 1).
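Row normalization of this kind can be reproduced in base R by dividing each row by its sum. The toy scores below are made up for illustration and do not come from cyclone.

```r
# Toy phase scores for two cells (rows); values are illustrative only.
scores <- data.frame(G1 = c(0.9, 0.2),
                     S = c(0.3, 0.2),
                     G2M = c(0.3, 0.6))

# Divide each row by its sum so that the scores for every cell add up to 1.
normalized <- scores / rowSums(scores)
```

After this step the first cell has a normalized G1 score of 0.6, and each row of normalized sums to 1.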

Author(s)

Antonio Scialdone, with modifications by Aaron Lun

References

Scialdone A, Natarajan KN, Saraiva LR et al. (2015). Computational assignment of cell-cycle stage from single-cell transcriptome data. Methods 85:54–61.

See Also

sandbag

Examples

example(sandbag)

# Classifying (note: test.data!=training.data in real cases)
test <- training 
assignments <- cyclone(test, out)

# Visualizing
col <- character(ncells)
col[is.G1] <- "red"
col[is.G2M] <- "blue"
col[is.S] <- "darkgreen"
plot(assignments$scores$G1, assignments$scores$G2M, col=col, pch=16)

Results


> library(scran)
Loading required package: BiocParallel
Loading required package: scater
Loading required package: Biobase
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: ggplot2

Attaching package: 'scater'

The following object is masked from 'package:stats':

    filter

> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/scran/cyclone.Rd_%03d_medium.png", width=480, height=480)
> ### Name: cyclone
> ### Title: Cell cycle phase classification
> ### Aliases: cyclone cyclone,matrix-method cyclone,SCESet-method
> ### Keywords: clustering
> 
> ### ** Examples
> 
> example(sandbag)

sandbg> ncells <- 50

sandbg> ngenes <- 20

sandbg> training <- matrix(rnorm(ncells*ngenes), ncol=ncells)

sandbg> rownames(training) <- paste0("X", seq_len(ngenes))

sandbg> is.G1 <- 1:20

sandbg> is.S <- 21:30

sandbg> is.G2M <- 31:50

sandbg> out <- sandbag(training, is.G1, is.S, is.G2M) 

sandbg> # Getting pre-trained marker sets
sandbg> mm.pairs <- readRDS(system.file("exdata", "mouse_cycle_markers.rds", package="scran"))

sandbg> hs.pairs <- readRDS(system.file("exdata", "human_cycle_markers.rds", package="scran"))
> 
> # Classifying (note: test.data!=training.data in real cases)
> test <- training 
> assignments <- cyclone(test, out)
> 
> # Visualizing
> col <- character(ncells)
> col[is.G1] <- "red"
> col[is.G2M] <- "blue"
> col[is.S] <- "darkgreen"
> plot(assignments$score$G1, assignments$score$G2M, col=col, pch=16)
> 
> dev.off()
null device 
          1 
>