Last data update: 2014.03.03

R: Cell cycle phase training
sandbag    R Documentation

Cell cycle phase training

Description

Use gene expression data to train a classifier for cell cycle phase.

Usage

## S4 method for signature 'matrix'
sandbag(x, is.G1, is.S, is.G2M, gene.names=rownames(x), fraction=0.5)
## S4 method for signature 'SCESet'
sandbag(x, ..., assay="counts", get.spikes=FALSE)

Arguments

x

A numeric matrix of gene expression values where rows are genes and columns are cells. Alternatively, a SCESet object containing such a matrix.

is.G1, is.S, is.G2M

Vectors indicating which cells (columns of x) are in the G1, S and G2/M phases of the cell cycle, respectively. The Examples use integer column indices.

gene.names

A character vector of gene names.

fraction

A numeric scalar specifying the minimum fraction of cells in each phase that must show the expected ordering of expression for a gene pair to be defined as a marker pair.

...

Additional arguments to pass to sandbag,matrix-method.

assay

A string specifying which assay values to use, e.g., counts or exprs.

get.spikes

A logical specifying whether spike-in transcripts should be used.

Details

This function implements the training step of the pair-based prediction method described by Scialdone et al. (2015). Pairs of genes (A, B) are identified from a training data set such that, for each pair, the fraction of cells in phase G1 with expression of A > B (based on the expression values in x) exceeds fraction, and the fraction of cells with B > A in each of the other phases also exceeds fraction. These pairs are defined as the marker pairs for G1. This is repeated for each phase to obtain a separate set of marker pairs.
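The selection rule above can be sketched in base R. This is a minimal illustration under stated assumptions, not the scran implementation: find_pairs_sketch is a hypothetical helper, the toy matrix is made up, and the column names first and second are chosen here for illustration only.

```r
# Illustrative sketch only; find_pairs_sketch() is a hypothetical helper,
# not part of scran, and uses a simple (slow) double loop over genes.
find_pairs_sketch <- function(x, in.phase, others, fraction = 0.5) {
    genes <- rownames(x)
    first <- character(0)
    second <- character(0)
    for (a in seq_along(genes)) {
        for (b in seq_along(genes)) {
            if (a == b) next
            # Fraction of cells in the phase of interest with A > B.
            ok <- mean(x[a, in.phase] > x[b, in.phase]) > fraction
            # The fraction with B > A must exceed the threshold in every other phase.
            for (o in others) {
                ok <- ok && mean(x[b, o] > x[a, o]) > fraction
            }
            if (ok) {
                first <- c(first, genes[a])
                second <- c(second, genes[b])
            }
        }
    }
    data.frame(first = first, second = second, stringsAsFactors = FALSE)
}

# Toy data (assumed): "up" is high in G1 and low elsewhere, "down" is the reverse.
toy <- rbind(up   = c(5, 6, 7, 1, 1, 0, 1, 0),
             down = c(1, 2, 1, 4, 5, 6, 5, 7),
             flat = c(3, 3, 3, 3, 3, 3, 3, 3))
g1 <- 1:3
s <- 4:5
g2m <- 6:8

# Marker pairs for G1: A > B in G1, and B > A in both S and G2M.
g1.pairs <- find_pairs_sketch(toy, g1, list(s, g2m))
print(g1.pairs)
```

Calling the same helper with the phases rotated (for example, s as the phase of interest and list(g1, g2m) as the others) would give the analogous sets for S and G2M.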

Pre-defined sets of marker pairs are provided for mouse and human (see Examples). Classification of test data can be performed using the cyclone function. For each cell, this involves comparing expression values between the genes in each marker pair. The cell is then assigned to the phase that is consistent with the direction of the difference in expression in the majority of pairs.
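The majority rule described above can be illustrated for a single cell. This sketch is not cyclone itself, whose scoring may differ in detail; classify_cell_sketch, the toy marker pairs and the column names first and second are assumptions made for illustration.

```r
# Illustrative sketch of the majority rule; this is not cyclone().
# classify_cell_sketch() and the toy marker pairs below are hypothetical.
classify_cell_sketch <- function(cell, pairs) {
    # For each phase, the fraction of its marker pairs with first > second.
    support <- vapply(pairs, function(p) {
        mean(cell[p$first] > cell[p$second])
    }, numeric(1))
    # Report the phase whose pairs agree most often with the observed ordering.
    names(which.max(support))
}

pairs <- list(
    G1  = data.frame(first = c("a", "a"), second = c("b", "c"),
                     stringsAsFactors = FALSE),
    S   = data.frame(first = c("b", "b"), second = c("a", "c"),
                     stringsAsFactors = FALSE),
    G2M = data.frame(first = c("c", "c"), second = c("a", "b"),
                     stringsAsFactors = FALSE)
)

# Expression values for one cell, named by gene.
cell <- c(a = 1, b = 9, c = 4)
phase <- classify_cell_sketch(cell, pairs)
print(phase)
```

Only the ordering of expression within each pair is used, so the comparison does not depend on the overall scale of the expression values in the cell.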

For sandbag,SCESet-method, the matrix of counts is used by default but can be replaced with expression values by setting assay. By default, get.spikes=FALSE, which means that any rows corresponding to spike-in transcripts will not be considered when picking markers. This is because the amount of spike-in RNA added will vary between experiments, such that the expression of genes relative to spike-ins will not be a reliable predictor. Nonetheless, if all rows are required, users can set get.spikes=TRUE.
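The effect of get.spikes on the rows available for marker selection can be sketched with a plain matrix. This is an assumption-laden illustration: the SCESet method identifies spike-in rows from the object's own annotation, whereas here a hypothetical "ERCC-" row-name prefix stands in for that annotation.

```r
# Hypothetical toy matrix; the "ERCC-" row names stand in for spike-in transcripts.
counts <- matrix(1:12, nrow = 4,
                 dimnames = list(c("GeneA", "GeneB", "ERCC-1", "ERCC-2"), NULL))

# Assumed way of flagging spike-in rows for this sketch only.
is.spike <- grepl("^ERCC-", rownames(counts))

# With get.spikes = FALSE, spike-in rows are dropped before picking markers;
# with get.spikes = TRUE, all rows are kept.
get.spikes <- FALSE
keep <- if (get.spikes) rep(TRUE, nrow(counts)) else !is.spike
filtered <- counts[keep, , drop = FALSE]
print(rownames(filtered))
```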

Value

A named list of data frames, one per cell cycle phase, where each data frame contains the names of the genes in each marker pair for that phase.

Author(s)

Antonio Scialdone, with modifications by Aaron Lun

References

Scialdone A, Natarajan KN, Saraiva LR et al. (2015). Computational assignment of cell-cycle stage from single-cell transcriptome data. Methods 85:54–61.

See Also

cyclone

Examples

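library(scran)

# Simulate a training matrix with genes in rows and cells in columns.
ncells <- 50
ngenes <- 20
training <- matrix(rnorm(ncells*ngenes), ncol=ncells)
rownames(training) <- paste0("X", seq_len(ngenes))

# Column indices of the cells known to be in each phase.
is.G1 <- 1:20
is.S <- 21:30
is.G2M <- 31:50
out <- sandbag(training, is.G1, is.S, is.G2M)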

# Getting pre-trained marker sets
mm.pairs <- readRDS(system.file("exdata", "mouse_cycle_markers.rds", package="scran"))
hs.pairs <- readRDS(system.file("exdata", "human_cycle_markers.rds", package="scran"))

Results


> library(scran)
> ### Name: sandbag
> ### Title: Cell cycle phase training
> ### Aliases: sandbag sandbag,matrix-method sandbag,SCESet-method
> ### Keywords: clustering
> 
> ### ** Examples
> 
> ncells <- 50
> ngenes <- 20
> training <- matrix(rnorm(ncells*ngenes), ncol=ncells)
> rownames(training) <- paste0("X", seq_len(ngenes))
> 
> is.G1 <- 1:20
> is.S <- 21:30
> is.G2M <- 31:50
> out <- sandbag(training, is.G1, is.S, is.G2M) 
> 
> # Getting pre-trained marker sets
> mm.pairs <- readRDS(system.file("exdata", "mouse_cycle_markers.rds", package="scran"))
> hs.pairs <- readRDS(system.file("exdata", "human_cycle_markers.rds", package="scran"))