Last data update: 2014.03.03

R: Compute normalized expression values
Normalized ExpressionR Documentation

Compute normalized expression values

Description

Compute (log-)expression values from read counts, using pre-calculated size factors for normalization.

Usage

## S4 method for signature 'SCESet'
normalize(object, log=TRUE, prior.count=1, separate.spikes=TRUE)

Arguments

object

A matrix of read counts, or a SCESet object with an assay named "counts".

log

A logical scalar specifying whether the expression should be log-transformed.

prior.count

A numeric scalar indicating the prior count to add prior to log-transformation, to avoid undefined values from zero counts.

separate.spikes

A logical scalar indicating whether spike-in counts should be normalized separately.

Details

This function computes normalized log-expression values by adding prior.count to each count, dividing by the size.factor for that cell, and log-transforming. Size factors are taken from the appropriate field in the colData of object. These size factors can be computed with a number of functions like computeSumFactors or computeSpikeFactors.

If spike-in counts are present in the SCESet object, these will also be converted into normalized values. If separate.spikes=FALSE, this is done with the same set of size factors that was used for the endogenous genes. Otherwise, a separate set of spike-in size factors will be used instead – these are defined by calling computeSpikeFactors.

All size factors are mean-centered so that their geometric mean is equal to unity prior to computing normalized expression values. This ensures that expression values are roughly comparable when different sets of size factors are used.

Value

For normalize,SCESet-method, a SCESet object is returned with an additional assay named "exprs". This contains normalized log-expression values for the endogenous genes. If spike-ins are present, normalized values for the spike-in transcripts are stored in the norm.spikes field of the colData.

Why spikes can be normalized separately

In most cases, it does not make sense to normalize spike-in counts with size factors computed from endogenous genes. This is because the spike-in counts do not (generally) depend on the total amount of endogenous RNA, whereas the size factors do. As such, normalizing the former with the latter would be inappropriate – cells with a lot of endogenous RNA would scale down the spike-in counts, even if the same amount of spike-in RNA was added, captured and sequenced in each cell.

Instead, normalization of the spike-in counts should be performed using size factors computed from those counts, i.e., with computeSpikeFactors. This is the default setting when separate.spikes=TRUE. Normalized log-expression values are made to be roughly comparable to those of endogenous genes, by ensuring all sets of size factors are mean-centered prior to normalization.

Author(s)

Aaron Lun

See Also

cpm, SCESet, computeSumFactors, computeSpikeFactors

Examples

set.seed(100)
popsize <- 10
ngenes <- 1000
all.facs <- 2^rnorm(popsize, sd=0.5)
counts <- matrix(rnbinom(ngenes*popsize, mu=10*all.facs, size=1), ncol=popsize, byrow=TRUE)
spikes <- matrix(rnbinom(100*popsize, mu=10*all.facs, size=0.5), ncol=popsize, byrow=TRUE)

combined <- rbind(counts, spikes)
colnames(combined) <- seq_len(popsize)
rownames(combined) <- seq_len(nrow(combined))
y <- newSCESet(countData=combined)
isSpike(y) <- rep(c(FALSE, TRUE), c(ngenes, 100))

sizeFactors(y) <- colSums(combined) # Library size normalization, basically.
y <- normalize(y)
exprs(y)[1:10,]

y <- computeSpikeFactors(y)
y <- normalize(y)
exprs(y)[1:10,]

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(scran)
Loading required package: BiocParallel
Loading required package: scater
Loading required package: Biobase
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: ggplot2

Attaching package: 'scater'

The following object is masked from 'package:stats':

    filter

> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/scran/normalize.Rd_%03d_medium.png", width=480, height=480)
> ### Name: Normalized Expression
> ### Title: Compute normalized expression values
> ### Aliases: normalize normalize,SCESet-method
> 
> ### ** Examples
> 
> set.seed(100)
> popsize <- 10
> ngenes <- 1000
> all.facs <- 2^rnorm(popsize, sd=0.5)
> counts <- matrix(rnbinom(ngenes*popsize, mu=10*all.facs, size=1), ncol=popsize, byrow=TRUE)
> spikes <- matrix(rnbinom(100*popsize, mu=10*all.facs, size=0.5), ncol=popsize, byrow=TRUE)
> 
> combined <- rbind(counts, spikes)
> colnames(combined) <- seq_len(popsize)
> rownames(combined) <- seq_len(nrow(combined))
> y <- newSCESet(countData=combined)
> isSpike(y) <- rep(c(FALSE, TRUE), c(ngenes, 100))
> 
> sizeFactors(y) <- colSums(combined) # Library size normalization, basically.
> y <- normalize(y)
> exprs(y)[1:10,]
             1        2           3        4           5           6
1   2.75701479 3.091186  2.81704196 4.857246  4.26626502 -0.02079825
2   2.48674159 2.249089  0.99706303 3.096452  3.40528739  3.91398142
3   3.50834992 3.091186  0.99706303 5.413006  0.96095692  2.91842238
4   4.95475038 2.509914  1.58793048 3.218761  4.03212750  0.94478483
5   2.75701479 3.242450  4.10010425 2.962805  4.69874282  4.43618906
6   1.09797602 1.520928 -0.02079825 4.519244  3.75248764  3.23857573
7   3.89180245 4.616861  4.53674220 3.218761  1.95174724  2.72704364
8  -0.02079825 2.509914  3.01031535 2.815502 -0.02079825  3.08736168
9   1.09797602 1.930469  2.00591142 2.815502  4.69874282  0.94478483
10  4.84308935 3.619425  3.18072884 1.703065  5.19138438  3.82116850
             7           8           9          10
1   3.81109651  3.97035784 -0.02079825 -0.02079825
2   3.54392973  2.02789447  1.84565383 -0.02079825
3   6.26522777  0.81458199  3.93850003  3.72634945
4   2.51888377  0.81458199  4.36101161  4.14486913
5   2.18426773  4.34434704  3.66945836  2.11322270
6  -0.02079825 -0.02079825  1.19529882  3.95073271
7   2.51888377  3.56182779  2.29230865  2.11322270
8   1.11909822  4.72781323  3.13932208  4.73330543
9   1.11909822  1.72464639  3.13932208  3.46052323
10  2.51888377  1.34021461  2.90821039 -0.02079825
> 
> y <- computeSpikeFactors(y)
> y <- normalize(y)
> exprs(y)[1:10,]
              1        2            3        4            5            6
1   2.580822104 3.047001  2.833127874 4.991962  4.348731244 -0.008813928
2   2.317601047 2.210830  1.011462384 3.221240  3.484719000  4.120469089
3   3.318160928 3.047001  1.011462384 5.549045  1.010041461  3.112201473
4   4.751472741 2.469441  1.603120818 3.344697  4.113939195  1.061575400
5   2.580822104 3.197519  4.116582456 3.086220  4.782170625  4.646537901
6   0.993992007 1.491424 -0.008813928 4.652866  3.833364296  3.437442302
7   3.696801723 4.567778  4.553291863 3.344697  2.019377542  2.917185491
8  -0.008813928 2.469441  3.026484767 2.937248 -0.008813928  3.283967821
9   0.993992007 1.895512  2.021494666 2.937248  4.782170625  1.061575400
10  4.640424846 3.572942  3.196963160 1.804673  5.275607485  4.026809982
              7            8            9           10
1   3.691637243  4.090937245 -0.008813928 -0.008813928
2   3.426583477  2.128373273  1.754316086 -0.008813928
3   6.137246203  0.878523268  3.815837464  3.668491891
4   2.414407597  0.878523268  4.235885260  4.085560915
5   2.086352809  4.466528532  3.548782254  2.067173380
6  -0.008813928 -0.008813928  1.126831745  3.892045534
7   2.414407597  3.680107599  2.190153575  2.067173380
8   1.055322883  4.851257053  3.023818120  4.672552083
9   1.055322883  1.818711053  3.023818120  3.403830203
10  2.414407597  1.423887740  2.795620440 -0.008813928
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>