Last data update: 2014.03.03

calcNormFactors                R Documentation

Calculate normalization factors

Description

This function calculates normalization factors for a TCC-class object using a specified multi-step normalization method. The procedure can generally be described as the STEP1-(STEP2-STEP3)n pipeline, in which the STEP2-STEP3 cycle is repeated n times.

Usage

## S4 method for signature 'TCC'
calcNormFactors(tcc, norm.method = NULL, test.method = NULL,
                iteration = TRUE,  FDR = NULL, floorPDEG = NULL, 
                increment = FALSE, ...)

Arguments

tcc

TCC-class object.

norm.method

character specifying the normalization method used in both STEP1 and STEP3. Possible values are "tmm" for the TMM normalization method implemented in the edgeR package, "edger" (same as "tmm"), "deseq2" for the method implemented in the DESeq2 package, and "deseq" for the method implemented in the DESeq package. The default is "tmm" when analyzing count data with multiple replicates (i.e., min(table(tcc$group[, 1])) > 1) and "deseq" when analyzing count data without replicates (i.e., min(table(tcc$group[, 1])) == 1).

test.method

character specifying the method for identifying differentially expressed genes (DEGs) used in STEP2: one of "edger", "deseq", "deseq2", "bayseq", "samseq", "voom" and "wad". See the "Details" field in estimateDE for details. The default is "edger" when analyzing count data with multiple replicates (i.e., min(table(tcc$group[, 1])) > 1), and "deseq" (two groups) or "deseq2" (more than two groups) when analyzing count data without replicates (i.e., min(table(tcc$group[, 1])) == 1).

iteration

logical or numeric value specifying the number of iterations (n) in the proposed normalization pipeline: the STEP1-(STEP2-STEP3)n pipeline. If FALSE or 0 is specified, normalization is performed only by the method in STEP1. If TRUE or 1 is specified, the three-step normalization pipeline is performed. Integers higher than 1 indicate the number of iterations of the STEP2-STEP3 cycle.

FDR

numeric value (between 0 and 1) specifying the threshold for determining potential DEGs after STEP2.

floorPDEG

numeric value (between 0 and 1) specifying the minimum proportion of genes to be eliminated as potential DEGs before performing STEP3.

increment

logical value. If increment = TRUE, the DEGES pipeline is run again, starting from the current iterated result.

...

arguments to identify potential DEGs at STEP2. See the "Arguments" field in estimateDE for details.

Details

The calcNormFactors function is the main function in the TCC package. Since this pipeline employs a DEG identification method at STEP2, our multi-step strategy can eliminate the negative effect of potential DEGs before the second normalization at STEP3. To fully utilize the DEG elimination strategy (DEGES), we strongly recommend against using iteration = 0 or iteration = FALSE. This function internally calls functions implemented in other R packages according to the specified norm.method value.

  • norm.method = "tmm"
    The calcNormFactors function implemented in edgeR is used for obtaining the TMM normalization factors at both STEP1 and STEP3.

  • norm.method = "deseq2"
    The estimateSizeFactors function implemented in DESeq2 is used for obtaining the size factors at both STEP1 and STEP3. The size factors are internally converted to normalization factors that are comparable to the TMM normalization factors.

  • norm.method = "deseq"
    The estimateSizeFactors function implemented in DESeq is used for obtaining the size factors at both STEP1 and STEP3. The size factors are internally converted to normalization factors that are comparable to the TMM normalization factors.
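
The control flow of the STEP1-(STEP2-STEP3)n pipeline can be sketched in base R as follows. This is only an illustration under stated assumptions, not the TCC implementation: toy_norm, toy_deg, and toy_deges are hypothetical helpers, a simple median-of-ratios normalizer and a fold-change ranking stand in for the edgeR/DESeq methods, and the simulated counts are made up. The sketch does preserve the documented meaning of iteration: FALSE or 0 stops after STEP1, and TRUE or 1 runs one STEP2-STEP3 cycle.

```r
# Hypothetical STEP1/STEP3 normalizer (stand-in for TMM or DESeq size factors):
# median-of-ratios factors computed only on the genes flagged in `keep`.
toy_norm <- function(counts, keep = rep(TRUE, nrow(counts))) {
  kept <- counts[keep & rowSums(counts == 0) == 0, , drop = FALSE]
  log_ref <- rowMeans(log(kept))                 # per-gene log reference
  lf <- apply(log(kept) - log_ref, 2, median)    # per-sample log factor
  exp(lf - mean(lf))                             # rescale to geometric mean 1
}

# Hypothetical STEP2 detector (stand-in for edgeR, DESeq, etc.):
# flags the top `frac` of genes by absolute log fold change.
toy_deg <- function(counts, group, factors, frac = 0.05) {
  scaled <- sweep(counts, 2, factors, "/")
  g <- unique(group)
  lfc <- log2(rowMeans(scaled[, group == g[1], drop = FALSE]) + 1) -
         log2(rowMeans(scaled[, group == g[2], drop = FALSE]) + 1)
  rank(-abs(lfc), ties.method = "first") <= ceiling(frac * nrow(counts))
}

# STEP1-(STEP2-STEP3)n: normalize, then repeatedly drop potential DEGs
# and re-normalize on the remaining genes.
toy_deges <- function(counts, group, iteration = 1, frac = 0.05) {
  factors <- toy_norm(counts)                           # STEP1
  for (i in seq_len(as.integer(iteration))) {           # FALSE/0 skips the loop
    is_deg  <- toy_deg(counts, group, factors, frac)    # STEP2
    factors <- toy_norm(counts, keep = !is_deg)         # STEP3
  }
  factors
}

# Made-up data: 100 genes, 6 samples, first 10 genes inflated in group 2.
set.seed(1)
counts <- matrix(rpois(600, lambda = 50), nrow = 100)
counts[1:10, 4:6] <- counts[1:10, 4:6] * 4L
group <- c(1, 1, 1, 2, 2, 2)

f0 <- toy_deges(counts, group, iteration = 0)   # STEP1 only
f3 <- toy_deges(counts, group, iteration = 3)   # three STEP2-STEP3 cycles
print(f0)
print(f3)
```

Because STEP3 reuses the same normalizer as STEP1 on a reduced gene set, swapping in a different normalizer or detector only changes the two helper functions, not the loop.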

Value

After performing the calcNormFactors function, the calculated normalization factors are populated in the norm.factors field (i.e., tcc$norm.factors). Parameters used for DEGES normalization (e.g., potential DEGs identified in STEP2, execution times for the identification, etc.) are stored in the DEGES field (i.e., tcc$DEGES) as follows:

iteration

the iteration number n for the STEP1 - (STEP2 - STEP3)_{n} pipeline.

pipeline

the DEGES normalization pipeline.

threshold

stores (i) the type of threshold (threshold$type), (ii) the threshold value (threshold$input), and (iii) the percentage of potential DEGs actually used (threshold$PDEG). These values depend on whether the percentage of DEGs identified in STEP2 is higher or lower than the value indicated by floorPDEG. Consider, for example, the execution of the calcNormFactors function with "FDR = 0.1 and floorPDEG = 0.05". If the percentage of DEGs identified in STEP2 satisfying FDR = 0.1 was 0.14 (i.e., higher than the floorPDEG of 0.05), the values in the threshold fields will be threshold$type = "FDR", threshold$input = 0.1, and threshold$PDEG = 0.14. If the percentage (= 0.03) was lower than the predefined floorPDEG value of 0.05, the values in the threshold fields will be threshold$type = "floorPDEG", threshold$input = 0.05, and threshold$PDEG = 0.05.

potDEG

numeric binary vector (0 for non-DEG or 1 for DEG) after the evaluation of the percentage of DEGs identified in STEP2 with the predefined floorPDEG value. If the percentage (e.g., 2%) is lower than the floorPDEG value (e.g., 17%), 17% of the elements are set to 1 (DEG).

prePotDEG

numeric binary vector (0 for non-DEG or 1 for DEG) before the evaluation of the percentage of DEGs identified in STEP2 with the predefined floorPDEG value. Regardless of the floorPDEG value, the percentage of elements with 1 is always the same as that of DEGs identified in STEP2.

execution.time

computation time required for normalization.
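
The relationship between the threshold, potDEG, and prePotDEG fields described above can be illustrated with a small base R sketch. The helper apply_floor is hypothetical and not part of TCC. It assumes that, when the floor applies, the genes with the smallest q-values are the ones promoted to potential DEGs, and that a percentage exactly equal to floorPDEG keeps the FDR threshold; neither detail is stated in this page. The q-values are made up to reproduce the 0.03 and 0.14 cases from the threshold description.

```r
# Hypothetical helper (not a TCC function) mirroring the documented rule.
apply_floor <- function(q.values, FDR = 0.1, floorPDEG = 0.05) {
  # Genes passing the FDR threshold in STEP2, before the floor is considered.
  prePotDEG <- as.numeric(q.values < FDR)
  pdeg <- mean(prePotDEG)
  if (pdeg >= floorPDEG) {
    # Enough DEGs were found: the FDR threshold is used as is.
    threshold <- list(type = "FDR", input = FDR, PDEG = pdeg)
    potDEG <- prePotDEG
  } else {
    # Too few DEGs: flag the top floorPDEG fraction instead
    # (assumption: ranked by q-value, ties broken by position).
    n_top <- round(floorPDEG * length(q.values))
    potDEG <- as.numeric(rank(q.values, ties.method = "first") <= n_top)
    threshold <- list(type = "floorPDEG", input = floorPDEG, PDEG = floorPDEG)
  }
  list(threshold = threshold, potDEG = potDEG, prePotDEG = prePotDEG)
}

# Made-up q-values for 100 genes: 3% and 14% fall below FDR = 0.1.
q_low  <- c(rep(0.01, 3),  seq(0.2, 1, length.out = 97))
q_high <- c(rep(0.01, 14), seq(0.2, 1, length.out = 86))

res_low  <- apply_floor(q_low)    # floor applies
res_high <- apply_floor(q_high)   # FDR threshold applies

str(res_low$threshold)
str(res_high$threshold)
c(potDEG = sum(res_low$potDEG), prePotDEG = sum(res_low$prePotDEG))
```

In the first case prePotDEG still marks only the 3 genes found in STEP2, while potDEG marks 5 genes, which is the difference between the two fields.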

Examples

data(hypoData)
group <- c(1, 1, 1, 2, 2, 2)

# Calculating normalization factors using the DEGES/edgeR method 
# (the TMM-edgeR-TMM pipeline).
tcc <- new("TCC", hypoData, group)
tcc <- calcNormFactors(tcc, norm.method = "tmm", test.method = "edger",
                       iteration = 1, FDR = 0.1, floorPDEG = 0.05)
tcc$norm.factors

# Calculating normalization factors using the iterative DEGES/edgeR method 
# (iDEGES/edgeR) with n = 3.
tcc <- new("TCC", hypoData, group)
tcc <- calcNormFactors(tcc, norm.method = "tmm", test.method = "edger",
                       iteration = 3, FDR = 0.1, floorPDEG = 0.05)
tcc$norm.factors

# Calculating normalization factors for simulation data without replicates.
tcc <- simulateReadCounts(replicates = c(1, 1))
tcc <- calcNormFactors(tcc, norm.method = "deseq", test.method = "deseq",
                       iteration = 1, FDR = 0.1, floorPDEG = 0.05)
tcc$norm.factors
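
The fields described under "Value" can be inspected after a run. The following is a minimal sketch that assumes the TCC package is installed and that the DEGES field exposes threshold and potDEG as list elements, as the "Value" section suggests; the requireNamespace guard is added here only so the snippet does nothing when TCC is unavailable.

```r
if (requireNamespace("TCC", quietly = TRUE)) {
  library(TCC)
  data(hypoData)
  group <- c(1, 1, 1, 2, 2, 2)

  tcc <- new("TCC", hypoData, group)
  tcc <- calcNormFactors(tcc, norm.method = "tmm", test.method = "edger",
                         iteration = 1, FDR = 0.1, floorPDEG = 0.05)

  # Threshold actually applied after STEP2 (type, input, PDEG).
  str(tcc$DEGES$threshold)

  # Number of genes treated as potential DEGs (1) and non-DEGs (0).
  print(table(tcc$DEGES$potDEG))

  # Average of the normalization factors across samples.
  print(mean(tcc$norm.factors))
}
```

In the recorded results for this example, the six normalization factors average to 1, so the last line is a quick sanity check on the output.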

Results


> library(TCC)
Loading required package: DESeq
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: locfit
locfit 1.5-9.1 	 2013-03-22
Loading required package: lattice
    Welcome to 'DESeq'. For improved performance, usability and
    functionality, please consider migrating to 'DESeq2'.
Loading required package: DESeq2
Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomicRanges
Loading required package: GenomeInfoDb
Loading required package: SummarizedExperiment

Attaching package: 'DESeq2'

The following objects are masked from 'package:DESeq':

    estimateSizeFactorsForMatrix, getVarianceStabilizedData,
    varianceStabilizingTransformation

Loading required package: edgeR
Loading required package: limma

Attaching package: 'limma'

The following object is masked from 'package:DESeq2':

    plotMA

The following object is masked from 'package:DESeq':

    plotMA

The following object is masked from 'package:BiocGenerics':

    plotMA

Loading required package: baySeq
Loading required package: abind
Loading required package: perm
Loading required package: ROC

Attaching package: 'TCC'

The following object is masked from 'package:edgeR':

    calcNormFactors

> ### Name: calcNormFactors
> ### Title: Calculate normalization factors
> ### Aliases: calcNormFactors,TCC-method calcNormFactors,DGEList-method
> ###   calcNormFactors
> 
> ### ** Examples
> 
> data(hypoData)
> group <- c(1, 1, 1, 2, 2, 2)
> 
> # Calculating normalization factors using the DEGES/edgeR method 
> # (the TMM-edgeR-TMM pipeline).
> tcc <- new("TCC", hypoData, group)
> tcc <- calcNormFactors(tcc, norm.method = "tmm", test.method = "edger",
+                        iteration = 1, FDR = 0.1, floorPDEG = 0.05)
TCC::INFO: Calculating normalization factors using DEGES
TCC::INFO: (iDEGES pipeline : tmm - [ edger - tmm ] X 1 )
TCC::INFO: Done.
> tcc$norm.factors
  G1_rep1   G1_rep2   G1_rep3   G2_rep1   G2_rep2   G2_rep3 
0.8756470 0.8440551 0.8412683 1.0811889 1.1520110 1.2058296 
> 
> # Calculating normalization factors using the iterative DEGES/edgeR method 
> # (iDEGES/edgeR) with n = 3.
> tcc <- new("TCC", hypoData, group)
> tcc <- calcNormFactors(tcc, norm.method = "tmm", test.method = "edger",
+                        iteration = 3, FDR = 0.1, floorPDEG = 0.05)
TCC::INFO: Calculating normalization factors using DEGES
TCC::INFO: (iDEGES pipeline : tmm - [ edger - tmm ] X 3 )
TCC::INFO: Done.
> tcc$norm.factors
  G1_rep1   G1_rep2   G1_rep3   G2_rep1   G2_rep2   G2_rep3 
0.8766053 0.8450605 0.8346595 1.0842097 1.1538160 1.2056491 
> 
> # Calculating normalization factors for simulation data without replicates.
> tcc <- simulateReadCounts(replicates = c(1, 1))
TCC::INFO: Generating simulation data under NB distribution ...
TCC::INFO: (genesizes   :  10000 )
TCC::INFO: (replicates  :  1, 1 )
TCC::INFO: (PDEG        :  0.18, 0.02 )
> tcc <- calcNormFactors(tcc, norm.method = "deseq", test.method = "deseq",
+                        iteration = 1, FDR = 0.1, floorPDEG = 0.05)
TCC::INFO: Calculating normalization factors using DEGES
TCC::INFO: (iDEGES pipeline : deseq - [ deseq - deseq ] X 1 )
TCC::INFO: Done.
> tcc$norm.factors
  G1_rep1   G2_rep1 
0.8627514 1.1372486 
> 