This function calculates normalization factors using a specified
multi-step normalization method from a TCC-class object.
The procedure can generally be described as the
STEP1-(STEP2-STEP3)n pipeline.
norm.method
character specifying a normalization method used in
both the STEP1 and STEP3. Possible values are
"tmm" for the TMM normalization method implemented in the
edgeR package, "edger" (same as "tmm"), "deseq2" for the method
implemented in the DESeq2 package, and "deseq" for the method
implemented in the DESeq package.
The default is "tmm"
when analyzing the count data with multiple replicates
(i.e., min(table(tcc$group[, 1])) > 1)
and "deseq" when analyzing the count data without replicates
(i.e., min(table(tcc$group[, 1])) == 1).
test.method
character specifying a method for identifying
differentially expressed genes (DEGs) used in STEP2:
one of "edger", "deseq", "deseq2",
"bayseq", "samseq", "voom" and "wad".
See the "Details" field in estimateDE for details.
The default is "edger" when analyzing the count data with
multiple replicates (i.e., min(table(tcc$group[, 1])) > 1),
and "deseq" (two groups) or "deseq2" (more than two groups)
when analyzing the count data without replicates
(i.e., min(table(tcc$group[, 1])) == 1).
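The default selection described for norm.method and test.method can be sketched in base R as follows. This is only an illustration of the documented rule: default_methods is a hypothetical helper, not a TCC function, and it takes a plain group vector in place of tcc$group[, 1].

```r
# Hypothetical helper (not part of TCC) mirroring the documented defaults.
# `group` stands in for tcc$group[, 1]: one group label per sample.
default_methods <- function(group) {
  has_replicates <- min(table(group)) > 1
  n_groups <- length(unique(group))
  if (has_replicates) {
    list(norm.method = "tmm", test.method = "edger")
  } else {
    list(norm.method = "deseq",
         test.method = if (n_groups > 2) "deseq2" else "deseq")
  }
}

default_methods(c(1, 1, 1, 2, 2, 2))  # replicated two-group design
default_methods(c(1, 2))              # two groups, no replicates
default_methods(c(1, 2, 3))           # three groups, no replicates
```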
iteration
logical or numeric value specifying the number of
iterations (n) in the proposed normalization pipeline: the
STEP1-(STEP2-STEP3)n pipeline.
If FALSE or 0 is specified, the normalization pipeline
is performed only by the method in STEP1.
If TRUE or 1
is specified, the three-step normalization pipeline is performed.
Integers higher than 1 indicate the number of iterations in
the pipeline.
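How the logical and numeric forms of iteration map onto the number of (STEP2-STEP3) cycles can be illustrated with a small base R sketch. Both helpers are hypothetical and exist only for this illustration; they do not reproduce TCC's internal argument handling.

```r
# Hypothetical helper: map the documented `iteration` values to a count n.
normalize_iteration <- function(iteration) {
  if (is.logical(iteration)) {
    return(if (isTRUE(iteration)) 1L else 0L)  # TRUE -> 1, FALSE -> 0
  }
  stopifnot(is.numeric(iteration), length(iteration) == 1,
            iteration >= 0, iteration == round(iteration))
  as.integer(iteration)
}

# Hypothetical helper: spell out the pipeline for a given setting,
# writing each (STEP2-STEP3) cycle explicitly.
describe_pipeline <- function(iteration) {
  n <- normalize_iteration(iteration)
  paste0("STEP1", strrep("-(STEP2-STEP3)", n))
}

describe_pipeline(FALSE)  # STEP1 only
describe_pipeline(TRUE)   # one (STEP2-STEP3) cycle
describe_pipeline(3)      # three (STEP2-STEP3) cycles
```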
FDR
numeric value (between 0 and 1) specifying the threshold for
determining potential DEGs after STEP2.
floorPDEG
numeric value (between 0 and 1) specifying the minimum
proportion of genes to be eliminated as potential DEGs before
performing STEP3.
increment
logical value. If increment = TRUE, the DEGES
pipeline is run again, starting from the current iterated result.
...
arguments to identify potential DEGs at STEP2. See the
"Arguments" field in estimateDE for details.
Details
The calcNormFactors function is the main function in the
TCC package.
Since this pipeline employs the DEG identification method at STEP2,
our multi-step strategy can eliminate the negative effect of potential DEGs
before the second normalization at STEP3.
To fully utilize the DEG elimination strategy (DEGES), we strongly recommend
against using iteration = 0 or iteration = FALSE.
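A typical call using the arguments documented on this page might look like the sketch below. It assumes that the TCC package is installed and that the hypoData example dataset bundled with it has three replicates in each of two groups. The guard only skips the example when TCC is unavailable.

```r
# Sketch of a DEGES run (TMM-edgeR-TMM pipeline); requires the TCC package.
if (requireNamespace("TCC", quietly = TRUE)) {
  library(TCC)
  data(hypoData)                  # example count matrix shipped with TCC
  group <- c(1, 1, 1, 2, 2, 2)    # assumed design: 3 replicates per group

  tcc <- new("TCC", hypoData, group)
  tcc <- calcNormFactors(tcc, norm.method = "tmm", test.method = "edger",
                         iteration = 1, FDR = 0.1, floorPDEG = 0.05)

  print(tcc$norm.factors)         # one normalization factor per sample
  print(tcc$DEGES$threshold)      # threshold actually applied after STEP2
}
```

Setting iteration = 1 keeps the example fast while still performing the DEG elimination step.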
This function internally calls functions implemented in other R packages
according to the specified norm.method value.
norm.method = "tmm"
The calcNormFactors function implemented
in edgeR is used for obtaining the TMM normalization factors
at both STEP1 and STEP3.
norm.method = "deseq2"
The estimateSizeFactors function
implemented in DESeq2 is used for obtaining the size factors
at both STEP1 and STEP3.
The size factors are internally converted to normalization factors
that are comparable to the TMM normalization factors.
norm.method = "deseq"
The estimateSizeFactors function
implemented in DESeq is used for obtaining the size factors
at both STEP1 and STEP3.
The size factors are internally converted to normalization factors
that are comparable to the TMM normalization factors.
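The conversion from size factors to TMM-like normalization factors can be sketched in base R. Size factors absorb sequencing depth, whereas TMM normalization factors are applied on top of the library sizes, so one plausible conversion divides each size factor by its library size and rescales. The exact rescaling used inside TCC is not stated here; rescaling to a mean of 1 is an assumption, and both functions below are hypothetical illustrations rather than TCC, DESeq or DESeq2 code.

```r
# Median-of-ratios size factors, re-implemented in base R for illustration.
median_ratio_size_factors <- function(counts) {
  log_geo_means <- rowMeans(log(counts))
  apply(counts, 2, function(col) {
    keep <- is.finite(log_geo_means) & col > 0
    exp(median((log(col) - log_geo_means)[keep]))
  })
}

# Assumed conversion: remove the library-size component, then rescale so
# that the factors average to 1.
size_to_norm_factors <- function(size_factors, lib_sizes) {
  nf <- size_factors / lib_sizes
  nf / mean(nf)
}

# Toy data: the samples differ only in sequencing depth.
counts <- cbind(s1 = c(10, 20, 30), s2 = c(20, 40, 60), s3 = c(5, 10, 15))
sf <- median_ratio_size_factors(counts)
nf <- size_to_norm_factors(sf, colSums(counts))
print(sf)
print(nf)
```

Because the toy samples differ only in depth, the size factors are proportional to the library sizes and the converted factors all come out close to 1, which is what a TMM-style factor should report when there is no composition bias.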
Value
After performing the calcNormFactors function,
the calculated normalization factors are populated in the
norm.factors field (i.e., tcc$norm.factors).
Parameters used for DEGES normalization (e.g., potential DEGs
identified in STEP2, execution times for the identification, etc.)
are stored in the DEGES field (i.e., tcc$DEGES) as follows:
iteration
the iteration number n for
the STEP1-(STEP2-STEP3)n pipeline.
pipeline
the DEGES normalization pipeline.
threshold
it stores
(i) the type of threshold (threshold$type),
(ii) the threshold value (threshold$input),
and (iii) the percentage of potential DEGs actually
used (threshold$PDEG).
These values depend on whether the percentage
of DEGs identified in STEP2 is higher or lower than the value
indicated by floorPDEG.
Consider, for example, running the calcNormFactors
function with FDR = 0.1 and floorPDEG = 0.05.
If the percentage of DEGs identified in STEP2 satisfying
FDR = 0.1 was 0.14
(i.e., higher than the floorPDEG of 0.05),
the values in the threshold fields will be
threshold$type = "FDR", threshold$input = 0.1,
and threshold$PDEG = 0.14.
If instead the percentage was 0.03 (i.e., lower than the predefined
floorPDEG value of 0.05), the values in the threshold fields
will be threshold$type = "floorPDEG",
threshold$input = 0.05, and threshold$PDEG = 0.05.
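The two cases above can be reproduced with a short base R sketch. choose_threshold is a hypothetical helper written for this illustration, not a TCC function. How a percentage exactly equal to floorPDEG is handled is not stated above, so falling back to floorPDEG in that case is an assumption.

```r
# Hypothetical helper reproducing the documented threshold fields.
# `prop_deg` is the proportion of genes called DEG in STEP2 at the given FDR.
choose_threshold <- function(prop_deg, FDR = 0.1, floorPDEG = 0.05) {
  if (prop_deg > floorPDEG) {
    # Enough DEGs were found: the FDR threshold is used as is.
    list(type = "FDR", input = FDR, PDEG = prop_deg)
  } else {
    # Too few DEGs (or exactly floorPDEG, by assumption): use the floor.
    list(type = "floorPDEG", input = floorPDEG, PDEG = floorPDEG)
  }
}

str(choose_threshold(0.14))  # first case described above
str(choose_threshold(0.03))  # second case described above
```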
potDEG
numeric binary vector (0 for non-DEG or 1 for DEG)
after the evaluation of the percentage of DEGs identified in
STEP2 with the predefined floorPDEG value.
If the percentage (e.g., 2%) is lower than the floorPDEG
value (e.g., 17%), 17% of the elements are set to 1 (DEG).
prePotDEG
numeric binary vector
(0 for non-DEG or 1 for DEG) before the evaluation of the percentage
of DEGs identified in STEP2 with the predefined
floorPDEG value. Regardless of the floorPDEG value,
the percentage of elements with 1 is always the same as that of DEGs
identified in STEP2.
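The relationship between prePotDEG and potDEG can be sketched in base R. flag_potential_degs is a hypothetical helper, and two details are assumptions made only for this illustration: genes are ranked by q-value when the floor applies, and the number of genes flagged under the floor is rounded to the nearest integer.

```r
# Hypothetical helper: build both binary vectors from STEP2 q-values.
flag_potential_degs <- function(q_values, FDR = 0.1, floorPDEG = 0.05) {
  # Before the floorPDEG check: 1 for every gene passing the FDR threshold.
  pre_pot_deg <- as.numeric(q_values < FDR)
  if (mean(pre_pot_deg) > floorPDEG) {
    pot_deg <- pre_pot_deg
  } else {
    # Floor applies: flag the top floorPDEG fraction (assumed: by q-value).
    n_flag <- round(length(q_values) * floorPDEG)
    pot_deg <- as.numeric(rank(q_values, ties.method = "first") <= n_flag)
  }
  list(prePotDEG = pre_pot_deg, potDEG = pot_deg)
}

# 100 genes, of which only 2 (2%) pass FDR = 0.1; floorPDEG = 0.17 (17%).
q <- c(0.01, 0.05, seq(0.2, 1, length.out = 98))
res <- flag_potential_degs(q, FDR = 0.1, floorPDEG = 0.17)
sum(res$prePotDEG)  # genes flagged before the floor is applied
sum(res$potDEG)     # genes flagged after the floor is applied
```

With these toy q-values, prePotDEG flags the 2 genes passing the FDR threshold, while potDEG flags 17 genes because the floor of 17% takes over.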