Last data update: 2014.03.03

R: Estimate Size Factors
estimateJunctionSeqSizeFactorsR Documentation

Estimate Size Factors

Description

Estimate size factors, which are scaling factors used as "offsets" by the statistical model to make the different samples comparable. This is necessary because the different samples may have been sequenced to slightly different depths. Additionally, the presence of differentially expressed genes may cause the apparent depth of many genes to appear different.

This function uses the "geometric" size factor normalization method, which is identical to the one used by DESeq, DESeq2, DEXSeq, and the default method used by CuffDiff.

This function is called internally by the runJunctionSeqAnalyses function, and thus for most purposes users should not need to call this function directly. It may be useful to advanced users performing non-standard analyses.

Usage

  estimateJunctionSeqSizeFactors(jscs, 
          method.sizeFactors = c("byGenes","byCountbins"), 
          replicateDEXSeqBehavior.useRawBaseMean = FALSE, 
          calcAltSF = TRUE, 
          verbose = FALSE);
  
  writeSizeFactors(jscs, file);

Arguments

jscs

A JunctionSeqCountSet. Usually initially created by readJunctionSeqCounts. Size factors must be set, usually using functions estimateSizeFactors and estimateJunctionSeqDispersions.

method.sizeFactors

Character string. Can be used to apply alternative methodologies or implementations. Intended for advanced users who have strong opinions about the underlying statistical methodologies.

Determines the method used to calculate normalization size factors. By default JunctionSeq uses gene-level expression. As an alternative, feature-level counts can be used as they are in DEXSeq. In practice the difference is almost always negligible.

replicateDEXSeqBehavior.useRawBaseMean

USED ONLY FOR INTERNAL TESTING! NOT INTENDED FOR ACTUAL USE!

This variable activates an alternative mode in which a (very minor) bug in DEXSeq v1.14.0 and earlier is replicated. If TRUE, the baseMean and baseVar variables will be computed using raw counts rather than normalized counts. This is used for internal tests in which DEXSeq functionality is replicated precisely and the results are compared against equivalent DEXSeq results. Without this option the results would differ slightly (generally by less than 1 hundreth of a percent).

USED ONLY FOR INTERNAL TESTING! NOT INTENDED FOR ACTUAL USE!

calcAltSF

Logical. Determines whether both types of size factor calculations should be generated, and placed in the jscs@altSizeFactors slot.

verbose

if TRUE, send debugging and progress messages to the console / stdout.

file

A file path to write the size factor table.

...

If using the (depreciated) estimateSizeFactors command, use the same syntax as above.

Value

A JunctionSeqCountSet, with size factors included.

Examples


data(exampleDataSet,package="JctSeqData");
jscs <- estimateJunctionSeqSizeFactors(jscs);

## Not run: 
########################################
#Set up example data:
decoder.file <- system.file(
                  "extdata/annoFiles/decoder.bySample.txt",
                  package="JctSeqData");
decoder <- read.table(decoder.file,
                  header=TRUE,
                  stringsAsFactors=FALSE);
gff.file <- system.file(
            "extdata/cts/withNovel.forJunctionSeq.gff.gz",
            package="JctSeqData");
countFiles <- system.file(paste0("extdata/cts/",
     decoder$sample.ID,
     "/QC.spliceJunctionAndExonCounts.withNovel.forJunctionSeq.txt.gz"),
     package="JctSeqData");
########################################
#Advanced Analysis:

#Make a "design" dataframe:
design <- data.frame(condition = factor(decoder$group.ID));
#Read the QoRTs counts.
jscs = readJunctionSeqCounts(countfiles = countFiles,
           samplenames = decoder$sample.ID,
           design = design,
           flat.gff.file = gff.file
);
#Generate the size factors and load them into the JunctionSeqCountSet:
jscs <- estimateJunctionSeqSizeFactors(jscs);
#Estimate feature-specific dispersions:
jscs <- estimateJunctionSeqDispersions(jscs);
#Fit dispersion function and estimate MAP dispersion:
jscs <- fitJunctionSeqDispersionFunction(jscs);
#Test for differential usage:
jscs <- testForDiffUsage(jscs);
#Estimate effect sizes and expression estimates:
jscs <- estimateEffectSizes( jscs);


## End(Not run)

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(JunctionSeq)
Loading required package: SummarizedExperiment
Loading required package: GenomicRanges
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/JunctionSeq/estimateJunctionSeqSizeFactors.Rd_%03d_medium.png", width=480, height=480)
> ### Name: estimateJunctionSeqSizeFactors
> ### Title: Estimate Size Factors
> ### Aliases: estimateJunctionSeqSizeFactors writeSizeFactors
> 
> ### ** Examples
> 
> 
> data(exampleDataSet,package="JctSeqData");
> jscs <- estimateJunctionSeqSizeFactors(jscs);
> 
> ## Not run: 
> ##D ########################################
> ##D #Set up example data:
> ##D decoder.file <- system.file(
> ##D                   "extdata/annoFiles/decoder.bySample.txt",
> ##D                   package="JctSeqData");
> ##D decoder <- read.table(decoder.file,
> ##D                   header=TRUE,
> ##D                   stringsAsFactors=FALSE);
> ##D gff.file <- system.file(
> ##D             "extdata/cts/withNovel.forJunctionSeq.gff.gz",
> ##D             package="JctSeqData");
> ##D countFiles <- system.file(paste0("extdata/cts/",
> ##D      decoder$sample.ID,
> ##D      "/QC.spliceJunctionAndExonCounts.withNovel.forJunctionSeq.txt.gz"),
> ##D      package="JctSeqData");
> ##D ########################################
> ##D #Advanced Analysis:
> ##D 
> ##D #Make a "design" dataframe:
> ##D design <- data.frame(condition = factor(decoder$group.ID));
> ##D #Read the QoRTs counts.
> ##D jscs = readJunctionSeqCounts(countfiles = countFiles,
> ##D            samplenames = decoder$sample.ID,
> ##D            design = design,
> ##D            flat.gff.file = gff.file
> ##D );
> ##D #Generate the size factors and load them into the JunctionSeqCountSet:
> ##D jscs <- estimateJunctionSeqSizeFactors(jscs);
> ##D #Estimate feature-specific dispersions:
> ##D jscs <- estimateJunctionSeqDispersions(jscs);
> ##D #Fit dispersion function and estimate MAP dispersion:
> ##D jscs <- fitJunctionSeqDispersionFunction(jscs);
> ##D #Test for differential usage:
> ##D jscs <- testForDiffUsage(jscs);
> ##D #Estimate effect sizes and expression estimates:
> ##D jscs <- estimateEffectSizes( jscs);
> ##D 
> ## End(Not run)
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>