Last data update: 2014.03.03

R: Estimate Effect Sizes, parameter estimates, etc.
estimateEffectSizesR Documentation

Estimate Effect Sizes, parameter estimates, etc.

Description

This function runs fits another generalized linear model to the data, this one intended for use in estimating the effect sizes and expression estimates for each analysis.

This function is called internally by the runJunctionSeqAnalyses function, and thus for most purposes users should not need to call this function directly. It may be useful to advanced users performing non-standard analyses.

Usage

estimateEffectSizes(jscs, 
   method.expressionEstimation = c("feature-vs-gene",
                                   "feature-vs-otherFeatures"),
   effect.formula = formula(~ condition + countbin + condition : countbin),
   geneLevel.formula = formula(~ condition),
   calculate.geneLevel.expression = TRUE,
   keep.estimation.fit = FALSE,
   nCores=1, 
   dispColumn="dispersion",
   verbose = TRUE)

Arguments

jscs

A JunctionSeqCountSet. Usually initially created by readJunctionSeqCounts. Size factors must be set, usually using functions estimateSizeFactors and estimateJunctionSeqDispersions.

method.expressionEstimation

Character string. Can be used to apply alternative methodologies or implementations. Intended for advanced users who have strong opinions about the underlying statistical methodologies.

Determines the methodology used to generate feature expression estimates and relative fold changes. By default each feature is modeled separately. Under the default count-vector method, this means that the resultant relative fold changes will be a measure of the relative fold change between the feature and the gene as a whole.

Alternatively, the "feature-vs-otherFeatures" method builds a large, complex model containing all features belonging to the gene. The coefficients for each feature are then "balanced" using linear contrasts weighted by the inverse of their variance. In general we have found this method to produce very similar results but less efficiently and less consistently. Additionally, this alternative method "multi-counts" reads that cover more than one feature. This can result in over-weighting of exonic regions with a large number of annotated variations in a small genomic area, as each individual read or read-pair may be counted many times in the model.

Under the default option, no read or read-pair is ever counted more than once in a given model.

effect.formula

For advanced users. The base formula for the model used for effect size estimation.

NOTE: the biological condition to be tested must be named "condition".

geneLevel.formula

For advanced users. The base formula for the model used to estimate total gene-level expression.

NOTE: the biological condition to be tested must be named "condition".

calculate.geneLevel.expression

Logical value. If TRUE, gene-level expression will be estimated using the same maximum-likelihood method used in other analyses. Default: TRUE.

keep.estimation.fit

Logical value. If TRUE, save the complete model fits for every gene. This will require a lot of memory, but may be useful for statistical diagnostics. Default: FALSE.

nCores

Either an integer or a BiocParallelParam object. Either way, this determines The number of cores to use. Note that multicore functionality may not be available on all platforms. If parallel execution is not available then JunctionSeq will automatically fallback to single-core execution. See the BiocParallel package for more information.

dispColumn

Character value. The name of the fData(jscs) column in which the model dispersion is stored.

verbose

if TRUE, send debugging and progress messages to the console / stdout.

Value

A JunctionSeqCountSet, with effect size results included.

Examples

data(exampleDataSet,package="JctSeqData");
jscs <- estimateEffectSizes(jscs);

## Not run: 
#Full example (from scratch):

########################################
#Set up example data:
decoder.file <- system.file(
                  "extdata/annoFiles/decoder.bySample.txt",
                  package="JctSeqData");
decoder <- read.table(decoder.file,
                  header=TRUE,
                  stringsAsFactors=FALSE);
gff.file <- system.file(
            "extdata/cts/withNovel.forJunctionSeq.gff.gz",
            package="JctSeqData");
countFiles <- system.file(paste0("extdata/cts/",
     decoder$sample.ID,
     "/QC.spliceJunctionAndExonCounts.withNovel.forJunctionSeq.txt.gz"),
     package="JctSeqData");
########################################
#Advanced Analysis:

#Make a "design" dataframe:
design <- data.frame(condition = factor(decoder$group.ID));
#Read the QoRTs counts.
jscs = readJunctionSeqCounts(countfiles = countFiles,
           samplenames = decoder$sample.ID,
           design = design,
           flat.gff.file = gff.file
);
#Generate the size factors and load them into the JunctionSeqCountSet:
jscs <- estimateJunctionSeqSizeFactors(jscs);
#Estimate feature-specific dispersions:
jscs <- estimateJunctionSeqDispersions(jscs);
#Fit dispersion function and estimate MAP dispersion:
jscs <- fitJunctionSeqDispersionFunction(jscs);
#Test for differential usage:
jscs <- testForDiffUsage(jscs);
#Estimate effect sizes and expression estimates:
jscs <- estimateEffectSizes( jscs);


## End(Not run)

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(JunctionSeq)
Loading required package: SummarizedExperiment
Loading required package: GenomicRanges
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/JunctionSeq/estimateEffectSizes.Rd_%03d_medium.png", width=480, height=480)
> ### Name: estimateEffectSizes
> ### Title: Estimate Effect Sizes, parameter estimates, etc.
> ### Aliases: estimateEffectSizes
> 
> ### ** Examples
> 
> data(exampleDataSet,package="JctSeqData");
> jscs <- estimateEffectSizes(jscs);
> Using single-core execution.
-------> estimateEffectSizes: (Calculating effect size and predicted values for feature 1000 of 1516)(Tue Jul  5 23:17:42 2016)
-------> estimateEffectSizes: Estimating gene-level expression.
-------> estimateEffectSizes: (Calculating gene-level effect size and predicted values for gene 100 of 120)(Tue Jul  5 23:17:43 2016)
-------> estimateEffectSizes: Starting gene-wise p-adjust. (Tue Jul  5 23:17:43 2016)
-------> estimateEffectSizes: Finished gene-wise p-adjust. (Tue Jul  5 23:17:43 2016)
> 
> ## Not run: 
> ##D #Full example (from scratch):
> ##D 
> ##D ########################################
> ##D #Set up example data:
> ##D decoder.file <- system.file(
> ##D                   "extdata/annoFiles/decoder.bySample.txt",
> ##D                   package="JctSeqData");
> ##D decoder <- read.table(decoder.file,
> ##D                   header=TRUE,
> ##D                   stringsAsFactors=FALSE);
> ##D gff.file <- system.file(
> ##D             "extdata/cts/withNovel.forJunctionSeq.gff.gz",
> ##D             package="JctSeqData");
> ##D countFiles <- system.file(paste0("extdata/cts/",
> ##D      decoder$sample.ID,
> ##D      "/QC.spliceJunctionAndExonCounts.withNovel.forJunctionSeq.txt.gz"),
> ##D      package="JctSeqData");
> ##D ########################################
> ##D #Advanced Analysis:
> ##D 
> ##D #Make a "design" dataframe:
> ##D design <- data.frame(condition = factor(decoder$group.ID));
> ##D #Read the QoRTs counts.
> ##D jscs = readJunctionSeqCounts(countfiles = countFiles,
> ##D            samplenames = decoder$sample.ID,
> ##D            design = design,
> ##D            flat.gff.file = gff.file
> ##D );
> ##D #Generate the size factors and load them into the JunctionSeqCountSet:
> ##D jscs <- estimateJunctionSeqSizeFactors(jscs);
> ##D #Estimate feature-specific dispersions:
> ##D jscs <- estimateJunctionSeqDispersions(jscs);
> ##D #Fit dispersion function and estimate MAP dispersion:
> ##D jscs <- fitJunctionSeqDispersionFunction(jscs);
> ##D #Test for differential usage:
> ##D jscs <- testForDiffUsage(jscs);
> ##D #Estimate effect sizes and expression estimates:
> ##D jscs <- estimateEffectSizes( jscs);
> ##D 
> ## End(Not run)
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>