Last data update: 2014.03.03

R: Ensemble of Gene Set Enrichment Analyses Function
egsea.cnt    R Documentation

Ensemble of Gene Set Enrichment Analyses Function

Description

This is the main function for carrying out gene set enrichment analysis with the EGSEA algorithm. It takes a raw count matrix as input to perform the EGSEA analysis.

Usage

egsea.cnt(counts, group, design = NULL, contrasts, logFC = NULL, gs.annots,
  symbolsMap = NULL, baseGSEAs = egsea.base(), minSize = 2,
  display.top = 20, combineMethod = "fisher", combineWeights = NULL,
  sort.by = "p.adj", egsea.dir = "./", kegg.dir = NULL,
  logFC.cutoff = 0, sum.plot.axis = "p.adj", sum.plot.cutoff = NULL,
  vote.bin.width = 5, num.threads = 4, report = TRUE,
  print.base = FALSE, verbose = FALSE)

Arguments

counts

double, numeric matrix of read counts where genes are the rows and samples are the columns.

group

character, vector or factor giving the experimental group/condition for each sample/library.

design

double, numeric matrix giving the design matrix of the linear model fitting.
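
A minimal base-R sketch of a group-means design matrix (one column per condition, no intercept), built from the group factor with model.matrix(). The six libraries and the labels X24, X24IL13 and X24IL13Ant are assumptions chosen to mirror the contrast names printed in the example output; they are not read from il13.data.cnt.

```r
# Hypothetical group labels for six libraries (two per condition)
group <- factor(c("X24", "X24", "X24IL13", "X24IL13", "X24IL13Ant", "X24IL13Ant"))

# "~ 0 + group" drops the intercept, so each column estimates one condition's mean
design <- model.matrix(~ 0 + group)

# Replace the default "groupX24"-style names with the bare level names
colnames(design) <- levels(group)
print(design)
```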

contrasts

double, an N x L matrix giving the contrasts of the linear model coefficients to be tested. N is the number of experimental conditions and L is the number of contrasts.
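
As a sketch, the N x L layout can be written out by hand: each column is one contrast and each entry is the weight on one coefficient. The coefficient names are assumed to match the design matrix columns, and the two contrasts mirror those printed in the example output. limma's makeContrasts(X24IL13 - X24, X24IL13Ant - X24IL13, levels = design) builds an equivalent matrix.

```r
# Coefficient names assumed to match the columns of the design matrix (N = 3)
coefs <- c("X24", "X24IL13", "X24IL13Ant")

# One column per contrast (L = 2), one row per coefficient
contrasts <- cbind(
  "X24IL13-X24"        = c(-1, 1, 0),  # X24IL13 minus X24
  "X24IL13Ant-X24IL13" = c(0, -1, 1)   # X24IL13Ant minus X24IL13
)
rownames(contrasts) <- coefs
print(contrasts)
```

Each column sums to zero, which is what makes it a comparison between conditions rather than an estimate of a single condition's level.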

logFC

double, a K x L matrix giving the log2 fold change of each gene for each contrast. K is the number of genes included in the analysis. If logFC=NULL, the logFC values are estimated with limma (eBayes) for each contrast.
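
The point estimates in a K x L logFC matrix are the K x N matrix of fitted coefficients multiplied by the N x L contrast matrix. This is what limma's contrasts.fit() does to the coefficients; eBayes() then moderates the variances but leaves the fold changes unchanged. The coefficient values below are hypothetical.

```r
# Hypothetical fitted coefficients on the log2 scale:
# K = 3 genes (rows) x N = 3 conditions (columns); the matrix fills column by column
coefs <- matrix(c(5, 2, 8,    # X24
                  7, 2, 6,    # X24IL13
                  6, 3, 6),   # X24IL13Ant
                nrow = 3,
                dimnames = list(paste0("gene", 1:3),
                                c("X24", "X24IL13", "X24IL13Ant")))

# N x L contrast matrix (L = 2 contrasts)
contr <- cbind("X24IL13-X24"        = c(-1, 1, 0),
               "X24IL13Ant-X24IL13" = c(0, -1, 1))

# K x L matrix of log2 fold changes, one column per contrast
logFC <- coefs %*% contr
print(logFC)
```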

gs.annots

list, indexed collections of gene sets. It is generated using one of these functions: buildIdxEZID, buildMSigDBIdxEZID, buildKEGGIdxEZID, buildGeneSetDBIdxEZID, and buildCustomIdxEZID.

symbolsMap

data frame, a K x 2 matrix that stores the gene symbol of each Entrez Gene ID. It is used for the heatmap visualization. The order of its rows should match that of counts. Default symbolsMap=NULL.
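
Because the rows of symbolsMap must follow the row order of counts, it helps to realign them explicitly before calling egsea.cnt. A minimal sketch, assuming the first column holds Entrez Gene IDs and the second holds symbols; the column names FeatureID and Symbols and the toy counts are hypothetical.

```r
# Hypothetical count matrix with Entrez Gene IDs as row names
counts <- matrix(c(12, 0, 40, 9, 3, 51), nrow = 3,
                 dimnames = list(c("3458", "3565", "7124"), c("s1", "s2")))

# Hypothetical symbol map, deliberately in a different row order
symbolsMap <- data.frame(FeatureID = c("7124", "3458", "3565"),
                         Symbols   = c("TNF", "IFNG", "IL4"),
                         stringsAsFactors = FALSE)

# Reorder so that row i of symbolsMap describes row i of counts
symbolsMap <- symbolsMap[match(rownames(counts), symbolsMap$FeatureID), ]
stopifnot(identical(symbolsMap$FeatureID, rownames(counts)))
print(symbolsMap)
```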

baseGSEAs

character, a vector of the gene set tests that should be included in the ensemble. Type egsea.base() to see the supported GSE methods. By default, all supported methods are used.

minSize

integer, the minimum size of a gene set to be included in the analysis. Default minSize=2.
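
A sketch of size filtering on a list of gene sets. It assumes, as an illustration rather than a documented rule, that a set's size is counted after restricting it to the genes present in the count matrix; the set names and IDs are hypothetical.

```r
# Hypothetical gene-set collection: Entrez IDs per set
gsets <- list(setA = c("3458", "3565", "7124"),
              setB = c("3565", "99999"),
              setC = c("12345"))
genes <- c("3458", "3565", "7124")  # genes present in the count matrix
minSize <- 2

# Keep only measured genes, then drop sets smaller than minSize
gsets <- lapply(gsets, intersect, genes)
gsets <- gsets[lengths(gsets) >= minSize]
print(names(gsets))
```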

display.top

integer, the number of top gene sets to be displayed in the EGSEA report. You can always access the list of all tested gene sets using the returned gsa list. Default is 20.

combineMethod

character, determines how to combine the p-values from the different GSEA methods. Type egsea.combine() to see the supported methods.
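
For the default combineMethod = "fisher", the textbook statistic is X = -2 * sum(log(p_i)), which follows a chi-squared distribution with 2k degrees of freedom when all k p-values are null. The base-R sketch below implements that textbook form; whether EGSEA adds any adjustment on top of it is not stated here, and fisher_combine is a hypothetical helper name.

```r
# Fisher's method: X = -2 * sum(log(p)) ~ chi-squared with 2k df under the global null
fisher_combine <- function(p) {
  stat <- -2 * sum(log(p))
  pchisq(stat, df = 2 * length(p), lower.tail = FALSE)
}

# Hypothetical p-values for one gene set from three base methods
p <- c(0.01, 0.04, 0.20)
fisher_combine(p)
```

With a single p-value, the statistic has 2 degrees of freedom and the method returns that p-value unchanged, which is a quick sanity check on the formula.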

combineWeights

double, a vector that determines how the different GSEA methods are weighted. Its values should range between 0 and 1. This option is currently not supported.

sort.by

character, determines how to order the analysis results in the stats table. Type egsea.sort() to see all available options.
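
The example below uses sort.by = "avg.rank", which orders gene sets by their average rank across the base methods. A minimal base-R sketch of that idea, assuming each method ranks gene sets by ascending p-value and ties get average ranks; EGSEA's exact ranking quantity and tie handling are not documented here, and the p-values are hypothetical.

```r
# Hypothetical p-values: rows = gene sets, columns = base GSE methods
pvals <- matrix(c(0.001, 0.20, 0.03,   # camera
                  0.010, 0.50, 0.02,   # safe
                  0.040, 0.30, 0.01),  # gage
                nrow = 3,
                dimnames = list(c("setA", "setB", "setC"),
                                c("camera", "safe", "gage")))

ranks <- apply(pvals, 2, rank)  # rank gene sets within each method (1 = most significant)
avg.rank <- rowMeans(ranks)     # average rank per gene set
print(sort(avg.rank))
```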

egsea.dir

character, directory into which the analysis results are written out.

kegg.dir

character, the directory of KEGG pathway data file (.xml) and image file (.png). Default kegg.dir=paste0(egsea.dir, "/kegg-dir/").

logFC.cutoff

numeric, cut-off threshold of logFC used in the Significance Score and Regulation Direction calculations. Default logFC.cutoff=0.

sum.plot.axis

character, the x-axis of the summary plot. All the values accepted by the sort.by parameter can be used. Default sum.plot.axis="p.adj".

sum.plot.cutoff

numeric, cut-off threshold to filter the gene sets of the summary plots based on the values of the sum.plot.axis. Default sum.plot.cutoff=NULL.

vote.bin.width

numeric, the bin width of the vote ranking. Default vote.bin.width=5.

num.threads

numeric, number of CPU threads to be used. Default num.threads=4.

report

logical, whether to generate the EGSEA interactive report. Generating the report makes the analysis take longer. Default report=TRUE.

print.base

logical, whether to write out the results of the individual GSE methods. Default print.base=FALSE.

verbose

logical, whether to print out progress messages and warnings. Default verbose=FALSE.

Details

EGSEA, an acronym for Ensemble of Gene Set Enrichment Analyses, uses the results of eleven prominent GSE algorithms from the literature to calculate collective significance scores for gene sets. These methods are: ora, globaltest, plage, safe, zscore, gage, ssgsea, roast, padog, camera and gsva. The ora, gage, camera and gsva methods test a competitive null hypothesis, while the remaining seven methods test a self-contained null hypothesis. The framework is not limited to these eleven GSE methods; new GSE tests can easily be integrated into it. This function takes the raw count matrix, the experimental group of each sample, the design matrix and the contrast matrix as parameters. It performs TMM normalization and then applies voom to calculate the log-CPM values and the observation-level weights.
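
The log-CPM transformation that voom applies can be sketched in base R as log2((count + 0.5) / (effective library size + 1) * 1e6), where the effective library size is the column total multiplied by the TMM normalization factor. In practice the factors come from edgeR::calcNormFactors() and the weights from limma::voom(); here the factors default to 1 (no normalization) as an illustrative assumption, and log_cpm is a hypothetical helper name.

```r
# log-CPM as computed inside voom: log2((count + 0.5) / (lib.size + 1) * 1e6)
# norm.factors would normally be TMM factors; they default to 1 here (assumption)
log_cpm <- function(counts, norm.factors = rep(1, ncol(counts))) {
  eff.lib <- colSums(counts) * norm.factors        # effective library sizes
  t(log2(t(counts + 0.5) / (eff.lib + 1) * 1e6))   # divide each sample by its library size
}

counts <- matrix(c(10, 90, 0, 100), nrow = 2,
                 dimnames = list(c("g1", "g2"), c("s1", "s2")))
log_cpm(counts)
```

The 0.5 offset keeps zero counts finite on the log scale, which is why g1 in s2 still gets a (very low) log-CPM value.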

Value

A list of elements, each with two or three components that store the top gene sets, the detailed analysis results for each contrast, and the comparative analysis results.

References

Monther Alhamdoosh, Milica Ng, Nicholas J. Wilson, Julie M. Sheridan, Huy Huynh, Michael J. Wilson and Matthew E. Ritchie. Combining multiple tools outperforms individual methods in gene set enrichment analyses.

See Also

egsea.base, egsea.sort, buildIdxEZID, buildMSigDBIdxEZID, buildKEGGIdxEZID, buildGeneSetDBIdxEZID, and buildCustomIdxEZID

Examples

library(EGSEA)
library(EGSEAdata)
# Load the IL13 example data set and unpack its components
data(il13.data.cnt)
cnt = il13.data.cnt$counts
group = il13.data.cnt$group
design = il13.data.cnt$design
contrasts = il13.data.cnt$contra
genes = il13.data.cnt$genes
# Build the KEGG gene set index (no MSigDB collections, metabolism pathways excluded)
gs.annots = buildIdxEZID(entrezIDs=rownames(cnt), species="human",
                         msigdb.gsets="none",
                         kegg.updated=FALSE, kegg.exclude=c("Metabolism"))
# Drop four base methods to shorten the run; set report = TRUE to generate
# the EGSEA interactive report
gsa = egsea.cnt(counts=cnt, group=group, design=design, contrasts=contrasts,
                gs.annots=gs.annots,
                symbolsMap=genes, baseGSEAs=egsea.base()[-c(2,5,6,9)],
                display.top=5,
                sort.by="avg.rank",
                egsea.dir="./il13-egsea-cnt-report",
                num.threads=2, report=FALSE)
 

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(EGSEA)
Loading required package: Biobase
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: gage
Loading required package: AnnotationDbi
Loading required package: stats4
Loading required package: IRanges
Loading required package: S4Vectors

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: topGO
Loading required package: graph
Loading required package: GO.db

Loading required package: SparseM

Attaching package: 'SparseM'

The following object is masked from 'package:base':

    backsolve


groupGOTerms: 	GOBPTerm, GOMFTerm, GOCCTerm environments built.

Attaching package: 'topGO'

The following object is masked from 'package:IRanges':

    members

The following object is masked from 'package:gage':

    geneData

Loading required package: pathview
Loading required package: org.Hs.eg.db

##############################################################################
Pathview is an open source software package distributed under GNU General
Public License version 3 (GPLv3). Details of GPLv3 is available at
http://www.gnu.org/licenses/gpl-3.0.html. Particullary, users are required to
formally cite the original Pathview paper (not just mention it) in publications
or products. For details, do citation("pathview") within R.

The pathview downloads and uses KEGG data. Non-academic uses may require a KEGG
license agreement (details at http://www.kegg.jp/kegg/legal.html).
##############################################################################

KEGG.db contains mappings based on older data because the original
  resource was removed from the the public domain before the most
  recent update was produced. This package should now be considered
  deprecated and future versions of Bioconductor may not have it
  available.  Users who want more current data are encouraged to look
  at the KEGGREST or reactome.db packages





> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/EGSEA/egsea.cnt.Rd_%03d_medium.png", width=480, height=480)
> ### Name: egsea.cnt
> ### Title: Ensemble of Gene Set Enrichment Analyses Function
> ### Aliases: egsea.cnt
> 
> ### ** Examples
> 
> library(EGSEAdata)
> data(il13.data.cnt)
> cnt = il13.data.cnt$counts
> group = il13.data.cnt$group
> design = il13.data.cnt$design
> contrasts = il13.data.cnt$contra
> genes = il13.data.cnt$genes
> gs.annots = buildIdxEZID(entrezIDs=rownames(cnt), species="human", 
+ msigdb.gsets="none",
+          kegg.updated=FALSE, kegg.exclude = c("Metabolism"))
[1] "Building KEGG pathways annotation object ... "
> # set report = TRUE to generate the EGSEA interactive report
> gsa = egsea.cnt(counts=cnt, group=group, design=design, contrasts=contrasts, 
+          gs.annots=gs.annots, 
+          symbolsMap=genes, baseGSEAs=egsea.base()[-c(2,5,6,9)], 
+ display.top = 5,
+           sort.by="avg.rank", 
+ egsea.dir="./il13-egsea-cnt-report", 
+          num.threads = 2, report = FALSE)
[1] "Log fold changes are estimated using limma package ... "
[1] "EGSEA is running on the provided data and kegg gene sets"
[1] "   Running CAMERA for X24IL13 - X24"
203 categories formed
[1] "   Running SAFE for X24IL13 - X24"
Warning: only 20 unique resamples exist
          switching to exhaustive permutation
[1] "   Running SAFE for X24IL13Ant - X24IL13"
Warning: only 20 unique resamples exist
          switching to exhaustive permutation
[1] "Running SAFE on all \ncontrasts ... COMPLETED "
[1] "   Running ZSCORE for X24IL13 - X24"
[1] "   Running CAMERA for X24IL13Ant - X24IL13"
[1] "Running CAMERA on all \ncontrasts ... COMPLETED "
[1] "   Running GAGE for X24IL13 - X24"
[1] "   Running ZSCORE for X24IL13Ant - X24IL13"
[1] "   Running GAGE for X24IL13Ant - X24IL13"
[1] "Running GAGE on all \ncontrasts ... COMPLETED "
[1] "   Running GSVA for X24IL13 - X24"
[1] "Running ZSCORE on all \ncontrasts ... COMPLETED "
[1] "   Running GLOBALTEST for X24IL13 - X24"
[1] "   Running GLOBALTEST for X24IL13Ant - X24IL13"
[1] "Running GLOBALTEST on all \ncontrasts ... COMPLETED "
[1] "   Running GSVA for X24IL13Ant - X24IL13"
[1] "Running GSVA on all \ncontrasts ... COMPLETED "
[1] "   Running ORA for X24IL13 - X24"
[1] "   Running ORA for X24IL13Ant - X24IL13"
[1] "Running ORA on all \ncontrasts ... COMPLETED "
[1] "Writing out the top-ranked gene sets for each contrast .. \nKEGG gene sets"
[1] "The top gene sets for contrast X24IL13 - X24 are:"
                                                  Type        p.adj
Intestinal immune network for IgA production Signaling 4.879961e-08
Malaria                                        Disease 9.575632e-10
Asthma                                         Disease 3.980634e-10
Amoebiasis                                     Disease 6.760249e-07
Viral myocarditis                              Disease 1.066762e-06
[1] "The top gene sets for contrast X24IL13Ant - X24IL13 are:"
                                            Type        p.adj
Malaria                                  Disease 1.126876e-14
Cytokine-cytokine receptor interaction Signaling 0.000000e+00
Legionellosis                            Disease 7.445042e-09
Rheumatoid arthritis                     Disease 5.393982e-12
Toll-like receptor signaling pathway   Signaling 5.890006e-08
>  
> 
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>