Last data update: 2014.03.03

egsea.cnt    R Documentation

Ensemble of Gene Set Enrichment Analyses Function


Description

This is the main function for carrying out gene set enrichment analysis with the EGSEA algorithm. It takes the raw count matrix as input.


Usage

egsea.cnt(counts, group, design = NULL, contrasts, logFC = NULL, gs.annots,
  symbolsMap = NULL, baseGSEAs = egsea.base(), minSize = 2, = 20, combineMethod = "fisher", combineWeights = NULL, = "p.adj", egsea.dir = "./", kegg.dir = NULL,
  logFC.cutoff = 0, sum.plot.axis = "p.adj", sum.plot.cutoff = NULL,
  vote.bin.width = 5, num.threads = 4, report = TRUE,
  print.base = FALSE, verbose = FALSE)



Arguments

counts: double, numeric matrix of read counts where genes are the rows and samples are the columns.


group: character, vector or factor giving the experimental group/condition for each sample/library.


design: double, numeric matrix giving the design matrix of the linear model fit.


contrasts: double, an N x L matrix specifying the contrasts of the linear model coefficients to be tested, where N is the number of experimental conditions (one per column of the design matrix) and L is the number of contrasts.


logFC: double, a K x L matrix giving the log2 fold change of each gene for each contrast, where K is the number of genes included in the analysis. If logFC=NULL, the logFC values are estimated for each contrast using limma's eBayes.


gs.annots: list, indexed collections of gene sets, generated by one of these functions: buildIdxEZID, buildMSigDBIdxEZID, buildKEGGIdxEZID, buildGeneSetDBIdxEZID, or buildCustomIdxEZID.


symbolsMap: data frame, a K x 2 matrix that stores the gene symbol of each Entrez Gene ID and is used for the heatmap visualization. Its rows should be in the same order as the rows of counts. Default symbolsMap=NULL.


baseGSEAs: character, a vector of the gene set tests to include in the ensemble. Call egsea.base() to see the supported GSE methods. By default, all supported methods are used.


minSize: integer, the minimum size of a gene set to be included in the analysis. Default minSize=2.

integer, the number of top gene sets to be displayed in the EGSEA report. You can always access the list of all tested gene sets using the returned gsa list. Default is 20.


combineMethod: character, determines how to combine the p-values from the different GSE methods. Call egsea.combine() to see the supported methods. Default combineMethod="fisher".


combineWeights: double, a vector that determines how the different GSE methods are weighted. Its values should range between 0 and 1. This option is currently not supported.

character, determines how to order the analysis results in the stats table. Call egsea.sort() to see all available options. Default is "p.adj".


egsea.dir: character, the directory into which the analysis results are written. Default egsea.dir="./".


kegg.dir: character, the directory of the KEGG pathway data files (.xml) and image files (.png). Default kegg.dir=paste0(egsea.dir, "/kegg-dir/").


logFC.cutoff: numeric, the logFC cut-off threshold used in the Significance Score and Regulation Direction calculations. Default logFC.cutoff=0.


sum.plot.axis: character, the x-axis of the summary plot. All the values accepted by the parameter can be used. Default sum.plot.axis="p.adj".


sum.plot.cutoff: numeric, cut-off threshold used to filter the gene sets shown in the summary plots, based on the values of sum.plot.axis. Default sum.plot.cutoff=NULL.


vote.bin.width: numeric, the bin width of the vote ranking. Default vote.bin.width=5.


num.threads: numeric, the number of CPU threads to use. Default num.threads=4.


report: logical, whether to generate the EGSEA interactive report, which takes longer to run. Default report=TRUE.


print.base: logical, whether to write out the results of the individual GSE methods. Default print.base=FALSE.


verbose: logical, whether to print out progress messages and warnings. Default verbose=FALSE.
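The contrasts argument expects a matrix whose rows line up with the columns of design. The base R sketch below builds both for a hypothetical six-library experiment. The condition names are borrowed from the contrasts in the example output; the two-libraries-per-condition layout is an assumption for illustration only.

```r
# Hypothetical sample groups (two libraries per condition), for illustration only
group <- factor(c("X24", "X24", "X24IL13", "X24IL13", "X24IL13Ant", "X24IL13Ant"))

# Cell-means design: one column (coefficient) per condition, no intercept
design <- model.matrix(~0 + group)
colnames(design) <- levels(group)

# N x L contrast matrix: N = ncol(design) conditions, L = 2 contrasts
contrasts <- cbind(
  "X24IL13 - X24"        = c(-1, 1, 0),
  "X24IL13Ant - X24IL13" = c(0, -1, 1)
)
rownames(contrasts) <- colnames(design)
print(contrasts)
```

limma's makeContrasts() can build the same matrix from expressions such as X24IL13 - X24 when given levels = design.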
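To illustrate what a minimum-size filter such as minSize does, the sketch below restricts some hypothetical gene sets to the genes present in the data and drops any set left with fewer than min.size members. The gene IDs and set names are made up, and whether EGSEA measures set size before or after this restriction is an assumption here.

```r
# Hypothetical gene IDs present in the count matrix (e.g. rownames(counts))
genes.in.data <- c("101", "102", "103", "104", "105")

# Hypothetical gene sets indexed by gene ID
gene.sets <- list(
  setA = c("101", "102", "999"),  # 2 genes overlap with the data
  setB = c("104", "888"),         # 1 gene overlaps
  setC = c("105", "103", "101")   # 3 genes overlap
)

min.size <- 2
# Restrict every set to the measured genes, then keep sets with >= min.size genes
overlap <- lapply(gene.sets, intersect, genes.in.data)
kept <- overlap[lengths(overlap) >= min.size]
names(kept)
```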
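The example below orders results with the "avg.rank" option. This page does not define it, so the sketch shows only the generic idea under that assumption: rank the gene sets within each base method by p-value, average the ranks, and sort in ascending order. The gene set names and p-values are hypothetical, and this is not EGSEA's internal code.

```r
# Hypothetical p-values: rows = gene sets, columns = base GSE methods
pvals <- rbind(
  setA = c(camera = 0.040, safe = 0.050, gage = 0.002),
  setB = c(camera = 0.001, safe = 0.020, gage = 0.010),
  setC = c(camera = 0.005, safe = 0.001, gage = 0.030)
)

# Rank gene sets within each method (1 = smallest p-value), then average
ranks <- apply(pvals, 2, rank)
avg.rank <- rowMeans(ranks)

# Gene sets ordered by their average rank
avg.rank[order(avg.rank)]
```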


Details

EGSEA, an acronym for Ensemble of Gene Set Enrichment Analyses, uses the results of eleven prominent GSE algorithms from the literature to calculate collective significance scores for gene sets. These methods are ora, globaltest, plage, safe, zscore, gage, ssgsea, roast, padog, camera and gsva. The ora, gage, camera and gsva methods test a competitive null hypothesis, while the remaining seven test a self-contained null hypothesis. The framework is not limited to these eleven methods: new GSE tests can easily be integrated into it.

This function takes the raw count matrix, the experimental group of each sample, the design matrix and the contrast matrix as input. It performs TMM normalization and then applies voom to calculate the logCPM values and weighting factors.
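Pooling the per-method p-values of a gene set into one score is what the default combineMethod="fisher" refers to. A minimal sketch of the classical Fisher method follows, assuming the p-values are independent: the statistic -2 * sum(log(p)) is compared with a chi-squared distribution on 2k degrees of freedom, where k is the number of p-values. The p-values are hypothetical, and this is not EGSEA's own implementation.

```r
# Fisher's method: combine k independent p-values into one
fisher.combine <- function(p) {
  stat <- -2 * sum(log(p))  # chi-squared statistic with 2k degrees of freedom
  pchisq(stat, df = 2 * length(p), lower.tail = FALSE)
}

# Hypothetical p-values for one gene set from three base methods
p <- c(camera = 0.01, safe = 0.04, gage = 0.20)
fisher.combine(p)
```

Because the method rewards consistent evidence, the combined p-value here is smaller than any single input.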
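In practice the TMM and voom steps correspond to edgeR's calcNormFactors() and limma's voom(). To isolate the log-CPM transformation, the base R sketch below applies the formula voom uses, log2((count + 0.5) / (library size + 1) * 1e6), to a hypothetical count matrix. The TMM scaling factors are assumed to be 1 rather than computed, and voom's precision weights are not reproduced.

```r
# Hypothetical 3-gene x 2-sample count matrix
counts <- matrix(c(10, 0, 90,
                   20, 5, 75), nrow = 3,
                 dimnames = list(c("g1", "g2", "g3"), c("s1", "s2")))

# TMM scaling factors would come from edgeR::calcNormFactors; assumed 1 here
norm.factors <- c(1, 1)
lib.size <- colSums(counts) * norm.factors

# voom-style log2 counts per million; the offsets avoid log2(0) for zero counts
logcpm <- t(log2(t(counts + 0.5) / (lib.size + 1) * 1e6))
round(logcpm, 2)
```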


Value

A list of elements, each containing two or three elements that store the top gene sets and the detailed analysis results for each contrast, as well as the comparative analysis results.


References

Monther Alhamdoosh, Milica Ng, Nicholas J. Wilson, Julie M. Sheridan, Huy Huynh, Michael J. Wilson and Matthew E. Ritchie. Combining multiple tools outperforms individual methods in gene set enrichment analyses.

See Also

egsea.base, egsea.sort, buildIdxEZID, buildMSigDBIdxEZID, buildKEGGIdxEZID, buildGeneSetDBIdxEZID, and buildCustomIdxEZID


Examples

library(EGSEA)
library(EGSEAdata)
cnt =$counts
group =$group
design =$design
contrasts =$contra
genes =$genes
gs.annots = buildIdxEZID(entrezIDs=rownames(cnt), species="human", 
         kegg.updated=FALSE, kegg.exclude = c("Metabolism"))
# set report = TRUE to generate the EGSEA interactive report
gsa = egsea.cnt(counts=cnt, group=group, design=design, contrasts=contrasts, 
         gs.annots=gs.annots, symbolsMap=genes, baseGSEAs=egsea.base()[-c(2,5,6,9)], = 5,
         num.threads = 2, report = FALSE)


> library(EGSEAdata)
> data(
> cnt =$counts
> group =$group
> design =$design
> contrasts =$contra
> genes =$genes
> gs.annots = buildIdxEZID(entrezIDs=rownames(cnt), species="human", 
+ msigdb.gsets="none",
+          kegg.updated=FALSE, kegg.exclude = c("Metabolism"))
[1] "Building KEGG pathways annotation object ... "
> # set report = TRUE to generate the EGSEA interactive report
> gsa = egsea.cnt(counts=cnt, group=group, design=design, contrasts=contrasts, 
+          gs.annots=gs.annots, 
+          symbolsMap=genes, baseGSEAs=egsea.base()[-c(2,5,6,9)], 
+ = 5,
+ "avg.rank", 
+ egsea.dir="./il13-egsea-cnt-report", 
+          num.threads = 2, report = FALSE)
[1] "Log fold changes are estimated using limma package ... "
[1] "EGSEA is running on the provided data and kegg gene sets"
[1] "   Running CAMERA for X24IL13 - X24"
203 categories formed
[1] "   Running SAFE for X24IL13 - X24"
Warning: only 20 unique resamples exist
          switching to exhaustive permutation
[1] "   Running SAFE for X24IL13Ant - X24IL13"
Warning: only 20 unique resamples exist
          switching to exhaustive permutation
[1] "Running SAFE on all \ncontrasts ... COMPLETED "
[1] "   Running ZSCORE for X24IL13 - X24"
[1] "   Running CAMERA for X24IL13Ant - X24IL13"
[1] "Running CAMERA on all \ncontrasts ... COMPLETED "
[1] "   Running GAGE for X24IL13 - X24"
[1] "   Running ZSCORE for X24IL13Ant - X24IL13"
[1] "   Running GAGE for X24IL13Ant - X24IL13"
[1] "Running GAGE on all \ncontrasts ... COMPLETED "
[1] "   Running GSVA for X24IL13 - X24"
[1] "Running ZSCORE on all \ncontrasts ... COMPLETED "
[1] "   Running GLOBALTEST for X24IL13 - X24"
[1] "   Running GLOBALTEST for X24IL13Ant - X24IL13"
[1] "Running GLOBALTEST on all \ncontrasts ... COMPLETED "
[1] "   Running GSVA for X24IL13Ant - X24IL13"
[1] "Running GSVA on all \ncontrasts ... COMPLETED "
[1] "   Running ORA for X24IL13 - X24"
[1] "   Running ORA for X24IL13Ant - X24IL13"
[1] "Running ORA on all \ncontrasts ... COMPLETED "
[1] "Writing out the top-ranked gene sets for each contrast .. \nKEGG gene sets"
[1] "The top gene sets for contrast X24IL13 - X24 are:"
                                                  Type        p.adj
Intestinal immune network for IgA production Signaling 4.879961e-08
Malaria                                        Disease 9.575632e-10
Asthma                                         Disease 3.980634e-10
Amoebiasis                                     Disease 6.760249e-07
Viral myocarditis                              Disease 1.066762e-06
[1] "The top gene sets for contrast X24IL13Ant - X24IL13 are:"
                                            Type        p.adj
Malaria                                  Disease 1.126876e-14
Cytokine-cytokine receptor interaction Signaling 0.000000e+00
Legionellosis                            Disease 7.445042e-09
Rheumatoid arthritis                     Disease 5.393982e-12
Toll-like receptor signaling pathway   Signaling 5.890006e-08
