egsea.cnt
R Documentation
Ensemble of Gene Set Enrichment Analyses Function
Description
This is the main function for carrying out gene set enrichment
analysis using the
EGSEA algorithm. It takes a raw count matrix as input to
perform the EGSEA analysis.
Arguments
counts
double, numeric matrix of read counts where genes are the rows
and samples are
the columns.
group
character, vector or factor giving the experimental
group/condition for each sample/library
design
double, numeric matrix giving the design matrix of the linear
model fitting.
contrasts
double, an N x L matrix giving the contrasts of the
linear model coefficients for
which the test is required. N is the number of experimental conditions and L is
the number of contrasts.
logFC
double, a K x L matrix giving the log2 fold change of each
gene for each contrast.
K is the number of genes included in the analysis. If logFC=NULL, the logFC
values are
estimated using eBayes for each contrast.
gs.annots
list, indexed collections of gene sets. It is generated
using one of these functions:
buildIdxEZID, buildMSigDBIdxEZID,
buildKEGGIdxEZID,
buildGeneSetDBIdxEZID, and buildCustomIdxEZID.
symbolsMap
data frame, a K x 2 table that stores the gene symbol of each
Entrez Gene ID. It
is used for the heatmap visualization. The order of rows should match that
of
counts. Default symbolsMap=NULL.
baseGSEAs
character, a vector of the gene set tests that should be
included in the
ensemble. Type egsea.base() to see the supported GSE methods.
By default, all
supported methods are used.
minSize
integer, the minimum size of a gene set to be included in the
analysis.
Default minSize=2.
display.top
integer, the number of top gene sets to be displayed in
the EGSEA report.
You can always access the list of all tested gene sets using the returned
gsa list.
Default is 20.
combineMethod
character, determines how to combine p-values from
different
GSEA methods. Type egsea.combine() to see supported methods.
combineWeights
double, a vector that determines how the different GSEA methods
will be weighted.
Its values should range between 0 and 1. This option is not currently
supported.
sort.by
character, determines how to order the analysis results in
the stats table. Type
egsea.sort() to see all available options.
egsea.dir
character, directory into which the analysis results are
written out.
kegg.dir
character, the directory containing the KEGG pathway data files (.xml)
and image files (.png).
Default kegg.dir=paste0(egsea.dir, "/kegg-dir/").
logFC.cutoff
numeric, cut-off threshold for logFC, used in the
Significance Score
and Regulation Direction calculations. Default logFC.cutoff=0.
sum.plot.axis
character, the x-axis of the summary plot. All the
values accepted by the
sort.by parameter can be used. Default sum.plot.axis="p.value".
sum.plot.cutoff
numeric, cut-off threshold to filter the gene sets of
the summary plots
based on the values of the sum.plot.axis. Default
sum.plot.cutoff=NULL.
vote.bin.width
numeric, the bin width of the vote ranking. Default
vote.bin.width=5.
num.threads
numeric, number of CPU threads to be used. Default
num.threads=2.
report
logical, whether to generate the EGSEA interactive report.
Generating the report makes the function take longer
to run. Default is TRUE.
print.base
logical, whether to write out the results of the
individual GSE methods.
Default FALSE.
verbose
logical, whether to print out progress messages and warnings.
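The combineMethod argument above selects how p-values from the individual methods are merged into one collective p-value per gene set. As an illustration of the general idea, the sketch below implements Fisher's method in base R. It is an assumption that Fisher's method is among the options listed by egsea.combine(); the helper name combine_fisher and the example p-values are hypothetical and are not taken from the EGSEA source code.

```r
# Fisher's method for combining independent p-values (base R only).
# Illustrative sketch; not the EGSEA implementation.
combine_fisher <- function(p) {
  stopifnot(is.numeric(p), length(p) > 0, all(p > 0 & p <= 1))
  stat <- -2 * sum(log(p))                      # chi-squared statistic
  pchisq(stat, df = 2 * length(p), lower.tail = FALSE)
}

# Hypothetical p-values for one gene set from three base methods
p_base <- c(camera = 0.01, gage = 0.04, ora = 0.20)
combine_fisher(p_base)
```

Fisher's method assumes the p-values are independent, which is unlikely to hold exactly when several methods analyse the same data, so the combined value is best read as a ranking aid rather than an exact probability.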
Details
EGSEA, an acronym for Ensemble of Gene Set Enrichment
Analyses, utilizes the
analysis results of eleven prominent GSE algorithms from the literature to
calculate
collective significance scores for gene sets. These methods include:
ora,
globaltest, plage, safe, zscore, gage,
ssgsea,
roast, padog, camera and gsva.
The ora, gage, camera and gsva methods depend on a competitive null
hypothesis while the
remaining seven methods are based on a self-contained hypothesis.
Conveniently, the
algorithm proposed here is not limited to these eleven GSE methods and new
GSE tests
can be easily integrated into the framework. This function takes the raw
count matrix,
the experimental group of each sample, the design matrix and the contrast
matrix as parameters.
It performs TMM normalization and then applies voom to
calculate the logCPM and weighting factors.
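The preprocessing described above (TMM normalization followed by voom) can be sketched with the edgeR and limma packages. This is a minimal illustration on simulated counts, assuming edgeR and limma are installed; the data, the two-group design and the object names are hypothetical, and the exact calls made inside egsea.cnt may differ.

```r
library(edgeR)   # DGEList(), calcNormFactors()
library(limma)   # voom()

set.seed(1)
# Simulated raw counts: 200 genes x 6 samples (hypothetical data)
counts <- matrix(rnbinom(200 * 6, mu = 100, size = 5), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), paste0("s", 1:6)))
group  <- factor(rep(c("ctrl", "trt"), each = 3))
design <- model.matrix(~ 0 + group)

dge <- DGEList(counts = counts, group = group)
dge <- calcNormFactors(dge, method = "TMM")   # TMM scaling factors
v   <- voom(dge, design, plot = FALSE)        # logCPM in v$E, weights in v$weights

dim(v$E)
dim(v$weights)
```

Both v$E and v$weights have one row per gene and one column per sample, matching the layout of the counts argument.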
Value
A list of elements, each with two or three elements that store the top
gene sets, the detailed analysis
results for each contrast, and the comparative analysis results.
References
Monther Alhamdoosh, Milica Ng, Nicholas J. Wilson, Julie M. Sheridan, Huy
Huynh, Michael J. Wilson
and Matthew E. Ritchie. Combining multiple tools outperforms individual
methods in gene set enrichment
analyses.
See Also
egsea.base, egsea.sort,
buildIdxEZID, buildMSigDBIdxEZID,
buildKEGGIdxEZID,
buildGeneSetDBIdxEZID, and buildCustomIdxEZID
> library(EGSEA)
> ### Name: egsea.cnt
> ### Title: Ensemble of Gene Set Enrichment Analyses Function
> ### Aliases: egsea.cnt
>
> ### ** Examples
>
> library(EGSEAdata)
> data(il13.data.cnt)
> cnt = il13.data.cnt$counts
> group = il13.data.cnt$group
> design = il13.data.cnt$design
> contrasts = il13.data.cnt$contra
> genes = il13.data.cnt$genes
> gs.annots = buildIdxEZID(entrezIDs=rownames(cnt), species="human",
+ msigdb.gsets="none",
+ kegg.updated=FALSE, kegg.exclude = c("Metabolism"))
[1] "Building KEGG pathways annotation object ... "
> # set report = TRUE to generate the EGSEA interactive report
> gsa = egsea.cnt(counts=cnt, group=group, design=design, contrasts=contrasts,
+ gs.annots=gs.annots,
+ symbolsMap=genes, baseGSEAs=egsea.base()[-c(2,5,6,9)],
+ display.top = 5,
+ sort.by="avg.rank",
+ egsea.dir="./il13-egsea-cnt-report",
+ num.threads = 2, report = FALSE)
[1] "Log fold changes are estimated using limma package ... "
[1] "EGSEA is running on the provided data and kegg gene sets"
[1] " Running CAMERA for X24IL13 - X24"
203 categories formed
[1] " Running SAFE for X24IL13 - X24"
Warning: only 20 unique resamples exist
switching to exhaustive permutation
[1] " Running SAFE for X24IL13Ant - X24IL13"
Warning: only 20 unique resamples exist
switching to exhaustive permutation
[1] "Running SAFE on all \ncontrasts ... COMPLETED "
[1] " Running ZSCORE for X24IL13 - X24"
[1] " Running CAMERA for X24IL13Ant - X24IL13"
[1] "Running CAMERA on all \ncontrasts ... COMPLETED "
[1] " Running GAGE for X24IL13 - X24"
[1] " Running ZSCORE for X24IL13Ant - X24IL13"
[1] " Running GAGE for X24IL13Ant - X24IL13"
[1] "Running GAGE on all \ncontrasts ... COMPLETED "
[1] " Running GSVA for X24IL13 - X24"
[1] "Running ZSCORE on all \ncontrasts ... COMPLETED "
[1] " Running GLOBALTEST for X24IL13 - X24"
[1] " Running GLOBALTEST for X24IL13Ant - X24IL13"
[1] "Running GLOBALTEST on all \ncontrasts ... COMPLETED "
[1] " Running GSVA for X24IL13Ant - X24IL13"
[1] "Running GSVA on all \ncontrasts ... COMPLETED "
[1] " Running ORA for X24IL13 - X24"
[1] " Running ORA for X24IL13Ant - X24IL13"
[1] "Running ORA on all \ncontrasts ... COMPLETED "
[1] "Writing out the top-ranked gene sets for each contrast .. \nKEGG gene sets"
[1] "The top gene sets for contrast X24IL13 - X24 are:"
                                                  Type        p.adj
Intestinal immune network for IgA production Signaling 4.879961e-08
Malaria                                        Disease 9.575632e-10
Asthma                                         Disease 3.980634e-10
Amoebiasis                                     Disease 6.760249e-07
Viral myocarditis                              Disease 1.066762e-06
[1] "The top gene sets for contrast X24IL13Ant - X24IL13 are:"
                                            Type        p.adj
Malaria                                  Disease 1.126876e-14
Cytokine-cytokine receptor interaction Signaling 0.000000e+00
Legionellosis                            Disease 7.445042e-09
Rheumatoid arthritis                     Disease 5.393982e-12
Toll-like receptor signaling pathway   Signaling 5.890006e-08