Last data update: 2014.03.03

pathEnrich    R Documentation

Pathway enrichment analysis

Description

Performs a pathway enrichment analysis of a defined set of genes.

Usage

pathEnrich(geneList, geneSets, universe=NULL)

Arguments

geneList

vector of gene names to be used for pathway enrichment

geneSets

"GeneSetCollection" object with gene sets for functional pathways

universe

number of genes that were probed in the initial experiment

Details

geneSets is a "GeneSetCollection" object containing gene sets from various databases. Gene sets from different sources are allowed and have to be provided in Gene Matrix Transposed file format (*.gmt), in which each gene set is described by a pathway name, a description, and the genes in the gene set. The examples show two ways to define the geneSets object.
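
To make the .gmt layout concrete, the following sketch writes a two-line file and parses it with base R only. The pathway names, descriptions and gene symbols are hypothetical. In practice the file is read with getGmt, as in the examples; the manual parsing here only illustrates the tab-separated structure.

```r
# Write a tiny two-set .gmt file (hypothetical pathway names and gene symbols).
gmt_lines <- c(
  "PATHWAY_A\thttp://example.org/PATHWAY_A\tGENE1\tGENE2\tGENE3",
  "PATHWAY_B\tna\tGENE2\tGENE4"
)
gmt_file <- tempfile(fileext = ".gmt")
writeLines(gmt_lines, gmt_file)

# Each line: pathway name <TAB> description <TAB> gene 1 <TAB> gene 2 ...
fields <- strsplit(readLines(gmt_file), "\t", fixed = TRUE)

# Drop the first two fields to keep only the genes of each set.
toy_sets <- lapply(fields, function(f) f[-(1:2)])
names(toy_sets) <- vapply(fields, `[`, character(1), 1)
toy_desc <- vapply(fields, `[`, character(1), 2)

str(toy_sets)
```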

The universe argument is the total number of genes that were probed in the initial experiment, e.g. the number of all genes on a microarray. If universe is not defined, it is set to the number of all genes that can be mapped to any pathway in the chosen database.
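
This page does not state which statistical test pathEnrich applies. A common choice for this kind of over-representation analysis is the hypergeometric test, and the sketch below assumes it purely to illustrate why the choice of universe matters. The pathway size and overlap are made-up numbers; only the universe of 6536 and the 952 selected genes echo the example on this page.

```r
# Hypothetical counts; only `universe` and `n_selected` echo the example on this page.
universe   <- 6536  # genes probed in the experiment
n_selected <- 952   # genes passed as geneList
n_pathway  <- 100   # pathway genes present in the universe (made up)
n_overlap  <- 30    # selected genes that belong to the pathway (made up)

# Probability of seeing at least n_overlap pathway genes when drawing
# n_selected genes without replacement from the universe.
p_value <- phyper(n_overlap - 1, n_pathway, universe - n_pathway,
                  n_selected, lower.tail = FALSE)

# The same counts against a larger universe: the expected overlap shrinks,
# so the observed overlap looks more surprising and the p-value drops.
p_value_larger <- phyper(n_overlap - 1, n_pathway, 20000 - n_pathway,
                         n_selected, lower.tail = FALSE)

c(universe_6536 = p_value, universe_20000 = p_value_larger)
```

Under this assumption, an overstated universe makes pathways look more enriched than they are, which is why universe should reflect the genes actually probed.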

Value

A data.frame with the following columns:

pathway

names of enriched pathways

description

gene set description (e.g. a link to the named gene set in MSigDB)

genes_in_pathway

total number of known genes in the pathway

%_match

number of matched genes as a percentage of the total number of known genes in the pathway

pValue

p-value

adj.pValue

Benjamini-Hochberg adjusted p-value

overlap

genes from the input gene list that overlap with the known genes in the pathway

Additionally, a .txt file containing all of the above information is created.
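
As an illustration of how the %_match and adj.pValue columns relate to the others, the sketch below builds a toy table with hypothetical pathways and p-values. The n_matched helper column is an assumption for the example and is not part of the returned data.frame. The Benjamini-Hochberg adjustment uses p.adjust from the stats package; whether pathEnrich calls it in exactly this way is not stated here.

```r
# Toy results for three hypothetical pathways.
toy <- data.frame(
  pathway          = c("PATHWAY_A", "PATHWAY_B", "PATHWAY_C"),
  genes_in_pathway = c(50, 120, 80),
  n_matched        = c(10, 6, 20),   # helper column, not in the real output
  pValue           = c(0.001, 0.04, 0.0002)
)

# Matched genes as a percentage of all known genes in the pathway.
toy[["%_match"]] <- 100 * toy$n_matched / toy$genes_in_pathway

# Benjamini-Hochberg adjustment across all tested pathways.
toy$adj.pValue <- p.adjust(toy$pValue, method = "BH")

# Most significant pathways first.
toy[order(toy$adj.pValue), ]
```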

Author(s)

Agata Michna

References

Subramanian, A., Tamayo, P., Mootha, V. K., Mukherjee, S., Ebert, B. L., Gillette, M. A., Paulovich, A., Pomeroy, S. L., Golub, T. R., Lander, E. S. and Mesirov, J. P. (2005). Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. PNAS 102(43), 15545-15550.

http://www.broadinstitute.org/gsea/msigdb/collections.jsp

http://www.reactome.org/pages/download-data/

Examples

## Not run: 
   ## Example 1 - using gene sets from the Molecular Signatures Database (MSigDB collections)
   ## Download .gmt file 'c2.all.v5.0.symbols.gmt' (all curated gene sets, gene symbols)
   ## from the Broad, http://www.broad.mit.edu/gsea/downloads.jsp#msigdb, then
   geneSets <- getGmt("/path/to/c2.all.v5.0.symbols.gmt")
   ## load "eSetObject" containing simulated time-course data
   data(TCsimData)
   ## check for differentially expressed genes
   diffExprs <- splineDiffExprs(eSetObject = TCsimData, df = 3, cutoff.adj.pVal = 0.01, reference = "T1")
   ## use differentially expressed genes for pathway enrichment analysis
   enrichPath <- pathEnrich(geneList = rownames(diffExprs), geneSets = geneSets, universe = 6536)
## End(Not run)

## Not run: 
   ## Example 2 - using gene sets from the Reactome Pathway Database
   ## Download and unzip .gmt.zip file 'ReactomePathways.gmt.zip'
   ## ("Reactome Pathways Gene Set" under "Specialized data formats") from the Reactome website
   ## http://www.reactome.org/pages/download-data/, then
   geneSets <- getGmt("/path/to/ReactomePathways.gmt")
   data(TCsimData)
   diffExprs <- splineDiffExprs(eSetObject = TCsimData, df = 3, cutoff.adj.pVal = 0.01, reference = "T1")
   enrichPath <- pathEnrich(geneList = rownames(diffExprs), geneSets = geneSets, universe = 6536)
## End(Not run)
   
## Small example with gene sets consisting of KEGG pathways only
geneSets <- getGmt(system.file("extdata", "c2.cp.kegg.v5.0.symbols.gmt", package="splineTimeR"))
data(TCsimData)
diffExprs <- splineDiffExprs(eSetObject = TCsimData, df = 3, cutoff.adj.pVal = 0.01, reference = "T1")
enrichPath <- pathEnrich(geneList = rownames(diffExprs), geneSets = geneSets, universe = 6536)

Results


> library(splineTimeR)
> ### Name: pathEnrich
> ### Title: Pathway enrichment analysis
> ### Aliases: pathEnrich
> ### Keywords: gene set enrichment analysis pathway enrichment analysis
> 
> ### ** Examples
> 
> ## Small example with gene sets consisting of KEGG pathways only
> geneSets <- getGmt(system.file("extdata", "c2.cp.kegg.v5.0.symbols.gmt", package="splineTimeR"))
> data(TCsimData)
> diffExprs <- splineDiffExprs(eSetObject = TCsimData, df = 3, cutoff.adj.pVal = 0.01, reference = "T1")
------------------------------------------------- 
Differential analysis done for df = 3 and adj.P.Val <= 0.01 
Number of differentially expressed genes:  952 
> enrichPath <- pathEnrich(geneList = rownames(diffExprs), geneSets = geneSets, universe = 6536)
-------------------------------------------------------- 
Pathway enrichment done! 
-------------------------------------------------------- 