Last data update: 2014.03.03

compute.go.enrichment {cellTree}    R Documentation

Gene Ontology enrichment analysis

Description

Computes enrichment scores for Gene Ontology terms associated with genes in each topic.

Usage

compute.go.enrichment(lda.results, go.db, ontology.type = "BP",
  reformat.gene.names = FALSE, bonferroni.correct = TRUE,
  p.val.threshold = if (bonferroni.correct) 0.05 else 0.01,
  go.score.class = "weight01Score", dag.file.prefix = FALSE)
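The default for p.val.threshold in the signature above depends on another argument, bonferroni.correct. R evaluates default arguments lazily inside the function, so the default follows whatever value bonferroni.correct has at call time. The sketch below reproduces only that mechanism; demo.threshold is a hypothetical stand-in and not part of cellTree.

```r
# Hypothetical stand-in that mimics only the default-argument logic
# of the signature above; it does no enrichment analysis.
demo.threshold <- function(bonferroni.correct = TRUE,
                           p.val.threshold = if (bonferroni.correct) 0.05 else 0.01) {
  p.val.threshold
}

default.corrected   <- demo.threshold()                            # uses 0.05
default.uncorrected <- demo.threshold(bonferroni.correct = FALSE)  # uses 0.01
explicit.value      <- demo.threshold(FALSE, 0.001)                # explicit value wins
```

Passing p.val.threshold explicitly, as the example on this page does, bypasses the default entirely.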

Arguments

lda.results

A fitted LDA model, as returned by compute.lda.

go.db

String. Genome-wide annotation with GO mapping for the appropriate organism (e.g. org.Mm.eg.db or org.Hs.eg.db).

ontology.type

String (optional). Ontology to use: “BP” for Biological Process, “MF” for Molecular Function, or “CC” for Cellular Component. Defaults to “BP”.

reformat.gene.names

Boolean. If set to TRUE, converts all gene names to lowercase with a capitalised first letter.

bonferroni.correct

Boolean. Unless set to FALSE, adjusts the p-value significance threshold for multiple testing (Bonferroni method).

p.val.threshold

Numeric (optional). P-value significance threshold. Defaults to 0.05 if bonferroni.correct is TRUE and to 0.01 otherwise.

go.score.class

String (optional). Name of the scoring method to use for the Kolmogorov-Smirnov test (e.g. “weight01Score” or “elimScore”). See the topGO documentation for a complete list of scoring methods.

dag.file.prefix

String or FALSE. If not set to FALSE, plots individual subgraphs of significant terms for each topic using the string as filename prefix.
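Two of the arguments above can be illustrated in base R. The sketch below is only an approximation under stated assumptions: the number of tests (n.tests), the p-values, and the gene symbols are illustrative, capitalise.gene.names is a hypothetical helper, and neither part is taken from the cellTree source.

```r
# --- bonferroni.correct: adjusting a significance threshold ---
alpha   <- 0.05
n.tests <- 500                      # assumed number of GO terms tested (illustrative)
adjusted.threshold <- alpha / n.tests

p.values    <- c(2e-6, 8e-5, 3e-4)  # illustrative raw p-values
significant <- p.values < adjusted.threshold

# Equivalent formulation: adjust the p-values instead of the threshold
same.result <- p.adjust(p.values, method = "bonferroni", n = n.tests) < alpha

# --- reformat.gene.names: first letter upper case, rest lower case ---
# Hypothetical helper; the exact transformation used by the package is assumed.
capitalise.gene.names <- function(x) {
  paste0(toupper(substring(x, 1, 1)), tolower(substring(x, 2)))
}
reformatted <- capitalise.gene.names(c("MYOD1", "actb", "Gapdh"))
```

Dividing the threshold by the number of tests and multiplying each p-value by it are two views of the same Bonferroni correction, which is why both comparisons agree.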

Value

Returns a named list with ranked tables of significantly enriched GO terms for each topic (‘all’), terms that are significant in that topic only (‘unique’), and terms that appear in fewer than half of the other topics (‘rare’). In addition, the list contains an igraph object with the full GO DAG, annotated with each term's p-value and the significance threshold adjusted for multiple testing (Bonferroni method).
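The relationship between the ‘all’, ‘unique’ and ‘rare’ sets can be sketched in base R. This is one possible reading of the definitions above, applied to made-up term labels; the names sig.terms, count.in.others, unique.terms and rare.terms are hypothetical, and the package may compute these sets differently.

```r
# Toy 'all' sets: significant terms per topic (labels are placeholders)
sig.terms <- list(
  topic1 = c("termA", "termB", "termC"),
  topic2 = c("termB", "termC"),
  topic3 = c("termC", "termD"),
  topic4 = c("termC")
)

# For each term of topic i, count how many *other* topics also list it
count.in.others <- function(i, terms.by.topic) {
  others <- terms.by.topic[-i]
  vapply(terms.by.topic[[i]], function(term) {
    sum(vapply(others, function(o) term %in% o, logical(1)))
  }, integer(1))
}

n.others <- length(sig.terms) - 1

# 'unique': terms found in no other topic
unique.terms <- lapply(seq_along(sig.terms), function(i) {
  n <- count.in.others(i, sig.terms)
  names(n)[n == 0]
})

# 'rare': terms found in fewer than half of the other topics
rare.terms <- lapply(seq_along(sig.terms), function(i) {
  n <- count.in.others(i, sig.terms)
  names(n)[n < n.others / 2]
})
```

Under this reading every ‘unique’ term is also ‘rare’, and a topic can end up with an empty ‘unique’ table, as topic 2 does here.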

Examples

# Load pre-computed LDA model for skeletal myoblast RNA-Seq data from HSMMSingleCell package:
data(HSMM_lda_model)


# Load GO mapping database for 'Homo sapiens':
library(org.Hs.eg.db)
# Compute Cellular Component GO enrichment sets for each topic:
go.results = compute.go.enrichment(HSMM_lda_model, org.Hs.eg.db, ontology.type="CC", bonferroni.correct=TRUE, p.val.threshold=0.01)

# Print table of terms that are only significantly enriched in each topic: 
print(go.results$unique)
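Because the result holds one table per topic, it can be convenient to stack them into a single data frame. The sketch below does this in base R on a small stand-in list, so that it runs without cellTree. The rows are copied from the example output shown under Results; the names unique.tables, add.topic and combined are hypothetical, and storing the p-Value column as text is an assumption.

```r
# Stand-in for go.results$unique: one data frame per topic
make.table <- function(id, term, total, p) {
  data.frame(GO.ID = id, Term = term, Total = total, `p-Value` = p,
             check.names = FALSE, stringsAsFactors = FALSE)
}
unique.tables <- list(
  make.table(c("GO:0000777", "GO:0005681"),
             c("condensed chromosome kinetochore", "spliceosomal complex"),
             c(89L, 161L), c("8.7e-11", "1.2e-08")),
  make.table(character(0), character(0), integer(0), character(0)),  # empty topic
  make.table("GO:0030018", "Z disc", 76L, "5.6e-07")
)

# Prepend the topic index to each table, then stack the tables
add.topic <- function(i) {
  tbl <- unique.tables[[i]]
  data.frame(topic = rep(i, nrow(tbl)), tbl,
             check.names = FALSE, stringsAsFactors = FALSE)
}
combined <- do.call(rbind, lapply(seq_along(unique.tables), add.topic))

# Convert p-values to numbers (harmless if they are numeric already) and sort
combined$p.numeric <- as.numeric(combined[["p-Value"]])
combined <- combined[order(combined$p.numeric), ]
```

With a real result, replacing unique.tables by go.results$unique should give the same layout, provided the tables have the columns printed above.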

Results


> library(cellTree)
> # Load pre-computed LDA model for skeletal myoblast RNA-Seq data from HSMMSingleCell package:
> data(HSMM_lda_model)
> 
> ## No test: 
> # Load GO mapping database for 'homo sapiens':
> library(org.Hs.eg.db)

> # Compute Cellular Component GO enrichment sets for each topic:
> go.results = compute.go.enrichment(HSMM_lda_model, org.Hs.eg.db, ontology.type="CC", bonferroni.correct=TRUE, p.val.threshold=0.01)
Loading required namespace: maptpx
Computing GO enrichment for topic: 1 

Building most specific GOs .....
	( 1336 GO terms found. )

Build GO DAG topology ..........
	( 1606 GO terms and 3191 relations. )

Annotating nodes ...............
	( 10029 genes annotated to the GO terms. )

			 -- Weight01 Algorithm -- 

		 the algorithm is scoring 728 nontrivial nodes
		 parameters: 
			 test statistic: KS
			 score order: decreasing

	 Level 16:	1 nodes to be scored	(0 eliminated genes)

	 Level 15:	10 nodes to be scored	(0 eliminated genes)

	 Level 14:	35 nodes to be scored	(18 eliminated genes)

	 Level 13:	38 nodes to be scored	(135 eliminated genes)

	 Level 12:	61 nodes to be scored	(681 eliminated genes)

	 Level 11:	97 nodes to be scored	(1310 eliminated genes)

	 Level 10:	90 nodes to be scored	(2495 eliminated genes)

	 Level 9:	70 nodes to be scored	(4891 eliminated genes)

	 Level 8:	74 nodes to be scored	(6004 eliminated genes)

	 Level 7:	49 nodes to be scored	(6370 eliminated genes)

	 Level 6:	56 nodes to be scored	(8726 eliminated genes)

	 Level 5:	54 nodes to be scored	(8780 eliminated genes)

	 Level 4:	56 nodes to be scored	(9324 eliminated genes)

	 Level 3:	24 nodes to be scored	(9753 eliminated genes)

	 Level 2:	12 nodes to be scored	(9884 eliminated genes)

	 Level 1:	1 nodes to be scored	(9890 eliminated genes)
Computing GO enrichment for topic: 2 
Computing GO enrichment for topic: 3 
Computing GO enrichment for topic: 4 
Computing GO enrichment for topic: 5 
[topGO output for topics 2 to 5 is identical to the topic 1 output above and is omitted]
> 
> # Print table of terms that are only significantly enriched in each topic: 
> print(go.results$unique)
[[1]]
        GO.ID                                   Term Total p-Value
12 GO:0000777       condensed chromosome kinetochore    89 8.7e-11
16 GO:0005681                   spliceosomal complex   161 1.2e-08
26 GO:0000784   nuclear chromosome, telomeric region   102 1.1e-06
27 GO:0005813                             centrosome   406 1.4e-06
28 GO:0000922                           spindle pole   106 1.5e-06
31 GO:0046540           U4/U6 x U5 tri-snRNP complex    18 3.0e-06
32 GO:0005686                               U2 snRNP    17 3.1e-06
36 GO:0005689          U12-type spliceosomal complex    25 4.7e-06
38 GO:0000785                              chromatin   326 6.0e-06
39 GO:0005876                    spindle microtubule    50 7.7e-06
40 GO:0000940 condensed chromosome outer kinetochore    13 9.3e-06

[[2]]
[1] GO.ID   Term    Total   p-Value
<0 rows> (or 0-length row.names)

[[3]]
        GO.ID                             Term Total p-Value
23 GO:0030018                           Z disc    76 5.6e-07
24 GO:0001725                     stress fiber    39 7.9e-07
31 GO:0000932 cytoplasmic mRNA processing body    59 2.9e-06

[[4]]
        GO.ID              Term Total p-Value
13 GO:0005604 basement membrane    60 1.2e-05

[[5]]
        GO.ID                           Term Total p-Value
29 GO:0005761         mitochondrial ribosome    70 7.9e-06
33 GO:0000139                 Golgi membrane   467 1.1e-05
35 GO:0005789 endoplasmic reticulum membrane   649 1.2e-05
36 GO:0005885         Arp2/3 protein complex    10 1.3e-05
38 GO:0005739                  mitochondrion  1318 1.4e-05

> ## End(No test)