R: Functions which map gene identifiers to GO terms
annFUN
R Documentation
Functions which map gene identifiers to GO terms
Description
These functions are used to compile a list of GO terms such that each
element in the list is a character vector containing all the gene
identifiers that are mapped to the respective GO term.
Arguments
whichOnto
character string specifying one of the three GO
ontologies, namely: "BP", "MF", "CC"
feasibleGenes
character vector containing a subset of gene
identifiers. Only these genes will be used to annotate GO
terms. The default value is NULL, which means that no genes are
filtered out.
affyLib
character string containing the name of the
Bioconductor annotation package for a specific microarray chip.
gene2GO
named list of character vectors. The list names are
gene identifiers. For each gene the character vector contains the
GO identifiers it maps to. Only the most specific annotations are required.
GO2genes
named list of character vectors. The list names are
GO identifiers. For each GO term the character vector contains the
gene identifiers which are mapped to it. Only the most specific
annotations are required.
mapping
character string specifying the name of the
Bioconductor package containing the gene mappings for a
specific organism. For example: mapping = "org.Hs.eg.db".
ID
character string specifying the gene identifier to
use. Currently only the following identifiers can be used:
c("entrez", "genbank", "alias", "ensembl", "symbol",
"genename", "unigene")
file
character string specifying the file containing the annotations.
...
other parameters
sep
the character used to separate the columns in the CSV file
IDsep
the character used to separate the annotated entities
l
a list containing mappings
Details
All these functions restrict the GO terms to the ones belonging
to the specified ontology and to the genes listed in the
feasibleGenes argument (if not empty).
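The two restrictions can be illustrated with a minimal base R sketch. It is not the topGO implementation: restrict_annotations and term_ontology are hypothetical names, the gene names are made up, and the small lookup table stands in for the ontology information that the package would take from GO.db. Dropping terms that end up with no genes is an assumption.

```r
# Toy GO-to-genes annotation; the gene names are made up.
GO2genes <- list(
  "GO:0005737" = c("geneA", "geneC"),
  "GO:0005886" = c("geneA", "geneB"),
  "GO:0007186" = c("geneB"),
  "GO:0004930" = c("geneD")
)

# Hypothetical stand-in for an ontology lookup (not a topGO object).
term_ontology <- c(
  "GO:0005737" = "CC", "GO:0005886" = "CC",
  "GO:0007186" = "BP", "GO:0004930" = "MF"
)

restrict_annotations <- function(GO2genes, whichOnto, feasibleGenes = NULL) {
  # Keep only the terms that belong to the requested ontology.
  in_onto <- term_ontology[names(GO2genes)] %in% whichOnto
  out <- GO2genes[in_onto]
  if (!is.null(feasibleGenes)) {
    # Keep only the feasible genes within each term.
    out <- lapply(out, function(g) g[g %in% feasibleGenes])
    # Assumption: terms left without any gene are dropped.
    out <- out[lengths(out) > 0]
  }
  out
}

cc_terms <- restrict_annotations(GO2genes, "CC",
                                 feasibleGenes = c("geneA", "geneB"))
```

With feasibleGenes = NULL the second step is skipped and every gene annotated to a term of the chosen ontology is kept.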
The function annFUN.db uses the mappings provided
in the Bioconductor annotation data packages. For example, if the
Affymetrix hgu133a chip is used, then the user should set
affyLib = "hgu133a.db".
The functions annFUN.gene2GO and annFUN.GO2genes are
used when the user provides their own annotations, either as a
gene-to-GOs mapping or as a GO-to-genes mapping.
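The two shapes carry the same information, and one can be derived from the other. The following base R sketch shows the idea on toy data with made-up gene names; it is only an illustration of the inversion, not the code used by the package (which provides inverseList for this purpose).

```r
# Gene-to-GOs mapping: list names are gene identifiers.
gene2GO <- list(
  geneA = c("GO:0005737", "GO:0005886"),
  geneB = c("GO:0005886", "GO:0007186")
)

# Flatten to parallel vectors of (gene, term) pairs.
genes <- rep(names(gene2GO), lengths(gene2GO))
terms <- unlist(gene2GO, use.names = FALSE)

# Group the genes by term to obtain the GO-to-genes mapping:
# list names are now GO identifiers.
GO2genes <- split(genes, terms)
```

Either list can then be passed to the matching function, gene2GO to annFUN.gene2GO or GO2genes to annFUN.GO2genes.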
The annFUN.org function uses the mappings from the
"org.XX.xx.db" annotation packages. The function supports different gene
identifiers.
The annFUN.file function will read annotations of the type
gene2GO or GO2genes from a text file.
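As a rough illustration of what such a file might look like, the sketch below assumes one identifier per line, followed by the column separator (sep) and a list of annotated entities separated by IDsep. That layout is inferred from the descriptions of sep and IDsep and is an assumption; parse_mappings is a hypothetical helper written in base R, not a topGO function.

```r
# Write a small toy mapping file (made-up gene names).
tmp <- tempfile(fileext = ".map")
writeLines(c("geneA\tGO:0005737, GO:0005886",
             "geneB\tGO:0005886, GO:0007186"), tmp)

# Hypothetical parser for the assumed "ID<sep>entity<IDsep>entity..." layout.
parse_mappings <- function(file, sep = "\t", IDsep = ",") {
  fields <- strsplit(readLines(file), sep, fixed = TRUE)
  ids <- vapply(fields, `[`, character(1), 1)
  annots <- lapply(fields, function(f) {
    # Split the second column and strip surrounding whitespace.
    trimws(strsplit(f[2], IDsep, fixed = TRUE)[[1]])
  })
  setNames(annots, ids)
}

gene2GO <- parse_mappings(tmp)
```

The result has the same shape as the gene2GO argument described above: a named list of character vectors.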
Value
A list of character vectors, named by GO identifiers.
Author(s)
Adrian Alexa
See Also
topGOdata-class
Examples
library(hgu133a.db)
set.seed(111)
## generate a gene list and the GO annotations
selGenes <- sample(ls(hgu133aGO), 50)
gene2GO <- lapply(mget(selGenes, envir = hgu133aGO), names)
gene2GO[sapply(gene2GO, is.null)] <- NA
## the annotation for the first three genes
gene2GO[1:3]
## inverting the annotations
G2g <- inverseList(gene2GO)
## inverting the annotations and selecting an ontology
go2genes <- annFUN.gene2GO(whichOnto = "CC", gene2GO = gene2GO)
## generate a GO list with the genes annotations
selGO <- sample(ls(hgu133aGO2PROBE), 30)
GO2gene <- lapply(mget(selGO, envir = hgu133aGO2PROBE), as.character)
GO2gene[1:3]
## select only the GO terms for a specific ontology
go2gene <- annFUN.GO2genes(whichOnto = "CC", GO2gene = GO2gene)
##################################################
## Using the org.XX.xx.db annotations
##################################################
## GO to Symbol mappings (only the BP ontology is used)
xx <- annFUN.org("BP", mapping = "org.Hs.eg.db", ID = "symbol")
head(xx)
## Not run:
allGenes <- unique(unlist(xx))
myInterestedGenes <- sample(allGenes, 500)
geneList <- factor(as.integer(allGenes %in% myInterestedGenes))
names(geneList) <- allGenes
GOdata <- new("topGOdata",
ontology = "BP",
allGenes = geneList,
nodeSize = 5,
annot = annFUN.org,
mapping = "org.Hs.eg.db",
ID = "symbol")
## End(Not run)
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
> library(topGO)
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
rbind, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit
Loading required package: graph
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
Loading required package: GO.db
Loading required package: AnnotationDbi
Loading required package: stats4
Loading required package: IRanges
Loading required package: S4Vectors
Attaching package: 'S4Vectors'
The following objects are masked from 'package:base':
colMeans, colSums, expand.grid, rowMeans, rowSums
Loading required package: SparseM
Attaching package: 'SparseM'
The following object is masked from 'package:base':
backsolve
groupGOTerms: GOBPTerm, GOMFTerm, GOCCTerm environments built.
Attaching package: 'topGO'
The following object is masked from 'package:IRanges':
members
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/topGO/annFUN.Rd_%03d_medium.png", width=480, height=480)
> ### Name: annFUN
> ### Title: Functions which map gene identifiers to GO terms
> ### Aliases: annFUN.db annFUN annFUN.GO2genes annFUN.gene2GO annFUN.file
> ### annFUN.org inverseList readMappings
> ### Keywords: misc
>
> ### ** Examples
>
>
> library(hgu133a.db)
Loading required package: org.Hs.eg.db
> set.seed(111)
>
> ## generate a gene list and the GO annotations
> selGenes <- sample(ls(hgu133aGO), 50)
> gene2GO <- lapply(mget(selGenes, envir = hgu133aGO), names)
> gene2GO[sapply(gene2GO, is.null)] <- NA
>
> ## the annotation for the first three genes
> gene2GO[1:3]
$`213834_at`
[1] "GO:0030036" "GO:0032012" "GO:0043547" "GO:0005737" "GO:0045211"
[6] "GO:0060077" "GO:0005086"
$`216818_s_at`
[1] "GO:0007186" "GO:0007608" "GO:0050911" "GO:0050911" "GO:0005886"
[6] "GO:0016021" "GO:0004930" "GO:0004984"
$`208759_at`
[1] "GO:0002262" "GO:0006508" "GO:0006509" "GO:0006509" "GO:0007219"
[6] "GO:0007220" "GO:0007411" "GO:0016485" "GO:0016485" "GO:0022617"
[11] "GO:0030198" "GO:0031293" "GO:0042098" "GO:0042987" "GO:0043065"
[16] "GO:0043085" "GO:0043085" "GO:0048011" "GO:0048013" "GO:0050435"
[21] "GO:0050673" "GO:0097190" "GO:0005765" "GO:0005783" "GO:0005783"
[26] "GO:0005794" "GO:0005794" "GO:0005886" "GO:0005887" "GO:0005887"
[31] "GO:0005925" "GO:0016020" "GO:0016021" "GO:0042470" "GO:0070062"
[36] "GO:0004175" "GO:0005515"
>
> ## inverting the annotations
> G2g <- inverseList(gene2GO)
>
> ## inverting the annotations and selecting an ontology
> go2genes <- annFUN.gene2GO(whichOnto = "CC", gene2GO = gene2GO)
>
>
> ## generate a GO list with the genes annotations
> selGO <- sample(ls(hgu133aGO2PROBE), 30)
> GO2gene <- lapply(mget(selGO, envir = hgu133aGO2PROBE), as.character)
>
> GO2gene[1:3]
$`GO:0055001`
[1] "205897_at" "213345_at" "210329_s_at" "210330_at" "213543_at"
[6] "214492_at" "207302_at" "212849_at" "212348_s_at"
$`GO:0043931`
[1] "200059_s_at" "215668_s_at" "215807_s_at" "205485_at" "209561_at"
[6] "218892_at" "222101_s_at" "203528_at" "219427_at"
$`GO:0001743`
[1] "212849_at"
>
> ## select only the GO terms for a specific ontology
> go2gene <- annFUN.GO2genes(whichOnto = "CC", GO2gene = GO2gene)
>
>
> ##################################################
> ## Using the org.XX.xx.db annotations
> ##################################################
>
> ## GO to Symbol mappings (only the BP ontology is used)
> xx <- annFUN.org("BP", mapping = "org.Hs.eg.db", ID = "symbol")
> head(xx)
$`GO:0000002`
[1] "SLC25A4" "TYMP" "MEF2A" "MPV17" "OPA1" "LONP1"
[7] "AKT3" "SLC25A36" "MRPL17" "PIF1" "SLC25A33" "MGME1"
$`GO:0000003`
[1] "MMP23B" "WDR43"
$`GO:0000011`
[1] "RBSN"
$`GO:0000012`
[1] "LIG4" "TNP1" "XRCC1" "SIRT1" "APTX"
[6] "TDP1" "APLF" "LOC100133315"
$`GO:0000018`
[1] "IL7R" "KPNA1" "KPNA2" "THOC1" "ALYREF" "SMARCAD1"
$`GO:0000019`
[1] "MRE11A" "RAD50"
>
> ## Not run:
> ##D
> ##D allGenes <- unique(unlist(xx))
> ##D myInterestedGenes <- sample(allGenes, 500)
> ##D geneList <- factor(as.integer(allGenes %in% myInterestedGenes))
> ##D names(geneList) <- allGenes
> ##D
> ##D GOdata <- new("topGOdata",
> ##D ontology = "BP",
> ##D allGenes = geneList,
> ##D nodeSize = 5,
> ##D annot = annFUN.org,
> ##D mapping = "org.Hs.eg.db",
> ##D ID = "symbol")
> ## End(Not run)
> dev.off()
null device
1
>