Last data update: 2014.03.03
addGeneIDs R Documentation
Add common IDs to annotated peaks such as gene symbol, Entrez ID, Ensembl gene
ID and RefSeq ID.
Description
Add common IDs to annotated peaks, such as gene symbol, Entrez ID, Ensembl gene
ID and RefSeq ID, using an organism annotation dataset. For example,
org.Hs.eg.db is the dataset from the org.Hs.eg.db package for human, while
org.Mm.eg.db is the dataset from the org.Mm.eg.db package for mouse.
Usage
addGeneIDs(annotatedPeak, orgAnn, IDs2Add=c("symbol"),
feature_id_type="ensembl_gene_id", silence=TRUE, mart)
Arguments
annotatedPeak
GRanges or a vector of feature IDs
orgAnn
organism annotation dataset such as org.Hs.eg.db
IDs2Add
a vector of annotation identifiers to be added
feature_id_type
type of ID to be annotated; default is ensembl_gene_id
silence
TRUE or FALSE. If TRUE, unmapped Entrez IDs for feature IDs are not shown.
mart
mart object; see useMart in the biomaRt package for details
Details
Either orgAnn or mart must be supplied.
If orgAnn is given, parameter feature_id_type should be
ensembl_gene_id, entrez_id, gene_symbol, gene_alias or refseq_id.
And parameter IDs2Add can be set to any combination of identifiers
such as "accnum", "ensembl", "ensemblprot", "ensembltrans", "entrez_id",
"enzyme", "genename", "pfam", "pmid", "prosite", "refseq", "symbol",
"unigene" and "uniprot". Some IDs are unique to an organism,
such as "omim" for org.Hs.eg.db and "mgi" for org.Mm.eg.db.
The IDs are defined as follows:
accnum: GenBank accession numbers
ensembl: Ensembl gene accession numbers
ensemblprot: Ensembl protein accession numbers
ensembltrans: Ensembl transcript accession numbers
entrez_id: Entrez Gene identifiers
enzyme: EC numbers
genename: gene name
pfam: Pfam identifiers
pmid: PubMed identifiers
prosite: PROSITE identifiers
refseq: RefSeq identifiers
symbol: gene abbreviations
unigene: UniGene cluster identifiers
uniprot: UniProt accession numbers
omim: OMIM (Online Mendelian Inheritance in Man) identifiers
mgi: Jackson Laboratory MGI gene accession numbers
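Because a misspelled identifier is easy to miss, it can help to check a requested IDs2Add vector against the names listed above before calling addGeneIDs. The following is a minimal sketch in base R. The helper unknown_ids and the lookup table organism_ids are hypothetical names introduced here, not part of ChIPpeakAnno, and the list reflects only the identifiers documented on this page (the text says "such as", so it may not be exhaustive).

```r
# Identifiers documented above as usable with org.*.eg.db datasets.
common_ids <- c("accnum", "ensembl", "ensemblprot", "ensembltrans",
                "entrez_id", "enzyme", "genename", "pfam", "pmid",
                "prosite", "refseq", "symbol", "unigene", "uniprot")

# Organism-specific extras named in the text (hypothetical lookup table).
organism_ids <- list(org.Hs.eg.db = "omim", org.Mm.eg.db = "mgi")

# Hypothetical helper: returns the requested IDs that are NOT documented
# for the given annotation dataset; character(0) means nothing is unknown.
unknown_ids <- function(IDs2Add, orgAnn) {
  allowed <- c(common_ids, organism_ids[[orgAnn]])
  setdiff(IDs2Add, allowed)
}

unknown_ids(c("symbol", "omim"), "org.Hs.eg.db")  # nothing unknown for human
unknown_ids(c("symbol", "omim"), "org.Mm.eg.db")  # "omim" is flagged for mouse
```

A non-empty result is only a hint to double-check the spelling; the annotation package itself remains the authority on which identifiers it provides.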
If mart is used instead of orgAnn, refer to getBM in the biomaRt package
for valid values of the feature_id_type and IDs2Add parameters.
Parameter feature_id_type should be a valid filter name listed by
listFilters(mart), such as ensembl_gene_id.
Parameter IDs2Add should be one or more valid attribute names listed
by listAttributes(mart), such as
external_gene_id, entrezgene, wikigene_name, or mirbase_transcript_name.
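Filter and attribute names depend on the mart and can differ between Ensembl releases, so it may be worth comparing the intended arguments with what the mart actually offers. The sketch below assumes only that listFilters(mart) and listAttributes(mart) return data frames with a name column. The helper check_mart_args is a hypothetical name introduced for illustration, and the live query is guarded because it needs biomaRt and network access.

```r
# Hypothetical helper: compare the intended arguments with the tables
# returned by listFilters(mart) and listAttributes(mart).
check_mart_args <- function(feature_id_type, IDs2Add, filters, attributes) {
  list(
    filter_ok          = feature_id_type %in% filters$name,
    missing_attributes = setdiff(IDs2Add, attributes$name)
  )
}

# Querying Ensembl requires network access, so this part only runs
# in an interactive session with biomaRt installed.
if (interactive() && requireNamespace("biomaRt", quietly = TRUE)) {
  mart <- biomaRt::useMart("ensembl", dataset = "hsapiens_gene_ensembl")
  print(check_mart_args("ensembl_gene_id",
                        c("hgnc_symbol", "entrezgene"),
                        biomaRt::listFilters(mart),
                        biomaRt::listAttributes(mart)))
}
```

If missing_attributes is not empty, searching the name column of listAttributes(mart) for a similar entry is a reasonable next step before calling addGeneIDs.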
Value
A GRanges object if the input is a GRanges object, or a data.frame if the input is a vector.
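In the example output further down this page, a feature that maps to several identifiers shows them in a single cell joined by semicolons (for instance two OMIM IDs for ENSG00000065135). The base R sketch below illustrates that kind of one-row-per-feature collapse on a toy table. It is an assumption inferred from the printed output, not the package's actual implementation.

```r
# Toy one-to-many mapping; the values are copied from the example output.
mapping <- data.frame(
  feature = c("ENSG00000065135", "ENSG00000065135", "ENSG00000116396"),
  omim    = c("139370", "602483", "176265"),
  stringsAsFactors = FALSE
)

# Collapse to one row per feature, joining multiple IDs with ";".
collapsed <- aggregate(omim ~ feature, data = mapping,
                       FUN = function(x) paste(unique(x), collapse = ";"))
collapsed
```

To get back to one ID per row, strsplit(collapsed$omim, ";") splits each cell into its individual identifiers.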
Author(s)
Jianhong Ou, Lihua Julie Zhu
References
http://www.bioconductor.org/packages/release/data/annotation/
See Also
getBM, AnnotationDbi
Examples
data(annotatedPeak)
library(org.Hs.eg.db)
addGeneIDs(annotatedPeak[1:6,],orgAnn="org.Hs.eg.db",
IDs2Add=c("symbol","omim"))
##addGeneIDs(annotatedPeak$feature[1:6],orgAnn="org.Hs.eg.db",
## IDs2Add=c("symbol","genename"))
if(interactive()){
    ## useMart comes from biomaRt, which ChIPpeakAnno does not attach
    library(biomaRt)
    mart <- useMart("ENSEMBL_MART_ENSEMBL",host="www.ensembl.org",
                    dataset="hsapiens_gene_ensembl")
    ##mart <- useMart(biomart="ensembl",dataset="hsapiens_gene_ensembl")
    addGeneIDs(annotatedPeak[1:6,], mart=mart,
               IDs2Add=c("hgnc_symbol","entrezgene"))
}
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Platform: x86_64-pc-linux-gnu (64-bit)
> library(ChIPpeakAnno)
Loading required package: grid
Loading required package: IRanges
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
rbind, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit
Loading required package: S4Vectors
Loading required package: stats4
Attaching package: 'S4Vectors'
The following objects are masked from 'package:base':
colMeans, colSums, expand.grid, rowMeans, rowSums
Loading required package: Biostrings
Loading required package: XVector
Loading required package: GenomicRanges
Loading required package: GenomeInfoDb
Loading required package: VennDiagram
Loading required package: futile.logger
> ### Name: addGeneIDs
> ### Title: Add common IDs to annotated peaks such as gene symbol, entrez
> ### ID, ensemble gene id and refseq id.
> ### Aliases: addGeneIDs
> ### Keywords: misc
>
> ### ** Examples
>
> data(annotatedPeak)
> library(org.Hs.eg.db)
Loading required package: AnnotationDbi
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
> addGeneIDs(annotatedPeak[1:6,],orgAnn="org.Hs.eg.db",
+ IDs2Add=c("symbol","omim"))
GRanges object with 6 ranges and 10 metadata columns:
                                  seqnames                 ranges strand |
                                     <Rle>              <IRanges>  <Rle> |
  X1_11_100272487.ENSG00000202254        1 [100272801, 100272900]      + |
  X1_11_108905539.ENSG00000186086        1 [108906026, 108906125]      + |
  X1_11_110106925.ENSG00000065135        1 [110107267, 110107366]      + |
  X1_11_110679983.ENSG00000197106        1 [110680469, 110680568]      + |
  X1_11_110681677.ENSG00000197106        1 [110682125, 110682224]      + |
  X1_11_110756560.ENSG00000116396        1 [110756823, 110756922]      + |
                                            peak         feature start_position
                                     <character>     <character>      <numeric>
  X1_11_100272487.ENSG00000202254 1_11_100272487 ENSG00000202254      100257218
  X1_11_108905539.ENSG00000186086 1_11_108905539 ENSG00000186086      108918435
  X1_11_110106925.ENSG00000065135 1_11_110106925 ENSG00000065135      110091233
  X1_11_110679983.ENSG00000197106 1_11_110679983 ENSG00000197106      110693108
  X1_11_110681677.ENSG00000197106 1_11_110681677 ENSG00000197106      110693108
  X1_11_110756560.ENSG00000116396 1_11_110756560 ENSG00000116396      110753965
                                  end_position insideFeature distancetoFeature
                                     <numeric>   <character>         <numeric>
  X1_11_100272487.ENSG00000202254    100257309    downstream             15582
  X1_11_108905539.ENSG00000186086    109013624      upstream            -12410
  X1_11_110106925.ENSG00000065135    110136975        inside             16033
  X1_11_110679983.ENSG00000197106    110744824      upstream            -12640
  X1_11_110681677.ENSG00000197106    110744824      upstream            -10984
  X1_11_110756560.ENSG00000116396    110776666        inside              2857
                                  shortestDistance fromOverlappingOrNearest
                                         <numeric>              <character>
  X1_11_100272487.ENSG00000202254            15491             NearestStart
  X1_11_108905539.ENSG00000186086            12310             NearestStart
  X1_11_110106925.ENSG00000065135            16033             NearestStart
  X1_11_110679983.ENSG00000197106            12540             NearestStart
  X1_11_110681677.ENSG00000197106            10884             NearestStart
  X1_11_110756560.ENSG00000116396             2857             NearestStart
                                       symbol          omim
                                  <character>   <character>
  X1_11_100272487.ENSG00000202254        <NA>          <NA>
  X1_11_108905539.ENSG00000186086       NBPF6        613996
  X1_11_110106925.ENSG00000065135       GNAI3 139370;602483
  X1_11_110679983.ENSG00000197106     SLC6A17 610299;616269
  X1_11_110681677.ENSG00000197106     SLC6A17 610299;616269
  X1_11_110756560.ENSG00000116396       KCNC4        176265
  -------
  seqinfo: 24 sequences from an unspecified genome; no seqlengths
> ##addGeneIDs(annotatedPeak$feature[1:6],orgAnn="org.Hs.eg.db",
> ## IDs2Add=c("symbol","genename"))
> # if(interactive()){
> mart <- useMart("ENSEMBL_MART_ENSEMBL",host="www.ensembl.org",
+ dataset="hsapiens_gene_ensembl")
Error: could not find function "useMart"
Execution halted