Last data update: 2014.03.03

addGeneIDs    R Documentation

Add common IDs to annotated peaks such as gene symbol, Entrez ID, Ensembl gene ID and RefSeq ID.

Description

Add common IDs such as gene symbol, Entrez ID, Ensembl gene ID and RefSeq ID to annotated peaks by leveraging an organism annotation dataset. For example, org.Hs.eg.db is the dataset from the org.Hs.eg.db package for human, while org.Mm.eg.db is the dataset from the org.Mm.eg.db package for mouse.

Usage

addGeneIDs(annotatedPeak, orgAnn, IDs2Add=c("symbol"), 
           feature_id_type="ensembl_gene_id", silence=TRUE, mart)

Arguments

annotatedPeak

GRanges or a vector of feature IDs

orgAnn

organism annotation dataset such as org.Hs.eg.db

IDs2Add

a vector of annotation identifiers to be added

feature_id_type

type of the feature IDs to be annotated; default is ensembl_gene_id

silence

TRUE or FALSE. If TRUE, feature IDs that cannot be mapped to an Entrez ID are not reported.

mart

a mart object; see useMart in the biomaRt package for details
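To make the effect of silence concrete, here is a minimal base-R sketch. It is not ChIPpeakAnno's implementation: report_unmapped() and the toy lookup feature2entrez (with made-up Entrez values) are hypothetical, and the sketch only assumes, as the silence description above implies, that a feature ID is "unmapped" when it has no Entrez entry.

```r
# Hypothetical sketch of the 'silence' behaviour; not ChIPpeakAnno's code.
report_unmapped <- function(feature_ids, feature2entrez, silence = TRUE) {
  # Looking up a missing name in a named vector yields NA, so those IDs
  # (and any IDs explicitly mapped to NA) are treated as unmapped
  unmapped <- unique(feature_ids[is.na(feature2entrez[feature_ids])])
  # Only report them when the caller asked for it (silence = FALSE)
  if (!silence && length(unmapped) > 0) {
    message("Feature IDs without an Entrez ID: ",
            paste(unmapped, collapse = ", "))
  }
  invisible(unmapped)
}

# Toy lookup: Ensembl gene IDs -> made-up Entrez-like values (hypothetical)
feature2entrez <- c(ENSG00000186086 = "entrez_1",
                    ENSG00000065135 = "entrez_2")
ids <- c("ENSG00000202254", "ENSG00000186086", "ENSG00000065135")
report_unmapped(ids, feature2entrez, silence = FALSE)
```

With the default silence = TRUE the same call returns the unmapped IDs invisibly and prints nothing.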

Details

Either orgAnn or mart should be assigned.

  • If orgAnn is given, parameter feature_id_type should be one of ensembl_gene_id, entrez_id, gene_symbol, gene_alias or refseq_id, and parameter IDs2Add can be any combination of identifiers such as "accnum", "ensembl", "ensemblprot", "ensembltrans", "entrez_id", "enzyme", "genename", "pfam", "pmid", "prosite", "refseq", "symbol", "unigene" and "uniprot". Some IDs are specific to an organism, such as "omim" for org.Hs.eg.db and "mgi" for org.Mm.eg.db.

    The identifiers are defined as follows:

    • accnum: GenBank accession numbers

    • ensembl: Ensembl gene accession numbers

    • ensemblprot: Ensembl protein accession numbers

    • ensembltrans: Ensembl transcript accession numbers

    • entrez_id: entrez gene identifiers

    • enzyme: EC numbers

    • genename: gene name

    • pfam: Pfam identifiers

    • pmid: PubMed identifiers

    • prosite: PROSITE identifiers

    • refseq: RefSeq identifiers

    • symbol: gene abbreviations

    • unigene: UniGene cluster identifiers

    • uniprot: UniProt accession numbers

    • omim: OMIM (Online Mendelian Inheritance in Man) identifiers

    • mgi: Jackson Laboratory MGI gene accession numbers

  • If mart is used instead of orgAnn, refer to getBM in the biomaRt package for valid values of feature_id_type and IDs2Add. Parameter feature_id_type should be a valid filter name listed by listFilters(mart), such as ensembl_gene_id, and parameter IDs2Add should be one or more valid attribute names listed by listAttributes(mart), such as external_gene_id, entrezgene, wikigene_name or mirbase_transcript_name.
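When one feature maps to several identifiers, the example output in the Results section joins them with semicolons (for instance 139370;602483 in the omim column). The base-R sketch below illustrates that collapsing step, assuming the mapping has already been retrieved as a two-column table. collapse_ids() is a hypothetical helper, not part of ChIPpeakAnno; the toy values are copied from the example output.

```r
# Toy one-to-many mapping taken from the example output:
# one Ensembl gene can carry several OMIM identifiers.
mapping <- data.frame(
  feature = c("ENSG00000186086", "ENSG00000065135", "ENSG00000065135"),
  omim    = c("613996", "139370", "602483"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: collapse all values that share a key into one
# string, dropping NAs and duplicates, separated by 'sep'
collapse_ids <- function(values, keys, sep = ";") {
  vapply(split(values, keys),
         function(v) paste(unique(v[!is.na(v)]), collapse = sep),
         character(1))
}

omim_by_feature <- collapse_ids(mapping$omim, mapping$feature)
omim_by_feature
```

The result is a named character vector keyed by feature ID, which can then be matched back onto the peaks.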

Value

A GRanges object if the input is a GRanges, or a data frame if the input is a vector.
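For a vector input, the returned data frame can be pictured as one row per input ID, kept in input order, with NA where no identifier was found (as for the first peak in the example output). The sketch below assumes a lookup table already fetched from orgAnn or mart; annotate_vector() and its column layout are hypothetical and may differ from the columns addGeneIDs actually returns.

```r
# Hypothetical sketch: annotate a vector of feature IDs and return a
# data frame with one row per input ID, in input order.
annotate_vector <- function(feature_ids, lookup) {
  # 'lookup' has a 'feature' column plus one column per identifier to add
  idx <- match(feature_ids, lookup$feature)   # NA where an ID is unmapped
  id_cols <- setdiff(names(lookup), "feature")
  out <- data.frame(feature = feature_ids,
                    lookup[idx, id_cols, drop = FALSE],
                    stringsAsFactors = FALSE)
  rownames(out) <- NULL                        # drop "NA", "3.1", ... row names
  out
}

# Toy lookup built from symbols shown in the example output
lookup <- data.frame(
  feature = c("ENSG00000186086", "ENSG00000065135", "ENSG00000197106"),
  symbol  = c("NBPF6", "GNAI3", "SLC6A17"),
  stringsAsFactors = FALSE
)
ids <- c("ENSG00000202254", "ENSG00000197106", "ENSG00000197106",
         "ENSG00000065135")
annotate_vector(ids, lookup)
```

Using match() keeps duplicated input IDs (two peaks near ENSG00000197106) as separate rows, mirroring how both SLC6A17 peaks are annotated in the GRanges output.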

Author(s)

Jianhong Ou, Lihua Julie Zhu

References

http://www.bioconductor.org/packages/release/data/annotation/

See Also

getBM, AnnotationDbi

Examples

 data(annotatedPeak)
 library(org.Hs.eg.db)
 addGeneIDs(annotatedPeak[1:6,],orgAnn="org.Hs.eg.db",
           IDs2Add=c("symbol","omim"))
 ##addGeneIDs(annotatedPeak$feature[1:6],orgAnn="org.Hs.eg.db",
 ##           IDs2Add=c("symbol","genename"))
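 if(interactive()){
   library(biomaRt)  # provides useMart(); it is not attached by library(ChIPpeakAnno)
   mart <- useMart("ENSEMBL_MART_ENSEMBL", host="www.ensembl.org",
                   dataset="hsapiens_gene_ensembl")
   ##mart <- useMart(biomart="ensembl", dataset="hsapiens_gene_ensembl")
   ## attribute names can change between Ensembl releases; check listAttributes(mart)
   addGeneIDs(annotatedPeak[1:6,], mart=mart,
              IDs2Add=c("hgnc_symbol","entrezgene"))
 }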

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"

> library(ChIPpeakAnno)

> ### Name: addGeneIDs
> ### Title: Add common IDs to annotated peaks such as gene symbol, entrez
> ###   ID, ensemble gene id and refseq id.
> ### Aliases: addGeneIDs
> ### Keywords: misc
> 
> ### ** Examples
> 
>  data(annotatedPeak)
>  library(org.Hs.eg.db)
>  addGeneIDs(annotatedPeak[1:6,],orgAnn="org.Hs.eg.db",
+            IDs2Add=c("symbol","omim"))
GRanges object with 6 ranges and 10 metadata columns:
                                  seqnames                 ranges strand |
                                     <Rle>              <IRanges>  <Rle> |
  X1_11_100272487.ENSG00000202254        1 [100272801, 100272900]      + |
  X1_11_108905539.ENSG00000186086        1 [108906026, 108906125]      + |
  X1_11_110106925.ENSG00000065135        1 [110107267, 110107366]      + |
  X1_11_110679983.ENSG00000197106        1 [110680469, 110680568]      + |
  X1_11_110681677.ENSG00000197106        1 [110682125, 110682224]      + |
  X1_11_110756560.ENSG00000116396        1 [110756823, 110756922]      + |
                                            peak         feature start_position
                                     <character>     <character>      <numeric>
  X1_11_100272487.ENSG00000202254 1_11_100272487 ENSG00000202254      100257218
  X1_11_108905539.ENSG00000186086 1_11_108905539 ENSG00000186086      108918435
  X1_11_110106925.ENSG00000065135 1_11_110106925 ENSG00000065135      110091233
  X1_11_110679983.ENSG00000197106 1_11_110679983 ENSG00000197106      110693108
  X1_11_110681677.ENSG00000197106 1_11_110681677 ENSG00000197106      110693108
  X1_11_110756560.ENSG00000116396 1_11_110756560 ENSG00000116396      110753965
                                  end_position insideFeature distancetoFeature
                                     <numeric>   <character>         <numeric>
  X1_11_100272487.ENSG00000202254    100257309    downstream             15582
  X1_11_108905539.ENSG00000186086    109013624      upstream            -12410
  X1_11_110106925.ENSG00000065135    110136975        inside             16033
  X1_11_110679983.ENSG00000197106    110744824      upstream            -12640
  X1_11_110681677.ENSG00000197106    110744824      upstream            -10984
  X1_11_110756560.ENSG00000116396    110776666        inside              2857
                                  shortestDistance fromOverlappingOrNearest
                                         <numeric>              <character>
  X1_11_100272487.ENSG00000202254            15491             NearestStart
  X1_11_108905539.ENSG00000186086            12310             NearestStart
  X1_11_110106925.ENSG00000065135            16033             NearestStart
  X1_11_110679983.ENSG00000197106            12540             NearestStart
  X1_11_110681677.ENSG00000197106            10884             NearestStart
  X1_11_110756560.ENSG00000116396             2857             NearestStart
                                       symbol          omim
                                  <character>   <character>
  X1_11_100272487.ENSG00000202254        <NA>          <NA>
  X1_11_108905539.ENSG00000186086       NBPF6        613996
  X1_11_110106925.ENSG00000065135       GNAI3 139370;602483
  X1_11_110679983.ENSG00000197106     SLC6A17 610299;616269
  X1_11_110681677.ENSG00000197106     SLC6A17 610299;616269
  X1_11_110756560.ENSG00000116396       KCNC4        176265
  -------
  seqinfo: 24 sequences from an unspecified genome; no seqlengths
>  ##addGeneIDs(annotatedPeak$feature[1:6],orgAnn="org.Hs.eg.db",
>  ##           IDs2Add=c("symbol","genename"))
> # if(interactive()){
>    mart <- useMart("ENSEMBL_MART_ENSEMBL",host="www.ensembl.org",
+                 dataset="hsapiens_gene_ensembl")
Error: could not find function "useMart"
Execution halted