Last data update: 2014.03.03

R: Get gene length and GC-content
getGeneLengthAndGCContent    R Documentation

Get gene length and GC-content

Description

Automatically retrieves gene length and GC-content information from BioMart or org.db packages.

Usage

getGeneLengthAndGCContent(id, org, mode=c("biomart", "org.db"))

Arguments

id

Character vector of one or more ENSEMBL or ENTREZ gene IDs.

org

Three-letter organism code, e.g. 'hsa' for 'Homo sapiens'. See also: http://www.genome.jp/kegg/catalog/org_list.html. In 'org.db' mode, this can also be a specific genome assembly, e.g. 'hg38' or 'sacCer3'.

mode

Mode to retrieve the information. Defaults to 'biomart'. See Details.
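
The default in the Usage line, mode=c("biomart", "org.db"), follows the common R idiom in which a vector of choices resolves to its first element when the argument is not supplied. The sketch below illustrates that idiom with base R's match.arg(). The helper pick_mode is hypothetical and is not part of the package; whether getGeneLengthAndGCContent resolves its argument in exactly this way is an assumption.

```r
# Hypothetical helper that mirrors the 'mode' argument of the Usage line.
# It is not part of EDASeq; it only illustrates the match.arg() idiom.
pick_mode <- function(mode = c("biomart", "org.db")) {
  # match.arg() returns the first choice when 'mode' is not supplied
  # and raises an error for values outside the listed choices.
  match.arg(mode)
}

pick_mode()          # resolves to "biomart"
pick_mode("org.db")  # resolves to "org.db"
```

Under this idiom, an unsupported value such as pick_mode("ucsc") stops with an error instead of silently falling back to the default.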

Details

The 'biomart' mode is based on functionality from the biomaRt package and retrieves the required information from the BioMart database. This is available for all ENSEMBL organisms and is typically most current, but can be time-consuming when querying several thousand genes at a time.
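
When several thousand IDs are involved, one possible way to keep individual queries manageable is to split the ID vector into batches and query them one at a time. The sketch below shows only the batching step in base R. The helper chunk_ids, the batch size of 500, and the placeholder IDs are assumptions made for illustration; whether batching actually shortens the total run time is not stated in this documentation.

```r
# Hypothetical helper: split a vector of IDs into batches of at most 'size'.
chunk_ids <- function(ids, size = 500) {
  split(ids, ceiling(seq_along(ids) / size))
}

# Made-up placeholder IDs for illustration only; these are not real genes.
ids <- sprintf("ENSG%011d", 1:1200)

batches <- chunk_ids(ids)
lengths(batches)  # number of IDs per batch

# Each batch could then be queried separately and the results combined.
# Left commented out because it needs EDASeq and a network connection:
# res <- do.call(rbind, lapply(unname(batches), getGeneLengthAndGCContent,
#                              org = "hsa"))
```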

The 'org.db' mode uses organism-based annotation packages from Bioconductor. This is much faster than the 'biomart' mode, but is only available for selected model organisms currently supported by BioC annotation functionality.

Results for the same gene ID(s) can differ between the two modes because they are based on different sources for the underlying genome assembly. While the 'biomart' mode uses the latest ENSEMBL version, the 'org.db' mode uses BioC annotation packages typically built from UCSC.

Value

A numeric matrix with two columns: gene length and GC-content.
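
To make the two returned quantities concrete, the following base R sketch computes a length and a GC fraction for a toy sequence. It illustrates the definitions only and is not the package's implementation, which may derive the sequence differently; the sequence string is invented for this example.

```r
# Toy sequence, invented for illustration; not taken from any database.
seq_string <- "ATGCGCTTAGGC"

# Split the string into single bases.
bases <- strsplit(seq_string, "")[[1]]

# Length is the number of bases; GC-content is the fraction that are G or C.
len <- length(bases)
gc  <- sum(bases %in% c("G", "C")) / len

c(length = len, gc = gc)
```

The GC-content is expressed here as a fraction between 0 and 1, which matches the scale of the 'gc' value shown in the example output for this function.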

Author(s)

Ludwig Geistlinger <Ludwig.Geistlinger@bio.ifi.lmu.de>

See Also

getSequence to retrieve a genomic sequence from BioMart, genes to extract genomic coordinates from a TxDb object, getSeq to extract genomic sequences from a BSgenome object, alphabetFrequency to calculate nucleotide frequencies.

Examples

getGeneLengthAndGCContent("ENSG00000012048", "hsa")

Results

> library(EDASeq)
> ### Name: getGeneLengthAndGCContent
> ### Title: Get gene length and GC-content
> ### Aliases: getGeneLengthAndGCContent
> 
> ### ** Examples
> 
> getGeneLengthAndGCContent("ENSG00000012048", "hsa")
Connecting to BioMart ...
Downloading sequence ...
      length           gc 
8922.0000000    0.4588657 