Last data update: 2014.03.03

R: Adds genomic information to variants
annotateVariants {R453Plus1Toolbox}    R Documentation

Adds genomic information to variants

Description

This method annotates the given genomic variants (mutations). Annotation includes the affected genes, exons and codons. The resulting amino acid changes are returned, as well as dbSNP identifiers for mutations that are already known. All information is fetched from Ensembl via biomaRt using the datasets hsapiens_gene_ensembl and hsapiens_snp.

Usage

annotateVariants(object, bsGenome)

Arguments

object

An instance of AVASet or MapperSet, or a data frame storing variants (see Details).

bsGenome

An object of class BSgenome giving the genome used as the reference sequence for calculating amino acid changes. This argument applies only when object is of class MapperSet. Default is ‘BSgenome.Hsapiens.UCSC.hg19’. Note that the genome should match the Ensembl annotation.
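The default genome is a UCSC build, whereas the annotation and the chromosome column described under Details use Ensembl notation, so chromosome names follow different conventions ("chr4" versus "4"). The following is a minimal sketch, assuming you start from UCSC-style names and need the Ensembl form; ucscToEnsemblChrom() is a hypothetical helper, not part of R453Plus1Toolbox. For GRanges-like objects, GenomeInfoDb's seqlevelsStyle() serves a similar purpose.

```r
# Hypothetical helper (not part of R453Plus1Toolbox): convert UCSC-style
# chromosome names ("chr4", "chrM") to Ensembl notation ("4", "MT").
# Only primary chromosomes are handled, and only names are mapped; this does
# not check that the underlying sequences of the two builds agree.
ucscToEnsemblChrom <- function(chrom) {
  chrom <- sub("^chr", "", chrom)  # drop the UCSC "chr" prefix
  chrom[chrom == "M"] <- "MT"      # UCSC "chrM" is named "MT" in Ensembl
  chrom
}

ucscToEnsemblChrom(c("chr4", "chrX", "chrM"))
# returns c("4", "X", "MT")
```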

Details

If a data frame is given, the following columns must be present:

start       genomic start position in the current Ensembl genome
end         genomic end position in the current Ensembl genome
chromosome  chromosome in Ensembl notation (e.g. "1", "2", ..., "Y")
strand      "+" or "-", relative to the nucleotide bases given below
seqRef      reference sequence
seqMut      sequence of the observed variant
seqSur      reference sequence extended by 3 bases in both directions
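The column requirements above can be checked before calling annotateVariants(). The following is a minimal sketch; checkVariantFrame() is a hypothetical helper, and the rules that seqRef and seqMut have equal length (with "-" marking gaps) and that seqSur holds seqRef between 3 flanking bases on each side are assumptions inferred from the Examples, not documented constraints.

```r
# Hypothetical helper (not part of R453Plus1Toolbox): sanity-check a variant
# data frame before passing it to annotateVariants().
checkVariantFrame <- function(variants) {
  required <- c("start", "end", "chromosome", "strand",
                "seqRef", "seqMut", "seqSur")
  absent <- setdiff(required, colnames(variants))
  if (length(absent) > 0)
    stop("missing column(s): ", paste(absent, collapse = ", "))
  stopifnot(all(variants$end >= variants$start),
            all(variants$strand %in% c("+", "-")))
  # Assumption based on the Examples: seqRef and seqMut are aligned, with "-"
  # marking gaps, so both strings have the same length.
  sameLength <- nchar(variants$seqRef) == nchar(variants$seqMut)
  # Assumption: seqSur is seqRef with 3 flanking reference bases on each side.
  flanked <- nchar(variants$seqSur) == nchar(variants$seqRef) + 6 &
    substr(variants$seqSur, 4, nchar(variants$seqSur) - 3) == variants$seqRef
  bad <- rownames(variants)[!(sameLength & flanked)]
  if (length(bad) > 0)
    stop("inconsistent sequences for: ", paste(bad, collapse = ", "))
  invisible(TRUE)
}

# The data frame from the Examples passes the check.
variants <- data.frame(
  start = c(106157528, 106154991, 106156184),
  end = c(106157528, 106154994, 106156185),
  chromosome = c("4", "4", "4"),
  strand = c("+", "+", "+"),
  seqRef = c("A", "ATAG", "---"),
  seqMut = c("G", "----", "ATA"),
  seqSur = c("TACAGAA", "TTTATAGATA", "AGC---TCC"),
  stringsAsFactors = FALSE)
rownames(variants) <- c("snp", "del", "ins")
checkVariantFrame(variants)
```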

The row names of the data frame are used as the variants' names (IDs). See Examples for a properly defined data frame.
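Because the row names serve as IDs, per-variant summaries can be keyed by them. The sketch below is an illustration under stated assumptions: variantType() is a hypothetical helper, and it assumes, as in the Examples, that "-" in seqRef marks an insertion and "-" in seqMut marks a deletion.

```r
# Hypothetical helper (not part of R453Plus1Toolbox): label each variant by
# type, keyed by the row names that annotateVariants() uses as IDs.
# Assumes "-" marks a gap in seqRef/seqMut, as in the Examples.
variantType <- function(variants) {
  refGap <- grepl("-", variants$seqRef, fixed = TRUE)  # gap in reference: insertion
  mutGap <- grepl("-", variants$seqMut, fixed = TRUE)  # gap in variant: deletion
  type <- ifelse(refGap, "insertion",
          ifelse(mutGap, "deletion",
          ifelse(nchar(variants$seqRef) == 1, "SNV", "substitution")))
  setNames(type, rownames(variants))
}

variants <- data.frame(
  seqRef = c("A", "ATAG", "---"),
  seqMut = c("G", "----", "ATA"),
  row.names = c("snp", "del", "ins"),
  stringsAsFactors = FALSE)
variantType(variants)
# returns c(snp = "SNV", del = "deletion", ins = "insertion")
```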

Value

An object of class AnnotatedVariants. Affected genes, transcripts and exons, as well as known SNPs, are stored in a list-like structure. See AnnotatedVariants-class for details.

Author(s)

Hans-Ulrich Klein

See Also

AnnotatedVariants-class, AVASet-class, MapperSet-class, htmlReport

Examples

variants = data.frame(
    start=c(106157528, 106154991, 106156184),
    end=c(106157528, 106154994, 106156185),
    chromosome=c("4", "4", "4"),
    strand=c("+", "+", "+"),
    seqRef=c("A", "ATAG", "---"),
    seqMut=c("G", "----", "ATA"),
    seqSur=c("TACAGAA", "TTTATAGATA", "AGC---TCC"),
    stringsAsFactors=FALSE)
rownames(variants) = c("snp", "del", "ins")
## Not run: annotateVariants(variants)

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Platform: x86_64-pc-linux-gnu (64-bit)

> library(R453Plus1Toolbox)
Loading required package: VariantAnnotation
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: GenomeInfoDb
Loading required package: stats4
Loading required package: S4Vectors

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomicRanges
Loading required package: SummarizedExperiment
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: Rsamtools
Loading required package: Biostrings
Loading required package: XVector

Attaching package: 'VariantAnnotation'

The following object is masked from 'package:base':

    tabulate

> ### Name: annotateVariants
> ### Title: Adds genomic information to variants
> ### Aliases: annotateVariants annotateVariants,AVASet,missing-method
> ###   annotateVariants,MapperSet,BSgenome-method
> ###   annotateVariants,MapperSet,missing-method
> ###   annotateVariants,data.frame,missing-method
> ### Keywords: annotateVariants
> 
> ### ** Examples
> 
> variants = data.frame(
+     start=c(106157528, 106154991,106156184),
+     end=c(106157528, 106154994,106156185),
+     chromosome=c("4", "4", "4"),
+     strand=c("+", "+", "+"),
+     seqRef=c("A", "ATAG", "---"),
+     seqMut=c("G", "----", "ATA"),
+     seqSur=c("TACAGAA", "TTTATAGATA", "AGC---TCC"),
+     stringsAsFactors=FALSE)
> rownames(variants) = c("snp", "del", "ins")
> ## Not run: annotateVariants(variants)