Last data update: 2014.03.03

R: PROVEANDb objects
PROVEANDb-class R Documentation

PROVEANDb objects

Description

The PROVEANDb class is a container for storing a connection to a PROVEAN SQLite database.

Details

The SIFT tool is no longer actively maintained. A few of the original authors have started the PROVEAN (Protein Variation Effect Analyzer) project. PROVEAN is a software tool that predicts whether an amino acid substitution or indel affects the biological function of a protein. It is useful for filtering sequence variants to identify nonsynonymous or indel variants that are predicted to be functionally important.

See the PROVEAN web page listed under References for a complete description of the methods.

Although SIFT is not under active development, the PROVEAN team still provides SIFT scores in the pre-computed downloads. This package, SIFT.Hsapiens.dbSNP137, contains both SIFT and PROVEAN scores. One notable difference between this and the previous SIFT database package is that keys in SIFT.Hsapiens.dbSNP132 are rs IDs whereas in SIFT.Hsapiens.dbSNP137 they are NCBI dbSNP IDs.

Methods

In the code below, x is a PROVEANDb object.

metadata(x): Returns x's metadata in a data frame.

columns(x): Returns the names of the columns that can be used to subset the data columns.

keys(x, keytype="DBSNPID", ...): Returns the names of the keys that can be used to subset the data rows. For SIFT.Hsapiens.dbSNP137 the keys are NCBI dbSNP IDs.

keytypes(x): Returns the names of the columns that can be used as keys. For SIFT.Hsapiens.dbSNP137 the NCBI dbSNP ID (DBSNPID) is the only keytype.

select(x, keys = NULL, columns = NULL, keytype = "DBSNPID", ...): Returns a subset of data defined by the character vectors keys and columns. If no keys are supplied, all rows are returned. If no columns are supplied, all columns are returned.
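As a rough sketch of how these methods fit together, the helper below checks the requested IDs against keys(x) before calling select(). The name lookup_predictions is hypothetical and not part of VariantAnnotation. The database query at the end assumes the SIFT.Hsapiens.dbSNP137 package is installed and is skipped otherwise; the column names are those used in the Examples section.

```r
## Hypothetical helper, not part of VariantAnnotation: look up predictions
## for a set of dbSNP IDs, dropping IDs that are absent from the database.
lookup_predictions <- function(db, ids,
                               cols = c("VARIANT", "PROVEANPRED", "SIFTPRED")) {
    ids <- as.character(ids)
    known <- ids %in% keys(db, keytype = "DBSNPID")
    if (!any(known))
        stop("none of the supplied IDs are present in the database")
    if (!all(known))
        warning(sum(!known), " ID(s) not found in the database were dropped")
    select(db, keys = ids[known], columns = cols, keytype = "DBSNPID")
}

## Only runs when the annotation package is available.
if (requireNamespace("SIFT.Hsapiens.dbSNP137", quietly = TRUE)) {
    library(VariantAnnotation)
    library(SIFT.Hsapiens.dbSNP137)
    print(keytypes(SIFT.Hsapiens.dbSNP137))
    ids <- head(keys(SIFT.Hsapiens.dbSNP137), 3)
    print(lookup_predictions(SIFT.Hsapiens.dbSNP137, c(ids, "not_an_id")))
}
```

Checking the IDs first is a design choice made for this sketch: this page does not say how select() treats keys that are missing from the database, so the helper avoids relying on that behaviour.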

Author(s)

Valerie Obenchain

References

The PROVEAN tool has replaced SIFT: http://provean.jcvi.org/about.php

Choi Y, Sims GE, Murphy S, Miller JR, Chan AP (2012) Predicting the Functional Effect of Amino Acid Substitutions and Indels. PLoS ONE 7(10): e46688.

Choi Y (2012) A Fast Computation of Pairwise Sequence Alignment Scores Between a Protein and a Set of Single-Locus Variants of Another Protein. In Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine (BCB '12). ACM, New York, NY, USA, 414-417.

Kumar P, Henikoff S, Ng PC. Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat Protoc. 2009;4(7):1073-81.

Ng PC, Henikoff S. Predicting the Effects of Amino Acid Substitutions on Protein Function. Annu Rev Genomics Hum Genet. 2006;7:61-80.

Ng PC, Henikoff S. SIFT: predicting amino acid changes that affect protein function. Nucleic Acids Res. 2003 Jul 1;31(13):3812-4.

Examples

  if (require(SIFT.Hsapiens.dbSNP137)) {
      ## metadata
      metadata(SIFT.Hsapiens.dbSNP137)

      ## keys are the DBSNPID (NCBI dbSNP ID)
      dbsnp <- keys(SIFT.Hsapiens.dbSNP137)
      head(dbsnp)
      columns(SIFT.Hsapiens.dbSNP137)

      ## Return all columns. Note that the key, DBSNPID,
      ## is always returned. 
      select(SIFT.Hsapiens.dbSNP137, dbsnp[10])
      ## subset on keys and cols 
      cols <- c("VARIANT", "PROVEANPRED", "SIFTPRED")
      select(SIFT.Hsapiens.dbSNP137, dbsnp[20:23], cols)
  }
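Once select() has returned a data frame, ordinary data-frame subsetting can pick out the variants that both tools flag. The sketch below does not query the database. It rebuilds by hand the four rows shown in the Results section and assumes, based only on that output, that the prediction labels are "Deleterious"/"Neutral" for PROVEANPRED and "Damaging"/"Tolerated" for SIFTPRED. Other labels or missing values may occur in the full database.

```r
## Hand-built copy of the four rows printed in the Results section
res <- data.frame(
    DBSNPID     = c("10004242", "10004516", "10005030", "10005739"),
    VARIANT     = c("4,159782570,G,A", "4,111398208,A,G",
                    "4,77317626,T,C", "4,16020122,T,G"),
    PROVEANPRED = c("Deleterious", "Neutral", "Neutral", "Deleterious"),
    SIFTPRED    = c("Damaging", "Tolerated", "Tolerated", "Damaging"),
    stringsAsFactors = FALSE
)

## Keep variants that both tools predict to affect function.
## %in% treats NA as "no match", so rows with missing predictions are dropped.
flagged <- res[res$PROVEANPRED %in% "Deleterious" &
               res$SIFTPRED %in% "Damaging", ]
print(flagged)

## Cross-tabulate the two predictions to see how often they agree
print(table(PROVEAN = res$PROVEANPRED, SIFT = res$SIFTPRED))
```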

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> library(VariantAnnotation)
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: GenomeInfoDb
Loading required package: stats4
Loading required package: S4Vectors

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomicRanges
Loading required package: SummarizedExperiment
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: Rsamtools
Loading required package: Biostrings
Loading required package: XVector

Attaching package: 'VariantAnnotation'

The following object is masked from 'package:base':

    tabulate

> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/VariantAnnotation/PROVEANDb-class.Rd_%03d_medium.png", width=480, height=480)
> ### Name: PROVEANDb-class
> ### Title: PROVEANDb objects
> ### Aliases: PROVEAN PROVEANDb class:PROVEANDb PROVEANDb-class
> ###   columns,PROVEANDb-method keys,PROVEANDb-method
> ###   keytypes,PROVEANDb-method select,PROVEANDb-method
> ### Keywords: classes methods
> 
> ### ** Examples
> 
>   if (require(SIFT.Hsapiens.dbSNP137)) {
+       ## metadata
+       metadata(SIFT.Hsapiens.dbSNP137)
+ 
+       ## keys are the DBSNPID (NCBI dbSNP ID)
+       dbsnp <- keys(SIFT.Hsapiens.dbSNP137)
+       head(dbsnp)
+       columns(SIFT.Hsapiens.dbSNP137)
+ 
+       ## Return all columns. Note that the key, DBSNPID,
+       ## is always returned. 
+       select(SIFT.Hsapiens.dbSNP137, dbsnp[10])
+       ## subset on keys and cols 
+       cols <- c("VARIANT", "PROVEANPRED", "SIFTPRED")
+       select(SIFT.Hsapiens.dbSNP137, dbsnp[20:23], cols)
+   }
Loading required package: SIFT.Hsapiens.dbSNP137
Loading required package: RSQLite
Loading required package: DBI
   DBSNPID         VARIANT PROVEANPRED  SIFTPRED
1 10004242 4,159782570,G,A Deleterious  Damaging
2 10004516 4,111398208,A,G     Neutral Tolerated
3 10005030  4,77317626,T,C     Neutral Tolerated
4 10005739  4,16020122,T,G Deleterious  Damaging
> dev.off()
null device 
          1 
>
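
The VARIANT column in the output above packs several values into one comma-separated string. The sketch below splits it into separate columns with base R. Reading the four fields as chromosome, position, reference allele and alternate allele is an assumption inferred from the printed values, not something this page states, and the column names CHROM, POS, REF and ALT are hypothetical. Indel records may not follow this four-field layout.

```r
## VARIANT strings copied from the output above
variant <- c("4,159782570,G,A", "4,111398208,A,G",
             "4,77317626,T,C", "4,16020122,T,G")

## Split on commas; each element should yield exactly four fields
parts <- strsplit(variant, ",", fixed = TRUE)
stopifnot(all(lengths(parts) == 4L))

## Assumed field order: chromosome, position, reference, alternate
fields <- do.call(rbind, parts)
parsed <- data.frame(
    CHROM = fields[, 1],
    POS   = as.integer(fields[, 2]),
    REF   = fields[, 3],
    ALT   = fields[, 4],
    stringsAsFactors = FALSE
)
print(parsed)
```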