The PROVEANDb class is a container for storing a connection to a PROVEAN
SQLite database.
Details
The SIFT tool is no longer actively maintained. Some of the
original authors have since started the PROVEAN (Protein Variation
Effect Analyzer) project. PROVEAN is a software tool that predicts
whether an amino acid substitution or indel has an impact on the
biological function of a protein. PROVEAN is useful for filtering
sequence variants to identify nonsynonymous or indel variants that
are predicted to be functionally important.
See the PROVEAN and SIFT web pages for a complete description of the methods.
Although SIFT is not under active development, the PROVEAN team still
provides the SIFT scores in the pre-computed downloads. This package,
SIFT.Hsapiens.dbSNP137, contains both SIFT and PROVEAN scores.
One notable difference between this and the previous SIFT database
package is that keys in SIFT.Hsapiens.dbSNP132 are
rs IDs whereas in SIFT.Hsapiens.dbSNP137 they are NCBI dbSNP IDs.
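When lookups are moved from SIFT.Hsapiens.dbSNP132 to SIFT.Hsapiens.dbSNP137, rs-style identifiers may need converting first. The sketch below assumes that a dbSNP137 key is simply the rs ID with its "rs" prefix removed, which this page does not state explicitly. rs_to_dbsnp() is a hypothetical helper, not part of either package, and the input IDs are illustrative only.

```r
# Hypothetical helper: strip a leading "rs" (any case) so that rs-style
# IDs take the bare numeric form assumed here for dbSNP137 keys.
# IDs that already lack the prefix are returned unchanged.
rs_to_dbsnp <- function(ids) {
  sub("^rs", "", as.character(ids), ignore.case = TRUE)
}

# Illustrative input: one prefixed ID and one bare ID.
converted <- rs_to_dbsnp(c("rs10004242", "10004516"))
print(converted)
```

It is worth confirming the converted values against keys(x) before relying on them, since the mapping is an assumption.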
Methods
In the code below, x is a PROVEANDb object.
metadata(x):
Returns x's metadata in a data frame.
columns(x):
Returns the names of the columns that can be used to subset the
data columns.
keys(x, keytype="DBSNPID", ...):
Returns the names of the keys that can be used to subset the
data rows. For SIFT.Hsapiens.dbSNP137 the keys are NCBI dbSNP ids.
keytypes(x):
Returns the names of the columns that can be used as keys.
For SIFT.Hsapiens.dbSNP137 the NCBI dbSNP ids are the only keytype.
select(x, keys = NULL, columns = NULL, keytype = "DBSNPID", ...):
Returns a subset of data defined by the character vectors keys
and columns. If no keys are supplied, all rows are
returned. If no columns are supplied, all columns
are returned.
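A minimal sketch of how the methods above might be combined, assuming the SIFT.Hsapiens.dbSNP137 package is installed. This page does not say how select() treats keys that are absent from the database, so the sketch checks them up front. split_known() is a hypothetical helper, not part of VariantAnnotation, and "not_an_id" is a made-up key used only to exercise the check.

```r
# Hypothetical helper (not part of VariantAnnotation): partition the
# requested values by whether they occur in the available set.
split_known <- function(requested, available) {
  found <- requested %in% available
  list(known = requested[found], unknown = requested[!found])
}

# The database part only runs when both packages are installed.
if (requireNamespace("VariantAnnotation", quietly = TRUE) &&
    requireNamespace("SIFT.Hsapiens.dbSNP137", quietly = TRUE)) {
  library(VariantAnnotation)
  library(SIFT.Hsapiens.dbSNP137)
  x <- SIFT.Hsapiens.dbSNP137

  # "10004242" is taken from the example output on this page.
  ids  <- split_known(c("10004242", "not_an_id"),
                      keys(x, keytype = "DBSNPID"))
  cols <- split_known(c("VARIANT", "PROVEANPRED"), columns(x))

  if (length(ids$unknown) > 0)
    message("Skipping unknown keys: ", paste(ids$unknown, collapse = ", "))

  # Query only the keys and columns that the database reports.
  select(x, keys = ids$known, columns = cols$known, keytype = "DBSNPID")
}
```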
References
Choi Y, Sims GE, Murphy S, Miller JR, Chan AP (2012) Predicting the
Functional Effect of Amino Acid Substitutions and Indels.
PLoS ONE 7(10): e46688.
Choi Y (2012) A Fast Computation of Pairwise Sequence Alignment Scores
Between a Protein and a Set of Single-Locus Variants of Another Protein.
In Proceedings of the ACM Conference on Bioinformatics,
Computational Biology and Biomedicine (BCB '12). ACM, New York, NY, USA,
414-417.
Kumar P, Henikoff S, Ng PC. Predicting the effects of coding non-synonymous
variants on protein function using the SIFT algorithm. Nat Protoc.
2009;4(7):1073-81.
Ng PC, Henikoff S. Predicting the Effects of Amino Acid Substitutions on
Protein Function. Annu Rev Genomics Hum Genet. 2006;7:61-80.
Ng PC, Henikoff S. SIFT: predicting amino acid changes that affect protein
function. Nucleic Acids Res. 2003 Jul 1;31(13):3812-4.
Examples
if (require(SIFT.Hsapiens.dbSNP137)) {
    ## Only the value of the last expression in this block is
    ## auto-printed; wrap earlier calls in print() to display them.

    ## metadata
    metadata(SIFT.Hsapiens.dbSNP137)

    ## keys are the DBSNPID (NCBI dbSNP ID)
    dbsnp <- keys(SIFT.Hsapiens.dbSNP137)
    head(dbsnp)
    columns(SIFT.Hsapiens.dbSNP137)

    ## Return all columns. Note that the key, DBSNPID,
    ## is always returned.
    select(SIFT.Hsapiens.dbSNP137, dbsnp[10])

    ## subset on keys and cols
    cols <- c("VARIANT", "PROVEANPRED", "SIFTPRED")
    select(SIFT.Hsapiens.dbSNP137, dbsnp[20:23], cols)
}
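The result of select() is an ordinary data frame, so it can be post-processed with base R. The sketch below rebuilds by hand the four rows printed in the Results output of this page, so that it runs without the database. It then keeps the variants that both predictors flag and splits VARIANT into separate fields. Two points are assumptions rather than documented facts: that DBSNPID is stored as character, and that the four comma-separated VARIANT fields are chromosome, position, reference allele and alternate allele.

```r
# Rows copied from the Results output on this page.
# Assumption: DBSNPID is held as character.
res <- data.frame(
  DBSNPID     = c("10004242", "10004516", "10005030", "10005739"),
  VARIANT     = c("4,159782570,G,A", "4,111398208,A,G",
                  "4,77317626,T,C", "4,16020122,T,G"),
  PROVEANPRED = c("Deleterious", "Neutral", "Neutral", "Deleterious"),
  SIFTPRED    = c("Damaging", "Tolerated", "Tolerated", "Damaging"),
  stringsAsFactors = FALSE
)

# Keep variants that both predictors call functionally important.
flagged <- res[res$PROVEANPRED == "Deleterious" &
               res$SIFTPRED == "Damaging", ]

# Split VARIANT on commas; the field names below are assumed meanings.
parts <- do.call(rbind, strsplit(res$VARIANT, ",", fixed = TRUE))
colnames(parts) <- c("chrom", "pos", "ref", "alt")
variants <- data.frame(DBSNPID = res$DBSNPID, parts,
                       stringsAsFactors = FALSE)
variants$pos <- as.integer(variants$pos)

print(flagged)
print(variants)
```

Requiring agreement between the two predictors is only one possible filter; a looser filter would use | instead of &.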
Results
> library(VariantAnnotation)
> ### Name: PROVEANDb-class
> ### Title: PROVEANDb objects
> ### Aliases: PROVEAN PROVEANDb class:PROVEANDb PROVEANDb-class
> ### columns,PROVEANDb-method keys,PROVEANDb-method
> ### keytypes,PROVEANDb-method select,PROVEANDb-method
> ### Keywords: classes methods
>
> ### ** Examples
>
> if (require(SIFT.Hsapiens.dbSNP137)) {
+ ## metadata
+ metadata(SIFT.Hsapiens.dbSNP137)
+
+ ## keys are the DBSNPID (NCBI dbSNP ID)
+ dbsnp <- keys(SIFT.Hsapiens.dbSNP137)
+ head(dbsnp)
+ columns(SIFT.Hsapiens.dbSNP137)
+
+ ## Return all columns. Note that the key, DBSNPID,
+ ## is always returned.
+ select(SIFT.Hsapiens.dbSNP137, dbsnp[10])
+ ## subset on keys and cols
+ cols <- c("VARIANT", "PROVEANPRED", "SIFTPRED")
+ select(SIFT.Hsapiens.dbSNP137, dbsnp[20:23], cols)
+ }
Loading required package: SIFT.Hsapiens.dbSNP137
Loading required package: RSQLite
Loading required package: DBI
   DBSNPID         VARIANT PROVEANPRED  SIFTPRED
1 10004242 4,159782570,G,A Deleterious  Damaging
2 10004516 4,111398208,A,G     Neutral Tolerated
3 10005030  4,77317626,T,C     Neutral Tolerated
4 10005739  4,16020122,T,G Deleterious  Damaging