The SIFTDb class is a container for storing a connection to a SIFT
SQLite database.
Details
SIFT is a sequence homology-based tool that sorts intolerant from tolerant
amino acid substitutions and predicts whether an amino acid substitution
in a protein will have a phenotypic effect. SIFT is based on the premise
that protein evolution is correlated with protein function. Positions
important for function should be conserved in an alignment of the protein
family, whereas unimportant positions should appear diverse in an alignment.
SIFT uses multiple alignment information to predict tolerated
and deleterious substitutions for every position of the query sequence.
The procedure can be outlined in the following steps:
1. search for similar sequences
2. choose closely related sequences that may share similar
   function to the query sequence
3. obtain the alignment of the chosen sequences
4. calculate normalized probabilities for all possible
   substitutions from the alignment.
Positions with normalized probabilities less than 0.05 are predicted
to be deleterious; those greater than or equal to 0.05 are predicted to be
tolerated.
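The cutoff rule above can be sketched in a few lines of base R. This is only an illustration: the helper name classify_sift and the example scores are hypothetical, and labeling missing scores as "NOT SCORED" simply mirrors the PREDICTION values that appear in this page's example output.

```r
# Hypothetical helper: label normalized probabilities using the 0.05 cutoff
# described above. Missing scores are labeled "NOT SCORED".
classify_sift <- function(score, cutoff = 0.05) {
  stopifnot(is.numeric(score))
  ifelse(is.na(score), "NOT SCORED",
         ifelse(score < cutoff, "DELETERIOUS", "TOLERATED"))
}

# Made-up scores, including one exactly at the cutoff and one missing value.
scores <- c(1.00, 0.74, 0.03, 0.05, 0.00, NA)
data.frame(score = scores, prediction = classify_sift(scores))
```

Note that a score exactly equal to 0.05 falls on the tolerated side, since the rule is "less than 0.05" for deleterious.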
Methods
In the code below, x is a SIFTDb object.
metadata(x):
Returns x's metadata in a data frame.
columns(x):
Returns the names of the columns that can be used to subset the
data columns.
keys(x):
Returns the names of the keys that can be used to subset the
data rows. The key values are rsIDs.
select(x, keys = NULL, columns = NULL, ...):
Returns a subset of data defined by the character vectors keys
and columns. If no keys are supplied, all rows are
returned. If no columns are supplied, all columns
are returned. For column descriptions see ?SIFTDbColumns.
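The default behaviour of keys and columns described above can be imitated on a plain data frame. The sketch below does not use a SIFTDb object: toy is a stand-in built from a few rows of this page's example output, and select_sketch is a hypothetical helper written only to show what NULL arguments mean, not the package's implementation.

```r
# Toy stand-in for a SIFTDb: a plain data frame holding a few rows copied
# from the example output on this page.
toy <- data.frame(
  RSID       = c("rs2142947", "rs2142947", "rs3026284", "rs3026284"),
  PREDICTION = c("TOLERATED", "TOLERATED", "DELETERIOUS", "TOLERATED"),
  SCORE      = c("1.00", "0.74", "0.03", "1.00"),
  stringsAsFactors = FALSE
)

# Hypothetical helper that imitates the documented defaults of select():
# NULL keys keep every row, NULL columns keep every column.
select_sketch <- function(df, keys = NULL, columns = NULL) {
  rows <- if (is.null(keys)) rep(TRUE, nrow(df)) else df$RSID %in% keys
  cols <- if (is.null(columns)) names(df) else columns
  df[rows, cols, drop = FALSE]
}

select_sketch(toy)                                     # all rows, all columns
select_sketch(toy, keys = "rs3026284")                 # two rows, all columns
select_sketch(toy, columns = c("RSID", "PREDICTION"))  # all rows, two columns
```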
References
Kumar P, Henikoff S, Ng PC. Predicting the effects of coding non-synonymous
variants on protein function using the SIFT algorithm. Nat Protoc.
2009;4(7):1073-81.
Ng PC, Henikoff S. Predicting the effects of amino acid substitutions on
protein function. Annu Rev Genomics Hum Genet. 2006;7:61-80.
Ng PC, Henikoff S. SIFT: predicting amino acid changes that affect protein
function. Nucleic Acids Res. 2003 Jul 1;31(13):3812-4.
Examples
if (interactive()) {
  library(SIFT.Hsapiens.dbSNP132)

  ## metadata
  metadata(SIFT.Hsapiens.dbSNP132)

  ## available rsid's
  head(keys(SIFT.Hsapiens.dbSNP132))

  ## for column descriptions see ?SIFTDbColumns
  columns(SIFT.Hsapiens.dbSNP132)

  ## subset on keys and columns
  rsids <- c("rs2142947", "rs17970171", "rs8692231", "rs3026284")
  subst <- c("RSID", "PREDICTION", "SCORE")
  select(SIFT.Hsapiens.dbSNP132, keys=rsids, columns=subst)
  select(SIFT.Hsapiens.dbSNP132, keys=rsids[1:2])
}
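A possible follow-up step is to post-process the subset returned by select(). The sketch below assumes the result is a regular data frame whose SCORE column holds character values, which is what the printed output under Results suggests (both "1.00" and "1" appear). To stay runnable without the annotation package, it rebuilds those rows by hand; the variable names res, deleterious_ids and min_score are illustrative only.

```r
# Rows copied from the select() output shown under Results; SCORE is kept as
# character because that is how it appears to be returned.
res <- data.frame(
  RSID = c(rep("rs2142947", 4), "rs17970171", "rs8692231", rep("rs3026284", 4)),
  PREDICTION = c(rep("TOLERATED", 4), "NOT SCORED", "NOT SCORED",
                 "DELETERIOUS", "TOLERATED", "DELETERIOUS", "TOLERATED"),
  SCORE = c("1.00", "0.74", "0.72", "1", NA, NA, "0.03", "1.00", "0.00", "1"),
  stringsAsFactors = FALSE
)

# Convert scores to numeric so they can be compared; NA stays NA.
res$SCORE <- as.numeric(res$SCORE)

# rsIDs with at least one substitution predicted to be deleterious.
deleterious_ids <- unique(res$RSID[res$PREDICTION == "DELETERIOUS"])

# Lowest score observed per rsID, ignoring unscored rows.
scored <- res[!is.na(res$SCORE), ]
min_score <- tapply(scored$SCORE, scored$RSID, min)

deleterious_ids
min_score
```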
Results
> library(VariantAnnotation)
> ### Name: SIFTDb-class
> ### Title: SIFTDb objects
> ### Aliases: SIFT SIFTDb class:SIFTDb SIFTDb-class metadata,SIFTDb-method
> ### columns,SIFTDb-method keys,SIFTDb-method select,SIFTDb-method
> ### Keywords: classes methods
>
> ### ** Examples
>
> #if (interactive()) {
> library(SIFT.Hsapiens.dbSNP132)
Loading required package: RSQLite
Loading required package: DBI
>
> ## metadata
> metadata(SIFT.Hsapiens.dbSNP132)
                              name                                        value
1                          Db type                                       SIFTDb
2                      Data source                                         SIFT
3                           Genome                                         hg19
4                Genus and Species                                 Homo sapiens
5                     Resource URL                        http://sift.jcvi.org/
6                      dbSNP build                                          132
7                    Creation time 2012-03-13 16:15:13 -0700 (Tue, 13 Mar 2012)
8 RSQLite version at creation time                                       0.11.1
9                          package                            VariantAnnotation
>
> ## available rsid's
> head(keys(SIFT.Hsapiens.dbSNP132))
[1] "rs47" "rs268" "rs298" "rs300" "rs332" "rs334"
>
> ## for column descriptions see ?SIFTDbColumns
> columns(SIFT.Hsapiens.dbSNP132)
[1] "AA" "AACHANGE" "MEDIAN" "METHOD" "POSTIONSEQS"
[6] "PREDICTION" "PROTEINID" "RSID" "SCORE" "TOTALSEQS"
>
> ## subset on keys and columns
> rsids <- c("rs2142947", "rs17970171", "rs8692231", "rs3026284")
> subst <- c("RSID", "PREDICTION", "SCORE")
> select(SIFT.Hsapiens.dbSNP132, keys=rsids, columns=subst)
         RSID  PREDICTION SCORE
1   rs2142947   TOLERATED  1.00
2   rs2142947   TOLERATED  0.74
3   rs2142947   TOLERATED  0.72
4   rs2142947   TOLERATED     1
5  rs17970171  NOT SCORED  <NA>
6   rs8692231  NOT SCORED  <NA>
7   rs3026284 DELETERIOUS  0.03
8   rs3026284   TOLERATED  1.00
9   rs3026284 DELETERIOUS  0.00
10  rs3026284   TOLERATED     1
> select(SIFT.Hsapiens.dbSNP132, keys=rsids[1:2])
        RSID    PROTEINID AACHANGE    METHOD AA PREDICTION SCORE MEDIAN
1  rs2142947 NP_001019832    F430L BEST HITS  L  TOLERATED     1    2.4
2  rs2142947 NP_001019832    F430L BEST HITS  F  TOLERATED  0.74    2.4
3  rs2142947 NP_001019832    F430L  ALL HITS  L  TOLERATED  0.72   3.34
4  rs2142947 NP_001019832    F430L  ALL HITS  F  TOLERATED     1   3.34
5 rs17970171 NP_001045928     D18H BEST HITS  H NOT SCORED  <NA>   <NA>
6 rs17970171 NP_001045928     D18H BEST HITS  D NOT SCORED  <NA>   <NA>
7 rs17970171 NP_001045928     D18H  ALL HITS  H NOT SCORED  <NA>   <NA>
8 rs17970171 NP_001045928     D18H  ALL HITS  D NOT SCORED  <NA>   <NA>
  POSITIONSEQS TOTALSEQS
1           22        27
2           22        27
3           96        99
4           96        99
5         <NA>      <NA>
6         <NA>      <NA>
7         <NA>      <NA>
8         <NA>      <NA>
> #}