Last data update: 2014.03.03

SIFTDb-class    R Documentation

SIFTDb objects

Description

The SIFTDb class is a container for storing a connection to a SIFT SQLite database.

Details

SIFT is a sequence homology-based tool that sorts intolerant from tolerant amino acid substitutions and predicts whether an amino acid substitution in a protein will have a phenotypic effect. SIFT is based on the premise that protein evolution is correlated with protein function. Positions important for function should be conserved in an alignment of the protein family, whereas unimportant positions should appear diverse in an alignment.

SIFT uses multiple alignment information to predict tolerated and deleterious substitutions for every position of the query sequence. The procedure can be outlined in the following steps:

  • search for similar sequences

  • choose closely related sequences that may share similar function to the query sequence

  • obtain the alignment of the chosen sequences

  • calculate normalized probabilities for all possible substitutions from the alignment.
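
As a rough illustration of the last step, the sketch below turns one alignment column into normalized probabilities by dividing each amino acid's estimated probability by that of the most frequent amino acid. It is a toy simplification and not SIFT's actual estimator: the function name normalized_probs, the pseudocount argument and the example column are all assumptions made for illustration.

```r
## Toy sketch only: SIFT's real probability estimation is more elaborate.
## normalized_probs() and its pseudocount argument are hypothetical.
amino_acids <- strsplit("ACDEFGHIKLMNPQRSTVWY", "")[[1]]

normalized_probs <- function(column, pseudocount = 1) {
    ## count each of the 20 amino acids in the column, adding a pseudocount
    counts <- as.vector(table(factor(column, levels = amino_acids))) + pseudocount
    ## convert counts to probabilities
    probs <- counts / sum(counts)
    ## normalize by the most probable amino acid, so the maximum is 1
    setNames(probs / max(probs), amino_acids)
}

## one made-up alignment column (residues observed at a single position)
column <- c("L", "L", "L", "L", "F", "L", "I", "L")
np <- normalized_probs(column)
np[c("L", "F", "W")]
```

The most frequent residue always receives a normalized value of 1, and rarer residues receive proportionally smaller values.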

Substitutions with normalized probabilities less than 0.05 are predicted to be deleterious; those with normalized probabilities greater than or equal to 0.05 are predicted to be tolerated.
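
The 0.05 cutoff can be expressed as a small helper. This is a minimal sketch, assuming scores may arrive as character strings and that a missing score should be labelled "NOT SCORED"; classify_sift is a hypothetical name and is not part of VariantAnnotation.

```r
## Hypothetical helper, not part of the package: apply the 0.05 cutoff.
classify_sift <- function(score, cutoff = 0.05) {
    score <- as.numeric(score)    # accept character or numeric input
    ifelse(is.na(score), "NOT SCORED",
           ifelse(score < cutoff, "DELETERIOUS", "TOLERATED"))
}

classify_sift(c("0.03", "0.05", "1.00", NA))
## "DELETERIOUS" "TOLERATED" "TOLERATED" "NOT SCORED"
```

A score exactly equal to the cutoff is classed as tolerated, matching the "greater than or equal to 0.05" rule.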

Methods

In the code below, x is a SIFTDb object.

metadata(x): Returns x's metadata in a data frame.

columns(x): Returns the names of the columns that can be used to subset the data columns.

keys(x): Returns the names of the keys that can be used to subset the data rows. The key values are rsIDs (dbSNP reference SNP identifiers).

select(x, keys = NULL, columns = NULL, ...): Returns a subset of data defined by the character vectors keys and columns. If no keys are supplied, all rows are returned. If no columns are supplied, all columns are returned. For column descriptions see ?SIFTDbColumns.

Author(s)

Valerie Obenchain

References

SIFT Home: http://sift.jcvi.org/

Kumar P, Henikoff S, Ng PC. Predicting the effects of coding non-synonymous variants on protein function using the SIFT algorithm. Nat Protoc. 2009;4(7):1073-81.

Ng PC, Henikoff S. Predicting the effects of amino acid substitutions on protein function. Annu Rev Genomics Hum Genet. 2006;7:61-80.

Ng PC, Henikoff S. SIFT: predicting amino acid changes that affect protein function. Nucleic Acids Res. 2003 Jul 1;31(13):3812-4.

Examples

if (interactive()) {
    library(SIFT.Hsapiens.dbSNP132)
    
    ## metadata
    metadata(SIFT.Hsapiens.dbSNP132)
    
    ## available rsid's 
    head(keys(SIFT.Hsapiens.dbSNP132))
    
    ## for column descriptions see ?SIFTDbColumns
    columns(SIFT.Hsapiens.dbSNP132)
    
    ## subset on keys and columns 
    rsids <- c("rs2142947", "rs17970171", "rs8692231", "rs3026284") 
    subst <- c("RSID", "PREDICTION", "SCORE")
    select(SIFT.Hsapiens.dbSNP132, keys=rsids, columns=subst)
    select(SIFT.Hsapiens.dbSNP132, keys=rsids[1:2])
}
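
A result returned by select() can be post-processed like any data frame. The sketch below keeps only the rows scored below 0.05. It runs without the annotation package by using a hand-built data frame named res, which stands in for a select() result and whose values are copied from a few rows of the example output on this page. The printed output suggests that SCORE may come back as character, so the sketch converts it first; that is an assumption.

```r
## Stand-in for a select() result; values copied from this page's output.
res <- data.frame(
    RSID       = c("rs2142947", "rs2142947", "rs17970171",
                   "rs3026284", "rs3026284", "rs3026284"),
    PREDICTION = c("TOLERATED", "TOLERATED", "NOT SCORED",
                   "DELETERIOUS", "TOLERATED", "DELETERIOUS"),
    SCORE      = c("1.00", "0.74", NA, "0.03", "1.00", "0.00"),
    stringsAsFactors = FALSE
)

## assumption: SCORE is character and must be converted before comparing
res$SCORE <- as.numeric(res$SCORE)

## keep scored rows below the 0.05 cutoff
deleterious <- res[!is.na(res$SCORE) & res$SCORE < 0.05, ]
unique(deleterious$RSID)
## "rs3026284"
```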

Results


> library(VariantAnnotation)
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: GenomeInfoDb
Loading required package: stats4
Loading required package: S4Vectors

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomicRanges
Loading required package: SummarizedExperiment
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: Rsamtools
Loading required package: Biostrings
Loading required package: XVector

Attaching package: 'VariantAnnotation'

The following object is masked from 'package:base':

    tabulate

> ### Name: SIFTDb-class
> ### Title: SIFTDb objects
> ### Aliases: SIFT SIFTDb class:SIFTDb SIFTDb-class metadata,SIFTDb-method
> ###   columns,SIFTDb-method keys,SIFTDb-method select,SIFTDb-method
> ### Keywords: classes methods
> 
> ### ** Examples
> 
> #if (interactive()) {
>     library(SIFT.Hsapiens.dbSNP132)
Loading required package: RSQLite
Loading required package: DBI
>     
>     ## metadata
>     metadata(SIFT.Hsapiens.dbSNP132)
                              name                                        value
1                          Db type                                       SIFTDb
2                      Data source                                         SIFT
3                           Genome                                         hg19
4                Genus and Species                                 Homo sapiens
5                     Resource URL                        http://sift.jcvi.org/
6                      dbSNP build                                          132
7                    Creation time 2012-03-13 16:15:13 -0700 (Tue, 13 Mar 2012)
8 RSQLite version at creation time                                       0.11.1
9                          package                            VariantAnnotation
>     
>     ## available rsid's 
>     head(keys(SIFT.Hsapiens.dbSNP132))
[1] "rs47"  "rs268" "rs298" "rs300" "rs332" "rs334"
>     
>     ## for column descriptions see ?SIFTDbColumns
>     columns(SIFT.Hsapiens.dbSNP132)
 [1] "AA"          "AACHANGE"    "MEDIAN"      "METHOD"      "POSTIONSEQS"
 [6] "PREDICTION"  "PROTEINID"   "RSID"        "SCORE"       "TOTALSEQS"  
>     
>     ## subset on keys and columns 
>     rsids <- c("rs2142947", "rs17970171", "rs8692231", "rs3026284") 
>     subst <- c("RSID", "PREDICTION", "SCORE")
>     select(SIFT.Hsapiens.dbSNP132, keys=rsids, columns=subst)
         RSID  PREDICTION SCORE
1   rs2142947   TOLERATED  1.00
2   rs2142947   TOLERATED  0.74
3   rs2142947   TOLERATED  0.72
4   rs2142947   TOLERATED     1
5  rs17970171  NOT SCORED  <NA>
6   rs8692231  NOT SCORED  <NA>
7   rs3026284 DELETERIOUS  0.03
8   rs3026284   TOLERATED  1.00
9   rs3026284 DELETERIOUS  0.00
10  rs3026284   TOLERATED     1
>     select(SIFT.Hsapiens.dbSNP132, keys=rsids[1:2])
        RSID    PROTEINID AACHANGE    METHOD AA PREDICTION SCORE MEDIAN
1  rs2142947 NP_001019832    F430L BEST HITS  L  TOLERATED     1    2.4
2  rs2142947 NP_001019832    F430L BEST HITS  F  TOLERATED  0.74    2.4
3  rs2142947 NP_001019832    F430L  ALL HITS  L  TOLERATED  0.72   3.34
4  rs2142947 NP_001019832    F430L  ALL HITS  F  TOLERATED     1   3.34
5 rs17970171 NP_001045928     D18H BEST HITS  H NOT SCORED  <NA>   <NA>
6 rs17970171 NP_001045928     D18H BEST HITS  D NOT SCORED  <NA>   <NA>
7 rs17970171 NP_001045928     D18H  ALL HITS  H NOT SCORED  <NA>   <NA>
8 rs17970171 NP_001045928     D18H  ALL HITS  D NOT SCORED  <NA>   <NA>
  POSITIONSEQS TOTALSEQS
1           22        27
2           22        27
3           96        99
4           96        99
5         <NA>      <NA>
6         <NA>      <NA>
7         <NA>      <NA>
8         <NA>      <NA>
> #}