R Graphical Manual

Browse All

Last data update: 2014.03.03

R: Assign functional prediction rfPred scores to human missense...

rfPred_scores

R Documentation

Assign functional prediction rfPred scores to human missense variants

Description

rfPred is a statistical method which combines 5 algorithms predictions in a random forest model: SIFT, Polyphen2, LRT, PhyloP and MutationTaster. These scores are available in the dbNFSP database for all the possible missense variants in hg19 version, and the package rfPred gives a composite score more reliable than each of the isolated algorithms.

Arguments

`variant_list`	A variants list in a `data.frame` containing 4 or 5 columns: chromosome number, hg19 genomic position on the chromosome, reference nucleotid, variant nucleotid and uniprot protein identifier (optional); or a character string of the path to a VCF (Variant Call Format) file; or a `GRanges` object with metadata containing textually `reference`, `alteration` and `proteine` (optional) columns names for reference and alteration
`data`	Path to the compressed TabixFile, either on the server (default) or on the user's computer
`index`	Path to the index of the TabixFile, either on the server (default) or on the user's computer
`all.col`	`TRUE` to return all available information, `FALSE` to return a more compact result (the most informative columns, see Value)
`file.export`	Optional, name of the CSV file in which export the results (default is `NULL`)
`n.cores`	number of cores to use when scaning the TabixFile, can be efficient for large request (default is 1)

Value

The variants list with the assigned rfPred scores, as well as the scores used to build rfPred meta-score: SIFT, phyloP, MutationTaster, LRT (transformed) and Polyphen2 (corresponding to Polyphen2_HVAR_score). The data frame returned contains these columns:

`chromosome`	chromosome number
`position_hg19`	physical position on the chromosome as to hg19 (1-based coordinate)
`reference`	reference nucleotide allele (as on the + strand)
`alteration`	alternative nucleotide allele (as on the + strand)
`proteine`	Uniprot accession number
`aaref`	reference amino acid
`aaalt`	alternative amino acid
`aapos`	amino acid position as to the protein
`rfPred_score`	rfPred score betwen 0 and 1 (higher it is, higher is the probability of pathogenicity)
`SIFT_score`	SIFT score between 0 and 1 (higher it is, higher is the probability of pathogenicity contrary to the original SIFT score) = 1-original SIFT score
`Polyphen2_score`	Polyphen2 (HVAR one) score between 0 and 1, used to calculate rfPred (higher it is, higher is the probability of pathogenicity)
`MutationTaster_score`	MutationTaster score between 0 and 1 (higher it is, higher is the probability of pathogenicity)
`PhyloP_score`	PhyloP score between 0 and 1 (higher it is, higher is the probability of pathogenicity): PhyloP_score=1-0.5x10^phyloP if phyloP>0 or PhyloP_score=0.5x10^-phyloP if phyloP<0
`LRT_score`	LRT score between 0 and 1 (higher it is, higher is the probability of pathogenicity): LRT_score=1-LRToriginalx0.5 if LRT_Omega<1 or LRT_score=LRToriginalx0.5 if LRT_Omega>=1

The following columns are also returned if all.col is TRUE:

`Uniprot_id`	Uniprot ID number
`genename`	gene name
`position_hg18`	physical position on the chromosome as to hg18 (1-based coordinate)
`Polyphen2_HDIV_score`	Polyphen2 score based on HumDiv, i.e. hdiv_prob. The score ranges from 0 to 1: the corresponding prediction is "probably damaging" if it is in [0.957,1]; "possibly damaging" if it is in [0.453,0.956]; "benign" if it is in [0,0.452]. Score cut-off for binary classification is 0.5, i.e. the prediction is "neutral" if the score is lower than 0.5 and "deleterious" if the score is higher than 0.5. Multiple entries separated by ";"
`Polyphen2_HDIV_pred`	Polyphen2 prediction based on HumDiv: `D` (probably damaging), `P` (possibly damaging) and `B` (benign). Multiple entries separated by ";"
`Polyphen2_HVAR_score`	Polyphen2 score based on HumVar, i.e. hvar_prob. The score ranges from 0 to 1, and the corresponding prediction is "probably damaging" if it is in [0.909,1]; "possibly damaging" if it is in [0.447,0.908]; "benign" if it is in [0,0.446]. Score cut-off for binary classification is 0.5, i.e. the prediction is "neutral" if the score is lower than 0.5 and "deleterious" if the score is higher than 0.5. Multiple entries separated by ";"
`Polyphen2_HVAR_pred`	Polyphen2 prediction based on HumVar: `D` (probably damaging), `P` (possibly damaging) and `B` (benign). Multiple entries separated by ";"
`MutationTaster_pred`	MutationTaster prediction: `A` (disease_causing_automatic), `D` (disease_causing), `N` (polymorphism) or `P` (polymorphism_automatic)
`phyloP`	original phyloP score
`LRT_Omega`	estimated nonsynonymous-to-synonymous-rate ratio
`LRT_pred`	LRT prediction, `D`(eleterious), `N`(eutral) or `U`(nknown)

Author(s)

Fabienne Jabot-Hanin, Hugo Varet and Jean-Philippe Jais

References

Jabot-Hanin F, Varet H, Tores F and Jais J-P. 2013. rfPred: a new meta-score for functional prediction of missense variants in human exome (submitted).

Examples

# from a data.frame without uniprot protein identifier
data(variant_list_Y)
res=rfPred_scores(variant_list = variant_list_Y[,1:4],
        data = system.file("extdata", "chrY_rfPred.txtz", package="rfPred",mustWork=TRUE),
        index = system.file("extdata", "chrY_rfPred.txtz.tbi", package="rfPred",mustWork=TRUE))
# from a data.frame with uniprot protein identifier
res2=rfPred_scores(variant_list = variant_list_Y,
        data = system.file("extdata", "chrY_rfPred.txtz", package="rfPred",mustWork=TRUE),
        index = system.file("extdata", "chrY_rfPred.txtz.tbi", package="rfPred",mustWork=TRUE))
# from a VCF file
res3=rfPred_scores(variant_list = system.file("extdata", "example.vcf", package="rfPred",mustWork=TRUE),
        data = system.file("extdata", "chrY_rfPred.txtz", package="rfPred",mustWork=TRUE),
        index = system.file("extdata", "chrY_rfPred.txtz.tbi", package="rfPred",mustWork=TRUE))
# from a GRanges object
data(example_GRanges)
res4=rfPred_scores(variant_list = example_GRanges,
        data = system.file("extdata", "chrY_rfPred.txtz", package="rfPred",mustWork=TRUE),
        index = system.file("extdata", "chrY_rfPred.txtz.tbi", package="rfPred",mustWork=TRUE))

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(rfPred)
Loading required package: Rsamtools
Loading required package: GenomeInfoDb
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomicRanges
Loading required package: Biostrings
Loading required package: XVector
Loading required package: data.table

Attaching package: 'data.table'

The following object is masked from 'package:GenomicRanges':

    shift

The following object is masked from 'package:IRanges':

    shift

> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/rfPred/rfPred_scores-methods.Rd_%03d_medium.png", width=480, height=480)
> ### Name: rfPred_scores
> ### Title: Assign functional prediction rfPred scores to human missense
> ###   variants
> ### Aliases: rfPred_scores rfPred_scores,character-method
> ###   rfPred_scores,data.frame-method rfPred_scores,GRanges-method
> 
> ### ** Examples
> 
> # from a data.frame without uniprot protein identifier
> data(variant_list_Y)
> res=rfPred_scores(variant_list = variant_list_Y[,1:4],
+         data = system.file("extdata", "chrY_rfPred.txtz", package="rfPred",mustWork=TRUE),
+         index = system.file("extdata", "chrY_rfPred.txtz.tbi", package="rfPred",mustWork=TRUE))
> # from a data.frame with uniprot protein identifier
> res2=rfPred_scores(variant_list = variant_list_Y,
+         data = system.file("extdata", "chrY_rfPred.txtz", package="rfPred",mustWork=TRUE),
+         index = system.file("extdata", "chrY_rfPred.txtz.tbi", package="rfPred",mustWork=TRUE))
> # from a VCF file
> res3=rfPred_scores(variant_list = system.file("extdata", "example.vcf", package="rfPred",mustWork=TRUE),
+         data = system.file("extdata", "chrY_rfPred.txtz", package="rfPred",mustWork=TRUE),
+         index = system.file("extdata", "chrY_rfPred.txtz.tbi", package="rfPred",mustWork=TRUE))
> # from a GRanges object
> data(example_GRanges)
> res4=rfPred_scores(variant_list = example_GRanges,
+         data = system.file("extdata", "chrY_rfPred.txtz", package="rfPred",mustWork=TRUE),
+         index = system.file("extdata", "chrY_rfPred.txtz.tbi", package="rfPred",mustWork=TRUE))
Warning message:
In as.data.frame(mcols(x), ...) : Arguments in '...' ignored
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>