rfPred_scores
R Documentation
Assign functional prediction rfPred scores to human missense variants
Description
rfPred is a statistical method that combines the
predictions of five algorithms in a random forest model:
SIFT, Polyphen2, LRT, PhyloP and MutationTaster. The scores
of these algorithms are available in the dbNSFP database
for all possible missense variants in the hg19 assembly,
and the rfPred package returns a composite score that is
more reliable than any of the individual algorithms alone.
Arguments
variant_list
A list of variants, given in one of three forms: a
data.frame containing 4 or 5 columns (chromosome
number, hg19 genomic position on the chromosome,
reference nucleotide, variant nucleotide and, optionally,
UniProt protein identifier); a character string giving
the path to a VCF (Variant Call Format) file; or a
GRanges object whose metadata columns are literally
named reference and alteration, with an optional
proteine column
data
Path to the compressed TabixFile, either on
the server (default) or on the user's computer
index
Path to the index of the TabixFile, either
on the server (default) or on the user's computer
all.col
TRUE to return all available
information, FALSE to return a more compact result
(the most informative columns, see Value)
file.export
Optional name of the CSV file to which the
results are exported (default is NULL)
n.cores
Number of cores to use when scanning the
TabixFile; using several cores can speed up large
requests (default is 1)
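As a rough illustration of the data.frame form of variant_list described above, the sketch below builds a small table in base R. The column names, the positions and the UniProt accession are placeholders chosen for this example (the documentation only specifies the column order), and check_variant_list is a hypothetical helper, not part of rfPred.

```r
# Placeholder variants: the values below are illustrative only, not real data.
# Column order follows the Arguments description; the names are an assumption.
variants <- data.frame(
  chr     = c("Y", "Y"),
  pos     = c(1000000L, 2000000L),      # hg19 positions (made up)
  ref     = c("A", "G"),
  alt     = c("G", "T"),
  uniprot = c("P00000", "P00000"),      # optional fifth column (placeholder ID)
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of rfPred): basic shape checks before querying.
check_variant_list <- function(x) {
  stopifnot(is.data.frame(x), ncol(x) %in% c(4L, 5L))
  bases <- c("A", "C", "G", "T")
  stopifnot(all(x[[3]] %in% bases), all(x[[4]] %in% bases))
  stopifnot(all(x[[3]] != x[[4]]))      # reference and variant must differ
  invisible(TRUE)
}

check_variant_list(variants)            # 5-column form
check_variant_list(variants[, 1:4])     # 4-column form, without the identifier
```

A table that passes these checks has the shape expected by the data.frame method; whether each variant is actually present in the TabixFile is a separate question that only the query itself can answer.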
Value
The list of variants with their assigned rfPred scores, as
well as the scores used to build the rfPred meta-score:
SIFT, phyloP, MutationTaster, LRT (transformed) and
Polyphen2 (corresponding to Polyphen2_HVAR_score). The
returned data frame contains these columns:
chromosome
chromosome number
position_hg19
physical position on the chromosome
according to hg19 (1-based coordinate)
reference
reference nucleotide allele (as on the +
strand)
alteration
alternative nucleotide allele
(as on the + strand)
proteine
Uniprot accession
number
aaref
reference amino acid
aaalt
alternative amino acid
aapos
amino acid position in the protein
rfPred_score
rfPred score between 0 and 1 (the higher the score,
the higher the probability of pathogenicity)
SIFT_score
SIFT score between 0 and 1, rescaled as
1 - original SIFT score so that, contrary to the
original SIFT score, a higher score means a higher
probability of pathogenicity
Polyphen2_score
Polyphen2 score (HumVar version) between 0 and 1,
used to calculate rfPred (the higher the score, the
higher the probability of pathogenicity)
MutationTaster_score
MutationTaster score between
0 and 1 (the higher the score, the higher the
probability of pathogenicity)
PhyloP_score
PhyloP score between
0 and 1 (the higher the score, the higher the
probability of pathogenicity):
PhyloP_score = 1 - 0.5 x 10^phyloP if phyloP > 0, or
PhyloP_score = 0.5 x 10^-phyloP if phyloP < 0
LRT_score
LRT score between 0 and 1 (the higher the score,
the higher the probability of pathogenicity):
LRT_score = 1 - LRToriginal x 0.5 if LRT_Omega < 1, or
LRT_score = LRToriginal x 0.5 if LRT_Omega >= 1
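The SIFT and LRT rescalings described above can be written in a few lines of base R. This is only a sketch of the stated formulas, not the package's internal code; the function names and the input values are made up for illustration.

```r
# Sketch of the stated rescalings; function names are hypothetical.

# SIFT_score = 1 - original SIFT score
rescale_sift <- function(sift_original) {
  1 - sift_original
}

# LRT_score = 1 - LRToriginal * 0.5 if LRT_Omega < 1,
#             LRToriginal * 0.5     if LRT_Omega >= 1
rescale_lrt <- function(lrt_original, lrt_omega) {
  ifelse(lrt_omega < 1,
         1 - lrt_original * 0.5,
         lrt_original * 0.5)
}

# Illustrative inputs only
rescale_sift(c(0, 0.25, 1))
rescale_lrt(lrt_original = c(0.5, 0.5), lrt_omega = c(0.5, 2))
```

Under these formulas both rescaled scores stay between 0 and 1 whenever the original values do, and an LRT_Omega below 1 always maps to the upper half of the range.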
The following
columns are also returned if all.col is
TRUE:
Uniprot_id
Uniprot ID number
genename
gene name
position_hg18
physical
position on the chromosome according to hg18 (1-based
coordinate)
Polyphen2_HDIV_score
Polyphen2 score
based on HumDiv, i.e. hdiv_prob. The score ranges from 0
to 1: the corresponding prediction is "probably damaging"
if it is in [0.957,1]; "possibly damaging" if it is in
[0.453,0.956]; "benign" if it is in [0,0.452]. Score
cut-off for binary classification is 0.5, i.e. the
prediction is "neutral" if the score is lower than 0.5
and "deleterious" if the score is higher than 0.5.
Multiple entries separated by ";"
Polyphen2_HDIV_pred
Polyphen2 prediction based on
HumDiv: D (probably damaging), P (possibly
damaging) and B (benign). Multiple entries
separated by ";"
Polyphen2_HVAR_score
Polyphen2
score based on HumVar, i.e. hvar_prob. The score ranges
from 0 to 1, and the corresponding prediction is
"probably damaging" if it is in [0.909,1]; "possibly
damaging" if it is in [0.447,0.908]; "benign" if it is in
[0,0.446]. Score cut-off for binary classification is
0.5, i.e. the prediction is "neutral" if the score is
lower than 0.5 and "deleterious" if the score is higher
than 0.5. Multiple entries separated by ";"
Polyphen2_HVAR_pred
Polyphen2 prediction based on
HumVar: D (probably damaging), P (possibly
damaging) and B (benign). Multiple entries
separated by ";"
MutationTaster_pred
MutationTaster prediction:
A (disease_causing_automatic), D
(disease_causing), N (polymorphism) or P
(polymorphism_automatic)
phyloP
original phyloP
score
LRT_Omega
estimated
nonsynonymous-to-synonymous-rate ratio
LRT_pred
LRT prediction, D(eleterious),
N(eutral) or U(nknown)
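To make the Polyphen2 intervals quoted above concrete, the following base R sketch splits a ";"-separated score field and maps each score to the D / P / B codes. The helper names and the example entry are assumptions made for this illustration; rfPred itself already returns the predictions in Polyphen2_HDIV_pred and Polyphen2_HVAR_pred.

```r
# Sketch only: helper names are hypothetical, thresholds are those quoted above.

# Split a ";"-separated field into numeric scores
split_scores <- function(field) {
  as.numeric(strsplit(field, ";", fixed = TRUE)[[1]])
}

# Map scores to D / P / B using the lower bounds of the two upper intervals
classify_polyphen2 <- function(score, possibly_from, probably_from) {
  ifelse(score >= probably_from, "D",
         ifelse(score >= possibly_from, "P", "B"))
}

classify_hdiv <- function(score) classify_polyphen2(score, 0.453, 0.957)
classify_hvar <- function(score) classify_polyphen2(score, 0.447, 0.909)

scores <- split_scores("0.999;0.5;0.1")   # made-up entry
classify_hdiv(scores)
classify_hvar(scores)
```

The documented intervals are closed and stated to three decimals, so the sketch simply treats each lower bound as a cut-off; a score such as 0.92 is then "D" under the HumVar thresholds but "P" under the HumDiv ones.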
Author(s)
Fabienne Jabot-Hanin, Hugo Varet and Jean-Philippe Jais
References
Jabot-Hanin F, Varet H, Tores F and Jais J-P. 2013.
rfPred: a new meta-score for functional prediction of
missense variants in human exome (submitted).
Examples
# from a data.frame without uniprot protein identifier
data(variant_list_Y)
res=rfPred_scores(variant_list = variant_list_Y[,1:4],
data = system.file("extdata", "chrY_rfPred.txtz", package="rfPred",mustWork=TRUE),
index = system.file("extdata", "chrY_rfPred.txtz.tbi", package="rfPred",mustWork=TRUE))
# from a data.frame with uniprot protein identifier
res2=rfPred_scores(variant_list = variant_list_Y,
data = system.file("extdata", "chrY_rfPred.txtz", package="rfPred",mustWork=TRUE),
index = system.file("extdata", "chrY_rfPred.txtz.tbi", package="rfPred",mustWork=TRUE))
# from a VCF file
res3=rfPred_scores(variant_list = system.file("extdata", "example.vcf", package="rfPred",mustWork=TRUE),
data = system.file("extdata", "chrY_rfPred.txtz", package="rfPred",mustWork=TRUE),
index = system.file("extdata", "chrY_rfPred.txtz.tbi", package="rfPred",mustWork=TRUE))
# from a GRanges object
data(example_GRanges)
res4=rfPred_scores(variant_list = example_GRanges,
data = system.file("extdata", "chrY_rfPred.txtz", package="rfPred",mustWork=TRUE),
index = system.file("extdata", "chrY_rfPred.txtz.tbi", package="rfPred",mustWork=TRUE))
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
> library(rfPred)
Loading required package: Rsamtools
Loading required package: GenomeInfoDb
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
rbind, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit
Loading required package: S4Vectors
Attaching package: 'S4Vectors'
The following objects are masked from 'package:base':
colMeans, colSums, expand.grid, rowMeans, rowSums
Loading required package: IRanges
Loading required package: GenomicRanges
Loading required package: Biostrings
Loading required package: XVector
Loading required package: data.table
Attaching package: 'data.table'
The following object is masked from 'package:GenomicRanges':
shift
The following object is masked from 'package:IRanges':
shift
> ### Name: rfPred_scores
> ### Title: Assign functional prediction rfPred scores to human missense
> ### variants
> ### Aliases: rfPred_scores rfPred_scores,character-method
> ### rfPred_scores,data.frame-method rfPred_scores,GRanges-method
>
> ### ** Examples
>
> # from a data.frame without uniprot protein identifier
> data(variant_list_Y)
> res=rfPred_scores(variant_list = variant_list_Y[,1:4],
+ data = system.file("extdata", "chrY_rfPred.txtz", package="rfPred",mustWork=TRUE),
+ index = system.file("extdata", "chrY_rfPred.txtz.tbi", package="rfPred",mustWork=TRUE))
> # from a data.frame with uniprot protein identifier
> res2=rfPred_scores(variant_list = variant_list_Y,
+ data = system.file("extdata", "chrY_rfPred.txtz", package="rfPred",mustWork=TRUE),
+ index = system.file("extdata", "chrY_rfPred.txtz.tbi", package="rfPred",mustWork=TRUE))
> # from a VCF file
> res3=rfPred_scores(variant_list = system.file("extdata", "example.vcf", package="rfPred",mustWork=TRUE),
+ data = system.file("extdata", "chrY_rfPred.txtz", package="rfPred",mustWork=TRUE),
+ index = system.file("extdata", "chrY_rfPred.txtz.tbi", package="rfPred",mustWork=TRUE))
> # from a GRanges object
> data(example_GRanges)
> res4=rfPred_scores(variant_list = example_GRanges,
+ data = system.file("extdata", "chrY_rfPred.txtz", package="rfPred",mustWork=TRUE),
+ index = system.file("extdata", "chrY_rfPred.txtz.tbi", package="rfPred",mustWork=TRUE))
Warning message:
In as.data.frame(mcols(x), ...) : Arguments in '...' ignored