deepSNV
R Documentation
Test two matched deep sequencing experiments for low-frequency SNVs.
Description
This generic function handles different types of input for the test and control experiments. It either reads from two .bam files,
uses two matrices of nucleotide counts, or re-evaluates the test results from a deepSNV-class object. The actual test is a
likelihood ratio test of a (beta-)binomial model for the individual nucleotide counts at each position, under the null hypothesis that both experiments share the same parameter
and the alternative that the parameters differ. Because the difference in degrees of freedom is 1, the test statistic D = -2 log(max{L_0} / max{L_1})
is asymptotically distributed as χ²_1 (chi-squared with one degree of freedom). The statistic may be tuned by a nucleotide-specific Dirichlet prior that is learned across all genomic sites;
see estimateDirichlet. If the model is beta-binomial, a global dispersion parameter is used for all sites; it can be learned with estimateDispersion.
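To make the likelihood ratio test concrete, here is a minimal base-R sketch for the plain binomial model at a single position and strand. It is not the package's implementation: it ignores the Dirichlet prior, the beta-binomial option, and the combination of strands, and all variable names are hypothetical. The counts are the forward-strand values reported for position 3125 in the summary(ex) output of the Examples.

```r
# Variant counts and coverage (forward strand, position 3125 in summary(ex))
x_test <- 58; n_test <- 1461   # test experiment
x_ctrl <- 1;  n_ctrl <- 3066   # control experiment

# Null hypothesis: both experiments share one variant frequency
p0 <- (x_test + x_ctrl) / (n_test + n_ctrl)
logL0 <- dbinom(x_test, n_test, p0, log = TRUE) +
  dbinom(x_ctrl, n_ctrl, p0, log = TRUE)

# Alternative: each experiment has its own frequency (maximum likelihood)
p_test <- x_test / n_test
p_ctrl <- x_ctrl / n_ctrl
logL1 <- dbinom(x_test, n_test, p_test, log = TRUE) +
  dbinom(x_ctrl, n_ctrl, p_ctrl, log = TRUE)

# Likelihood ratio statistic, compared against chi-squared with 1 df
D <- -2 * (logL0 - logL1)
p_two_sided <- pchisq(D, df = 1, lower.tail = FALSE)

# One-sided "greater" (assumed convention): halve the two-sided p-value
# when the test frequency exceeds the control frequency
p_greater <- if (p_test > p_ctrl) p_two_sided / 2 else 1 - p_two_sided / 2
```

Halving the two-sided p-value for a one-sided alternative is a common convention and an assumption here, not a documented detail of deepSNV, so the result should not be expected to match the package output exactly.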
Arguments
test
The test experiment. Either a .bam file, or a matrix with nucleotide counts, or a deepSNV-class object.
control
The control experiment. Must be of the same type as test, or missing if test is a deepSNV-class object.
alternative
The alternative to be tested. One of "greater", "less", or "two.sided".
model
Which model to use. Either "bin" (binomial) or "betabin" (beta-binomial). Default "bin".
dirichlet.prior
A base-specific Dirichlet prior specified as a matrix. Default NULL.
pseudo.count
If dirichlet.prior=NULL, a pseudocount can be used to define a flat prior.
over.dispersion
A numeric factor for the overdispersion if the model is beta-binomial. Default 100.
combine.method
The method to combine p-values. One of "fisher" (default), "max", or "average". See p.combine for details.
regions
The regions to be parsed if test and control are .bam files. Either a data.frame with columns "chr" (chromosome),
"start", "stop", or a GRanges object. If multiple regions are specified, the appropriate slots of the returned object are concatenated by row.
q
The quality argument passed to bam2R if the experiments are .bam files.
s
The strand argument passed to bam2R if the experiments are .bam files.
head.clip
The head.clip argument passed to bam2R if the experiments are .bam files.
...
Additional arguments.
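As an illustration of the default "fisher" setting of combine.method: Fisher's method combines k independent p-values through the statistic -2 * sum(log(p_i)), which follows a chi-squared distribution with 2k degrees of freedom under the null hypothesis. The helper fisher_combine below is hypothetical and written in base R; it is not the package's p.combine function, and the "max" and "average" options are not covered.

```r
# Hypothetical helper: Fisher's method for combining independent p-values
fisher_combine <- function(p) {
  stat <- -2 * sum(log(p))                              # Fisher's statistic
  pchisq(stat, df = 2 * length(p), lower.tail = FALSE)  # chi-squared, 2k df
}

# Two strand-wise p-values of 0.5 each
fisher_combine(c(0.5, 0.5))
```

Combining two p-values of 0.5 gives 0.25 * (1 + log(4)), approximately 0.5965736, which is the value that recurs in the P-Values matrices printed under Results. This suggests, but does not prove, that uninformative sites carry a p-value of 0.5 on each strand.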
Value
A deepSNV object
Author(s)
Moritz Gerstung
Examples
## Short example with 2 SNVs at frequency ~10%
regions <- data.frame(chr="B.FR.83.HXB2_LAI_IIIB_BRU_K034", start = 3120, stop=3140)
ex <- deepSNV(test = system.file("extdata", "test.bam", package="deepSNV"), control = system.file("extdata", "control.bam", package="deepSNV"), regions=regions, q=10)
show(ex) # show method
plot(ex) # scatter plot
summary(ex) # summary with significant SNVs
ex[1:3,] # subsetting the first three genomic positions
tail(test(ex, total=TRUE)) # retrieve the test counts on both strands
tail(control(ex, total=TRUE))
## Not run: Full example with ~ 100 SNVs. Requires an internet connection, but try yourself.
# regions <- data.frame(chr="B.FR.83.HXB2_LAI_IIIB_BRU_K034", start = 2074, stop=3585)
# HIVmix <- deepSNV(test = "http://www.bsse.ethz.ch/cbg/software/deepSNV/data/test.bam", control = "http://www.bsse.ethz.ch/cbg/software/deepSNV/data/control.bam", regions=regions, q=10)
data(HIVmix) # attach data instead..
show(HIVmix)
plot(HIVmix)
head(summary(HIVmix))
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
> library(deepSNV)
Loading required package: parallel
Loading required package: Rhtslib
Rhtslib htslib version 1.1
Loading required package: IRanges
Loading required package: BiocGenerics
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
rbind, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit
Loading required package: S4Vectors
Loading required package: stats4
Attaching package: 'S4Vectors'
The following objects are masked from 'package:base':
colMeans, colSums, expand.grid, rowMeans, rowSums
Loading required package: GenomicRanges
Loading required package: GenomeInfoDb
Loading required package: SummarizedExperiment
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
Loading required package: Biostrings
Loading required package: XVector
Loading required package: VGAM
Loading required package: splines
Loading required package: VariantAnnotation
Loading required package: Rsamtools
Attaching package: 'VariantAnnotation'
The following object is masked from 'package:base':
tabulate
Attaching package: 'deepSNV'
The following objects are masked from 'package:VGAM':
dbetabinom, pbetabinom
The following object is masked from 'package:BiocGenerics':
normalize
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/deepSNV/deepSNV-methods.Rd_%03d_medium.png", width=480, height=480)
> ### Name: deepSNV
> ### Title: Test two matched deep sequencing experiments for low-frequency
> ### SNVs.
> ### Aliases: deepSNV deepSNV,character,character-method
> ### deepSNV,character,matrix-method deepSNV,deepSNV,missing-method
> ### deepSNV,matrix,character-method deepSNV,matrix,matrix-method
>
> ### ** Examples
>
> ## Short example with 2 SNVs at frequency ~10%
> regions <- data.frame(chr="B.FR.83.HXB2_LAI_IIIB_BRU_K034", start = 3120, stop=3140)
> ex <- deepSNV(test = system.file("extdata", "test.bam", package="deepSNV"), control = system.file("extdata", "control.bam", package="deepSNV"), regions=regions, q=10)
> show(ex) # show method
Data: 21 positions x 10 characters
Model: bin
Alternative: greater
Combine Method: fisher
P-Values:
             A            T          C         G         -
[1,] 0.5965736           NA 0.59657359 0.5965736 0.5965736
[2,] 0.4378589 8.465736e-01         NA 0.5965736 0.5965736
[3,] 0.5965736           NA 0.07559581 0.8465736 0.5965736
[4,] 0.3962578 5.965736e-01 0.59657359        NA 0.5965736
[5,]        NA 4.369021e-01 0.59657359 0.8465736 0.5965736
[6,] 0.8177195 6.404014e-39         NA 0.5965736 0.5965736
...
               A         T          C            G         -
[16,] 0.47970559 0.5965736 0.84657359           NA 0.5965736
[17,] 0.05350392 0.5965736 0.84657359           NA 0.5965736
[18,] 1.00000000 0.8465736 1.00000000           NA 0.5965736
[19,] 0.24915660 0.8465736         NA 1.011605e-01 0.5965736
[20,]         NA 0.4422493 0.07253975 4.351952e-02 0.5965736
[21,]         NA 0.8465736 0.84657359 1.747374e-45 0.5965736
> plot(ex) # scatter plot
> summary(ex) # summary with significant SNVs
                             chr  pos ref var        p.val   freq.var
1 B.FR.83.HXB2_LAI_IIIB_BRU_K034 3125   C   T 5.379372e-37 0.03828036
2 B.FR.83.HXB2_LAI_IIIB_BRU_K034 3140   A   G 1.467794e-43 0.04875622
  sigma2.freq.var n.tst.fw cov.tst.fw n.tst.bw cov.tst.bw n.ctrl.fw cov.ctrl.fw
1    1.678502e-05       58       1461       32        862         1        3066
2    2.425683e-05       60       1346       38        664         0        2775
  n.ctrl.bw cov.ctrl.bw    raw.p.val
1         1        1257 6.404014e-39
2         0         986 1.747374e-45
> ex[1:3,] # subsetting the first three genomic positions
Data: 3 positions x 10 characters
Model: bin
Alternative: greater
Combine Method: fisher
P-Values:
             A         T          C         G         -
[1,] 0.5965736        NA 0.59657359 0.5965736 0.5965736
[2,] 0.4378589 0.8465736         NA 0.5965736 0.5965736
[3,] 0.5965736        NA 0.07559581 0.8465736 0.5965736
> tail(test(ex, total=TRUE)) # retrieve the test counts on both strands
         A T    C    G -
[16,]    3 0    0 2172 0
[17,]    4 0    0 2140 0
[18,]    0 0    1 2116 6
[19,]    6 0 2073   10 0
[20,] 2072 1    8    3 0
[21,] 1911 0    1   98 0
> tail(control(ex, total=TRUE))
         A T    C    G -
[16,]    2 0    1 4043 0
[17,]    0 0    1 3998 0
[18,]    3 1    4 3945 8
[19,]    5 1 3908    8 0
[20,] 3897 0    5    0 0
[21,] 3757 1    3    0 0
>
> ## Not run: Full example with ~ 100 SNVs. Requires an internet connection, but try yourself.
> # regions <- data.frame(chr="B.FR.83.HXB2_LAI_IIIB_BRU_K034", start = 2074, stop=3585)
> # HIVmix <- deepSNV(test = "http://www.bsse.ethz.ch/cbg/software/deepSNV/data/test.bam", control = "http://www.bsse.ethz.ch/cbg/software/deepSNV/data/control.bam", regions=regions, q=10)
> data(HIVmix) # attach data instead..
> show(HIVmix)
Data: 1512 positions x 10 characters
Model: bin
Alternative: greater
Combine Method: fisher
P-Values:
             A         T         C         G         -
[1,]        NA 0.5965736 0.5965736 0.5965736 0.5965736
[2,] 0.5965736 0.5965736 0.5965736        NA 0.5965736
[3,]        NA 0.5965736 0.5965736 0.5965736 0.5965736
[4,] 0.5965736 0.5965736        NA 0.5965736 0.5965736
[5,]        NA 0.5965736 0.5965736 0.5965736 0.5965736
[6,] 0.5965736 0.5965736 0.5965736        NA 0.5965736
...
                A         T         C         G         -
[1507,]        NA 0.5965736 0.5965736 0.5965736 0.5965736
[1508,] 0.8465736 0.5965736 0.8465736        NA 0.5965736
[1509,] 0.5965736 0.5965736        NA 0.5965736 0.5965736
[1510,] 0.8465736 0.5965736        NA 0.5965736 0.5965736
[1511,]        NA 0.5965736 0.5965736 0.4737885 0.5965736
[1512,] 1.0000000        NA 0.8465736 0.8465736 0.8465736
> plot(HIVmix)
> head(summary(HIVmix))
                              chr  pos ref var        p.val   freq.var
1  B.FR.83.HXB2_LAI_IIIB_BRU_K034 2127   G   A 3.814024e-05 0.02903526
51 B.FR.83.HXB2_LAI_IIIB_BRU_K034 2130   T   C 1.423636e-07 0.03076923
70 B.FR.83.HXB2_LAI_IIIB_BRU_K034 2139   A   G 5.815022e-09 0.03362573
71 B.FR.83.HXB2_LAI_IIIB_BRU_K034 2141   A   G 4.271206e-09 0.03333333
52 B.FR.83.HXB2_LAI_IIIB_BRU_K034 2150   A   C 9.543121e-04 0.01763908
2  B.FR.83.HXB2_LAI_IIIB_BRU_K034 2151   G   A 4.527563e-08 0.02815013
   sigma2.freq.var n.tst.fw cov.tst.fw n.tst.bw cov.tst.bw n.ctrl.fw
1     4.892000e-05       17        581        2         47         2
51    4.733728e-05       16        597        4         53         0
70    4.916043e-05       16        599        7         85         0
71    4.830918e-05       16        599        7         91         0
52    2.393362e-05       10        609        3        128         0
2     3.773476e-05       16        614        5        132         0
   cov.ctrl.fw n.ctrl.bw cov.ctrl.bw    raw.p.val
1         1537         0         103 6.306257e-09
51        1534         0         104 2.353895e-11
70        1535         0         158 9.614784e-13
71        1535         0         181 7.062179e-13
52        1538         0         283 1.577897e-07
2         1539         0         287 7.486050e-12
>
> dev.off()
null device
1
>
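A note on reading the summary(ex) table: the printed numbers are consistent with p.val being a Bonferroni adjustment of raw.p.val over 21 positions × 4 alternative nucleotides = 84 tests, and with freq.var being the pooled test frequency minus the pooled control frequency. Both relations are inferred from the output above rather than taken from the package documentation, so treat the base-R sketch below as a plausibility check only; the variable names are hypothetical.

```r
# Raw p-values printed by summary(ex) for positions 3125 and 3140
raw_p <- c(6.404014e-39, 1.747374e-45)

# Assumed number of tests: 21 positions x 4 non-reference characters
n_tests <- 21 * 4

# Bonferroni adjustment; compare with the printed p.val column
adj_p <- p.adjust(raw_p, method = "bonferroni", n = n_tests)

# Pooled variant frequency, test minus control, for position 3125;
# compare with the printed freq.var column
freq_test <- (58 + 32) / (1461 + 862)
freq_ctrl <- (1 + 1) / (3066 + 1257)
freq_var  <- freq_test - freq_ctrl

print(adj_p)
print(freq_var)
```

The same factor appears to hold for HIVmix, where 1512 positions × 4 gives 6048 tests.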