Last data update: 2014.03.03

deepSNV                R Documentation

Test two matched deep sequencing experiments for low-frequency SNVs.

Description

This generic function accepts several types of input for the test and control experiments: it reads from two .bam files, uses two matrices of nucleotide counts, or re-evaluates the test results stored in a deepSNV-class object. The test itself is a likelihood ratio test of a (beta-)binomial model for the individual nucleotide counts at each position. The null hypothesis is that both experiments share the same parameter; the alternative is that the parameters differ. Because the difference in degrees of freedom is 1, the test statistic D = -2 log(max{L_0} / max{L_1}) is asymptotically distributed as χ_1^2. The statistic may be tuned by a nucleotide-specific Dirichlet prior that is learned across all genomic sites; see estimateDirichlet. If the model is beta-binomial, a single global dispersion parameter is used for all sites; it can be learned with estimateDispersion.
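The likelihood ratio statistic described above can be sketched in base R for the plain binomial case. This is a minimal illustration for a single nucleotide on a single strand, not the package's implementation: the helper name lrt_binom and its arguments are hypothetical, no Dirichlet prior or pseudocount is applied, and only the two-sided p-value is computed.

```r
# Sketch of a binomial likelihood ratio test for one nucleotide, one strand.
# x, n: variant count and coverage in the test experiment
# y, m: variant count and coverage in the control experiment
# (lrt_binom is a hypothetical helper, not part of deepSNV.)
lrt_binom <- function(x, n, y, m) {
  p0      <- (x + y) / (n + m)  # shared rate under the null hypothesis
  p1.test <- x / n              # separate rates under the alternative
  p1.ctrl <- y / m
  logL0 <- dbinom(x, n, p0, log = TRUE) + dbinom(y, m, p0, log = TRUE)
  logL1 <- dbinom(x, n, p1.test, log = TRUE) + dbinom(y, m, p1.ctrl, log = TRUE)
  D <- max(-2 * (logL0 - logL1), 0)  # guard against tiny negative rounding
  c(D = D, p.two.sided = pchisq(D, df = 1, lower.tail = FALSE))
}

# Counts loosely modelled on a clear variant: 58 of 1461 reads versus 1 of 3066
lrt_binom(58, 1461, 1, 3066)

# No variant reads in either experiment: D is 0 and the two-sided p-value is 1
lrt_binom(0, 100, 0, 100)
```

With identical observed rates the maximised likelihoods coincide, so D is 0; the larger the gap between the test and control rates, the larger D becomes.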

Usage

deepSNV(test, control, ...)

## S4 method for signature 'matrix,matrix'
deepSNV(test, control, alternative = c('greater', 'less', 'two.sided'), dirichlet.prior = NULL, pseudo.count = 1, combine.method = c("fisher", "max", "average"), over.dispersion = 100, model = c("bin", "betabin"), ...)

## S4 method for signature 'deepSNV,missing'
deepSNV(test, control, ...)

## S4 method for signature 'character,character'
deepSNV(test, control, regions, q=25, s=2, head.clip=0, ...)

## S4 method for signature 'matrix,character'
deepSNV(test, control, regions, q=25, s=2, ...)

## S4 method for signature 'character,matrix'
deepSNV(test, control, regions, q=25, s=2, ...)

Arguments

test

The test experiment. Either a .bam file, or a matrix with nucleotide counts, or a deepSNV-class object.

control

The control experiment. Either a .bam file or a matrix with nucleotide counts (see the method signatures under Usage), or missing if test is a deepSNV-class object.

alternative

The alternative to be tested. One of "greater" (default), "less", or "two.sided".

model

Which model to use. Either "bin", or "betabin". Default "bin".

dirichlet.prior

A base-specific Dirichlet prior specified as a matrix. Default NULL.

pseudo.count

If dirichlet.prior=NULL, a pseudocount can be used to define a flat prior.

over.dispersion

A numeric overdispersion factor, used if the model is beta-binomial. Default 100.

combine.method

The method to combine p-values. One of "fisher" (default), "max", or "average". See p.combine for details.

regions

The regions to be parsed if test and control are .bam files. Either a data.frame with columns "chr" (chromosome), "start", "stop", or a GRanges object. If multiple regions are specified, the appropriate slots of the returned object are concatenated by row.

q

The quality argument passed to bam2R if the experiments are .bam files.

s

The strand argument passed to bam2R if the experiments are .bam files.

head.clip

The head.clip argument passed to bam2R if the experiments are .bam files.

...

Additional arguments.
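The "fisher" option of combine.method refers to Fisher's method for combining p-values. A minimal base-R sketch follows; fisher_combine is a hypothetical helper written for illustration and is not the package's p.combine, whose handling of the "max" and "average" options is not shown here.

```r
# Fisher's method: under the null, -2 * sum(log(p)) follows a chi-squared
# distribution with 2 * k degrees of freedom for k independent p-values.
# (fisher_combine is a hypothetical helper, not part of deepSNV.)
fisher_combine <- function(p) {
  stat <- -2 * sum(log(p))
  pchisq(stat, df = 2 * length(p), lower.tail = FALSE)
}

fisher_combine(c(0.5, 0.5))  # approximately 0.5965736
fisher_combine(c(0.5, 1))    # approximately 0.8465736
```

Both values recur in the P-Values tables printed by the examples. Reading them as two strand-wise p-values being combined is an inference from the numbers, not something the documentation states.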

Value

A deepSNV-class object.

Author(s)

Moritz Gerstung

Examples

## Short example with 2 SNVs at frequency ~10%
regions <- data.frame(chr="B.FR.83.HXB2_LAI_IIIB_BRU_K034", start = 3120, stop=3140)
ex <- deepSNV(test = system.file("extdata", "test.bam", package="deepSNV"), control = system.file("extdata", "control.bam", package="deepSNV"), regions=regions, q=10)
show(ex)   # show method
plot(ex)   # scatter plot
summary(ex)   # summary with significant SNVs
ex[1:3,]   # subsetting the first three genomic positions
tail(test(ex, total=TRUE))   # retrieve the test counts on both strands
tail(control(ex, total=TRUE))

## Not run: Full example with ~ 100 SNVs. Requires an internet connection, but try yourself.
# regions <- data.frame(chr="B.FR.83.HXB2_LAI_IIIB_BRU_K034", start = 2074, stop=3585)
# HIVmix <- deepSNV(test = "http://www.bsse.ethz.ch/cbg/software/deepSNV/data/test.bam", control = "http://www.bsse.ethz.ch/cbg/software/deepSNV/data/control.bam", regions=regions, q=10)
data(HIVmix) # attach data instead..
show(HIVmix)
plot(HIVmix)
head(summary(HIVmix))

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"

> library(deepSNV)
> ### Name: deepSNV
> ### Title: Test two matched deep sequencing experiments for low-frequency
> ###   SNVs.
> ### Aliases: deepSNV deepSNV,character,character-method
> ###   deepSNV,character,matrix-method deepSNV,deepSNV,missing-method
> ###   deepSNV,matrix,character-method deepSNV,matrix,matrix-method
> 
> ### ** Examples
> 
> ## Short example with 2 SNVs at frequency ~10%
> regions <- data.frame(chr="B.FR.83.HXB2_LAI_IIIB_BRU_K034", start = 3120, stop=3140)
> ex <- deepSNV(test = system.file("extdata", "test.bam", package="deepSNV"), control = system.file("extdata", "control.bam", package="deepSNV"), regions=regions, q=10)
> show(ex)   # show method
Data:  21 positions x  10 characters
Model:  bin 
Alternative:  greater 
Combine Method:  fisher 
P-Values:
             A            T          C         G         -
[1,] 0.5965736           NA 0.59657359 0.5965736 0.5965736
[2,] 0.4378589 8.465736e-01         NA 0.5965736 0.5965736
[3,] 0.5965736           NA 0.07559581 0.8465736 0.5965736
[4,] 0.3962578 5.965736e-01 0.59657359        NA 0.5965736
[5,]        NA 4.369021e-01 0.59657359 0.8465736 0.5965736
[6,] 0.8177195 6.404014e-39         NA 0.5965736 0.5965736
...
               A         T          C            G         -
[16,] 0.47970559 0.5965736 0.84657359           NA 0.5965736
[17,] 0.05350392 0.5965736 0.84657359           NA 0.5965736
[18,] 1.00000000 0.8465736 1.00000000           NA 0.5965736
[19,] 0.24915660 0.8465736         NA 1.011605e-01 0.5965736
[20,]         NA 0.4422493 0.07253975 4.351952e-02 0.5965736
[21,]         NA 0.8465736 0.84657359 1.747374e-45 0.5965736
> plot(ex)   # scatter plot
> summary(ex)   # summary with significant SNVs
                             chr  pos ref var        p.val   freq.var
1 B.FR.83.HXB2_LAI_IIIB_BRU_K034 3125   C   T 5.379372e-37 0.03828036
2 B.FR.83.HXB2_LAI_IIIB_BRU_K034 3140   A   G 1.467794e-43 0.04875622
  sigma2.freq.var n.tst.fw cov.tst.fw n.tst.bw cov.tst.bw n.ctrl.fw cov.ctrl.fw
1    1.678502e-05       58       1461       32        862         1        3066
2    2.425683e-05       60       1346       38        664         0        2775
  n.ctrl.bw cov.ctrl.bw    raw.p.val
1         1        1257 6.404014e-39
2         0         986 1.747374e-45
> ex[1:3,]   # subsetting the first three genomic positions
Data:  3 positions x  10 characters
Model:  bin 
Alternative:  greater 
Combine Method:  fisher 
P-Values:
             A         T          C         G         -
[1,] 0.5965736        NA 0.59657359 0.5965736 0.5965736
[2,] 0.4378589 0.8465736         NA 0.5965736 0.5965736
[3,] 0.5965736        NA 0.07559581 0.8465736 0.5965736
> tail(test(ex, total=TRUE))   # retrieve the test counts on both strands
         A T    C    G -
[16,]    3 0    0 2172 0
[17,]    4 0    0 2140 0
[18,]    0 0    1 2116 6
[19,]    6 0 2073   10 0
[20,] 2072 1    8    3 0
[21,] 1911 0    1   98 0
> tail(control(ex, total=TRUE))
         A T    C    G -
[16,]    2 0    1 4043 0
[17,]    0 0    1 3998 0
[18,]    3 1    4 3945 8
[19,]    5 1 3908    8 0
[20,] 3897 0    5    0 0
[21,] 3757 1    3    0 0
> 
> ## Not run: Full example with ~ 100 SNVs. Requires an internet connection, but try yourself.
> # regions <- data.frame(chr="B.FR.83.HXB2_LAI_IIIB_BRU_K034", start = 2074, stop=3585)
> # HIVmix <- deepSNV(test = "http://www.bsse.ethz.ch/cbg/software/deepSNV/data/test.bam", control = "http://www.bsse.ethz.ch/cbg/software/deepSNV/data/control.bam", regions=regions, q=10)
> data(HIVmix) # attach data instead..
> show(HIVmix)
Data:  1512 positions x  10 characters
Model:  bin 
Alternative:  greater 
Combine Method:  fisher 
P-Values:
             A         T         C         G         -
[1,]        NA 0.5965736 0.5965736 0.5965736 0.5965736
[2,] 0.5965736 0.5965736 0.5965736        NA 0.5965736
[3,]        NA 0.5965736 0.5965736 0.5965736 0.5965736
[4,] 0.5965736 0.5965736        NA 0.5965736 0.5965736
[5,]        NA 0.5965736 0.5965736 0.5965736 0.5965736
[6,] 0.5965736 0.5965736 0.5965736        NA 0.5965736
...
                A         T         C         G         -
[1507,]        NA 0.5965736 0.5965736 0.5965736 0.5965736
[1508,] 0.8465736 0.5965736 0.8465736        NA 0.5965736
[1509,] 0.5965736 0.5965736        NA 0.5965736 0.5965736
[1510,] 0.8465736 0.5965736        NA 0.5965736 0.5965736
[1511,]        NA 0.5965736 0.5965736 0.4737885 0.5965736
[1512,] 1.0000000        NA 0.8465736 0.8465736 0.8465736
> plot(HIVmix)
> head(summary(HIVmix))
                              chr  pos ref var        p.val   freq.var
1  B.FR.83.HXB2_LAI_IIIB_BRU_K034 2127   G   A 3.814024e-05 0.02903526
51 B.FR.83.HXB2_LAI_IIIB_BRU_K034 2130   T   C 1.423636e-07 0.03076923
70 B.FR.83.HXB2_LAI_IIIB_BRU_K034 2139   A   G 5.815022e-09 0.03362573
71 B.FR.83.HXB2_LAI_IIIB_BRU_K034 2141   A   G 4.271206e-09 0.03333333
52 B.FR.83.HXB2_LAI_IIIB_BRU_K034 2150   A   C 9.543121e-04 0.01763908
2  B.FR.83.HXB2_LAI_IIIB_BRU_K034 2151   G   A 4.527563e-08 0.02815013
   sigma2.freq.var n.tst.fw cov.tst.fw n.tst.bw cov.tst.bw n.ctrl.fw
1     4.892000e-05       17        581        2         47         2
51    4.733728e-05       16        597        4         53         0
70    4.916043e-05       16        599        7         85         0
71    4.830918e-05       16        599        7         91         0
52    2.393362e-05       10        609        3        128         0
2     3.773476e-05       16        614        5        132         0
   cov.ctrl.fw n.ctrl.bw cov.ctrl.bw    raw.p.val
1         1537         0         103 6.306257e-09
51        1534         0         104 2.353895e-11
70        1535         0         158 9.614784e-13
71        1535         0         181 7.062179e-13
52        1538         0         283 1.577897e-07
2         1539         0         287 7.486050e-12
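
Two columns of the summary(ex) table printed earlier in this transcript can be reproduced by hand from the counts in the same row. The sketch below uses the row for position 3125 and base R only. The interpretation is inferred from the printed numbers rather than stated in the documentation: freq.var appears to be the pooled variant frequency in the test minus that in the control, and p.val appears to be a Bonferroni adjustment of raw.p.val over 21 positions × 4 = 84 tests.

```r
# Values copied from the summary(ex) row for position 3125 (C > T)
n.tst    <- c(fw = 58,   bw = 32)    # variant counts, test experiment
cov.tst  <- c(fw = 1461, bw = 862)   # coverage, test experiment
n.ctrl   <- c(fw = 1,    bw = 1)     # variant counts, control experiment
cov.ctrl <- c(fw = 3066, bw = 1257)  # coverage, control experiment
raw.p.val <- 6.404014e-39

# Assumed definition: difference of the strand-pooled relative frequencies
freq.var <- sum(n.tst) / sum(cov.tst) - sum(n.ctrl) / sum(cov.ctrl)
freq.var  # about 0.03828, in line with the freq.var column

# Assumed correction: Bonferroni over 84 tests
n.tests <- 21 * 4
p.val <- p.adjust(raw.p.val, method = "bonferroni", n = n.tests)
p.val     # about 5.379e-37, in line with the p.val column
```

The same factor applies to the HIVmix table, where p.val / raw.p.val is 6048 = 1512 × 4.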