Last data update: 2014.03.03

R: Single sample variant calling
callVariantsSingleR Documentation

Single sample variant calling

Description

A simple single sample variant calling function (calling SNVs and deletions)

Usage

callVariantsSingle( data, sampledata, samples = sampledata$Sample, errorRate = 0.001, minSupport = 2, minAF = 0.05, minStrandSupport = 1, mergeDels = TRUE, aggregator = mean)

Arguments

data

A list with elements Counts (a 4d integer array of size [1:12, 1:2, 1:k, 1:n]), Coverage (a 3d integer array of size [1:2, 1:k, 1:n]), Deletions (a 3d integer array of size [1:2, 1:k, 1:n]), Reference (a 1d integer vector of size [1:n]) – see Details.

sampledata

A data.frame with k rows (one for each sample) and columns Column and (Sample. The tally file should contain this information as a group attribute, see getSampleData for an example.

samples

The samples on which variants should be called, by default all samples specified in sampledata are used

errorRate

The expected error rate of the sequencing technology that was used, for illumina this should be 1/1000

minSupport

minimal support required for a position to be considered variant

minAF

minimal allelic frequency for an allele at a position to be cosidered a variant

minStrandSupport

minimal per-strand support for a position to be considered variant

mergeDels

Boolean flag to specify that adjacent deletion calls should be merged

aggregator

Aggregator function for merging statistics of adjacent deletion calls, defaults to mean, which means that a deletion larger than 1bp will be annotated with the means of the counts and coverages etc.

Details

data is a list of datasets which has to at least contain the Counts and Coverages for variant calling respectively Deletions for deletion calling (if Deletions is not present no deletion calls will be made). This list will usually be generated by a call to the h5dapply function in which the tally file, chromosome, datasets and regions within the datasets would be specified. See h5dapply for specifics.

callVariantsSingle implements a simple single sample variant callign approach for SNVs and deletions (if Deletions is a dataset present in the data parameter. The function applies three essential filters to the provided data, requiring:

- minSupport total support for the variant at the position - minStrandSupport support for the variant on each strand - an allele freqeuncy of at least minAF (for pure diploid samples this can be set relatively high, e.g. 0.3, for calling potentially homozygous variants a value of 0.8 or higher might be used)

Calls are annotated with the p-Value of a binom.test of the present support and coverage given the error rate provided in the errorRate parameter, no filtering is done on this annotation.

Adjacent deletion calls are merged based in the value of the mergeDels parameter and their statistics are aggregated with the function supplied in the aggregator parameter.

Value

This function returns a data.frame containing annotated calls with the following slots:

Chrom

The chromosome the potential variant / deletion is on

Start

The starting position of the variant / deletion

End

The end position of the variant / deletions (equal to Start for SNVs and single basepair deletions)

Sample

The sample in which the variant was called

altAllele

The alternate allele for SNVs (deletions will have a "-" in that slot)

refAllele

The reference allele for SNVs (deletions will have the deleted sequence here as extracted from the Reference dataset, if the tally file contains a sparse representation of the reference, i.e. only positions with mismatches show a reference value the missing values are substituted with "N"'s. It is strongly suggested to write the whole reference into the tally file prior to deletion calling - see writeReference for details)

SupFwd

Support for the variant in the sample on the forward strand

SupRev

Support for the variant in the sample on the reverse strand

CovFwd

Coverage of the variant position in the sample on the forward strand

CovRev

Coverage of the variant position in the sample on the reverse strand

AF_Fwd

Allele frequency of the variant in the sample on the forward strand

AF_Rev

Allele frequency of the variant in the sample on the reverse strand

Support

Total Support of the variant - i.e. SupFwd + SupRev

Coverage

Total Coverage of the variant position - i.e. CovFwd + CovRev

AF

Total allele frequency of the variant, i.e. Support / Coverage

fBackground

Background frequency of the variant in all samples but the one the variant is called in

pErrorFwd

Probablity of the observed support and coverage given the error rate on the forward strand

pErrorRev

Probablity of the observed support and coverage given the error rate on the reverse strand

pError

Probablity of the observed support and coverage given the error rate on both strands combined

pError

Coverage of the variant position in the Control sample on the forward strand

pStrand

p-Value of a fisher.test on the contingency matrix matrix(c(CovFwd,CovRev,SupFwd,SupRev), nrow = 2) at this position - low values could indicate strand bias

Author(s)

Paul Pyl

Examples

  library(h5vc) # loading library
  tallyFile <- system.file( "extdata", "example.tally.hfs5", package = "h5vcData" )
  sampleData <- getSampleData( tallyFile, "/ExampleStudy/16" )
  position <- 29979629
  windowsize <- 1000
  vars <- h5dapply( # Calling Variants
    filename = tallyFile,
    group = "/ExampleStudy/16",
    blocksize = 500,
    FUN = callVariantsSingle,
    sampledata = sampleData,
    names = c("Coverages", "Counts", "Reference", "Deletions"),
    range = c(position - windowsize, position + windowsize)
  )
  vars <- do.call( rbind, vars ) # merge the results from all blocks by row
  vars # We did find a variant

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(h5vc)
Loading required package: grid
Loading required package: gridExtra
Loading required package: ggplot2
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/h5vc/callVariantsSingle.Rd_%03d_medium.png", width=480, height=480)
> ### Name: callVariantsSingle
> ### Title: Single sample variant calling
> ### Aliases: callVariantsSingle
> 
> ### ** Examples
> 
>   library(h5vc) # loading library
>   tallyFile <- system.file( "extdata", "example.tally.hfs5", package = "h5vcData" )
>   sampleData <- getSampleData( tallyFile, "/ExampleStudy/16" )
>   position <- 29979629
>   windowsize <- 1000
>   vars <- h5dapply( # Calling Variants
+     filename = tallyFile,
+     group = "/ExampleStudy/16",
+     blocksize = 500,
+     FUN = callVariantsSingle,
+     sampledata = sampleData,
+     names = c("Coverages", "Counts", "Reference", "Deletions"),
+     range = c(position - windowsize, position + windowsize)
+   )
>   vars <- do.call( rbind, vars ) # merge the results from all blocks by row
>   vars # We did find a variant
                                 Sample Chrom    Start      End refAllele
29978629:29979128.199     PT8PrimaryDNA    16 29978827 29978827         G
29978629:29979128.433     PT8PrimaryDNA    16 29979061 29979061         C
29978629:29979128.177     PT8PrimaryDNA    16 29978805 29978805         A
29978629:29979128.437     PT8PrimaryDNA    16 29979065 29979065         A
29978629:29979128.436     PT8PrimaryDNA    16 29979064 29979064         C
29978629:29979128.1991    PT5PrimaryDNA    16 29978827 29978827         G
29978629:29979128.4331    PT5PrimaryDNA    16 29979061 29979061         C
29978629:29979128.1771    PT5PrimaryDNA    16 29978805 29978805         A
29978629:29979128.4371    PT5PrimaryDNA    16 29979065 29979065         A
29978629:29979128.4361    PT5PrimaryDNA    16 29979064 29979064         C
29978629:29979128.119     PT5RelapseDNA    16 29978747 29978747         C
29978629:29979128.1992    PT5RelapseDNA    16 29978827 29978827         G
29978629:29979128.4332    PT5RelapseDNA    16 29979061 29979061         C
29978629:29979128.1772    PT5RelapseDNA    16 29978805 29978805         A
29978629:29979128.4372    PT5RelapseDNA    16 29979065 29979065         A
29978629:29979128.4362    PT5RelapseDNA    16 29979064 29979064         C
29978629:29979128.1993 PT8EarlyStageDNA    16 29978827 29978827         G
29978629:29979128.247  PT8EarlyStageDNA    16 29978875 29978875         T
29978629:29979128.4333 PT8EarlyStageDNA    16 29979061 29979061         C
29978629:29979128.1773 PT8EarlyStageDNA    16 29978805 29978805         A
29978629:29979128.4373 PT8EarlyStageDNA    16 29979065 29979065         A
29978629:29979128.4363 PT8EarlyStageDNA    16 29979064 29979064         C
29978629:29979128.1994    PT5ControlDNA    16 29978827 29978827         G
29978629:29979128.4334    PT5ControlDNA    16 29979061 29979061         C
29978629:29979128.476     PT5ControlDNA    16 29979104 29979104         T
29978629:29979128.1774    PT5ControlDNA    16 29978805 29978805         A
29978629:29979128.4374    PT5ControlDNA    16 29979065 29979065         A
29978629:29979128.4364    PT5ControlDNA    16 29979064 29979064         C
29978629:29979128.1995    PT8ControlDNA    16 29978827 29978827         G
29978629:29979128.1775    PT8ControlDNA    16 29978805 29978805         A
29979129:29979628.176     PT8PrimaryDNA    16 29979304 29979304         A
29979129:29979628.98      PT8PrimaryDNA    16 29979226 29979226         T
29979129:29979628.117     PT5PrimaryDNA    16 29979245 29979245         C
29979129:29979628.1761 PT8EarlyStageDNA    16 29979304 29979304         A
29979129:29979628.51   PT8EarlyStageDNA    16 29979179 29979179         C
29979129:29979628.120  PT8EarlyStageDNA    16 29979248 29979248         A
29979129:29979628.133     PT5ControlDNA    16 29979261 29979261         G
29979129:29979628.500     PT5ControlDNA    16 29979628 29979628         G
29979129:29979628.1762    PT5ControlDNA    16 29979304 29979304         A
29979129:29979628.981     PT5ControlDNA    16 29979226 29979226         T
29980129:29980628.381     PT8PrimaryDNA    16 29980509 29980509         T
29980129:29980628.377     PT8PrimaryDNA    16 29980505 29980505         C
29980129:29980628.429     PT5PrimaryDNA    16 29980557 29980557         A
29980129:29980628.3771    PT5PrimaryDNA    16 29980505 29980505         C
29980129:29980628.408     PT5PrimaryDNA    16 29980536 29980536         G
29980129:29980628.378     PT5RelapseDNA    16 29980506 29980506         T
29980129:29980628.3772    PT5ControlDNA    16 29980505 29980505         C
29980129:29980628.3773    PT8ControlDNA    16 29980505 29980505         C
                       altAllele SupFwd SupRev CovFwd CovRev     AF_Fwd
29978629:29979128.199          A     12      9     18     21 0.66666667
29978629:29979128.433          A     10      5     14     16 0.71428571
29978629:29979128.177          G     17     15     17     15 1.00000000
29978629:29979128.437          G      8      5     13     17 0.61538462
29978629:29979128.436          T      8      5     13     16 0.61538462
29978629:29979128.1991         A     18     25     18     25 1.00000000
29978629:29979128.4331         A     12      9     12      9 1.00000000
29978629:29979128.1771         G     20     18     20     18 1.00000000
29978629:29979128.4371         G     12      9     12      9 1.00000000
29978629:29979128.4361         T     12      9     12      9 1.00000000
29978629:29979128.119          A      1      1      7     15 0.14285714
29978629:29979128.1992         A     10      4     10      4 1.00000000
29978629:29979128.4332         A     10      6     10      6 1.00000000
29978629:29979128.1772         G      8      6      8      6 1.00000000
29978629:29979128.4372         G     10      6     10      6 1.00000000
29978629:29979128.4362         T     10      6     10      6 1.00000000
29978629:29979128.1993         A     18      8     18      8 1.00000000
29978629:29979128.247          A      1      1     18     19 0.05555556
29978629:29979128.4333         A     15     10     16     10 0.93750000
29978629:29979128.1773         G     18     14     18     14 1.00000000
29978629:29979128.4373         G     13     10     16     10 0.81250000
29978629:29979128.4363         T     13     10     16     10 0.81250000
29978629:29979128.1994         A     12     11     24     21 0.50000000
29978629:29979128.4334         A      7      6     16     14 0.43750000
29978629:29979128.476          C      1      1     21     15 0.04761905
29978629:29979128.1774         G     24     13     25     13 0.96000000
29978629:29979128.4374         G      7      6     17     14 0.41176471
29978629:29979128.4364         T      7      6     16     14 0.43750000
29978629:29979128.1995         A      5      3     11      5 0.45454545
29978629:29979128.1775         G     11      6     11      6 1.00000000
29979129:29979628.176          C      3      1     26     17 0.11538462
29979129:29979628.98           G      1      1      9     21 0.11111111
29979129:29979628.117          A      1      1     18     17 0.05555556
29979129:29979628.1761         C      2      2     34     21 0.05882353
29979129:29979628.51           G      2      1     25     32 0.08000000
29979129:29979628.120          -      2      1     21     22 0.09523810
29979129:29979628.133          A      2      1     17     22 0.11764706
29979129:29979628.500          A      6     11     19     23 0.31578947
29979129:29979628.1762         C      1      4     25     25 0.04000000
29979129:29979628.981          G      1      1     17     19 0.05882353
29980129:29980628.381          C      1      1     18     22 0.05555556
29980129:29980628.377          T      1     11     20     23 0.05000000
29980129:29980628.429          G      1      1     18     22 0.05555556
29980129:29980628.3771         T      1      8     35     26 0.02857143
29980129:29980628.408          T      2      1     17     29 0.11764706
29980129:29980628.378          C      1      1      6      6 0.16666667
29980129:29980628.3772         T      1      5     18     14 0.05555556
29980129:29980628.3773         T      1      1      6      9 0.16666667
                           AF_Rev Coverage Support         AF fBackground
29978629:29979128.199  0.42857143       39      21 0.53846154 0.791666667
29978629:29979128.433  0.31250000       30      15 0.50000000 0.752380952
29978629:29979128.177  1.00000000       32      32 1.00000000 1.000000000
29978629:29979128.437  0.29411765       30      13 0.43333333 0.726415094
29978629:29979128.436  0.31250000       29      13 0.44827586 0.733333333
29978629:29979128.1991 1.00000000       43      43 1.00000000 0.657142857
29978629:29979128.4331 1.00000000       21      21 1.00000000 0.640350877
29978629:29979128.1771 1.00000000       38      38 1.00000000 1.000000000
29978629:29979128.4371 1.00000000       21      21 1.00000000 0.600000000
29978629:29979128.4361 1.00000000       21      21 1.00000000 0.610619469
29978629:29979128.119  0.06666667       22       2 0.09090909 0.000000000
29978629:29979128.1992 1.00000000       14      14 1.00000000 0.715976331
29978629:29979128.4332 1.00000000       16      16 1.00000000 0.655462185
29978629:29979128.1772 1.00000000       14      14 1.00000000 1.000000000
29978629:29979128.4372 1.00000000       16      16 1.00000000 0.616666667
29978629:29979128.4362 1.00000000       16      16 1.00000000 0.627118644
29978629:29979128.1993 1.00000000       26      26 1.00000000 0.694267516
29978629:29979128.247  0.05263158       37       2 0.05405405 0.012048193
29978629:29979128.4333 1.00000000       26      25 0.96153846 0.633027523
29978629:29979128.1773 1.00000000       32      32 1.00000000 1.000000000
29978629:29979128.4373 1.00000000       26      23 0.88461538 0.609090909
29978629:29979128.4363 1.00000000       26      23 0.88461538 0.620370370
29978629:29979128.1994 0.52380952       45      23 0.51111111 0.811594203
29978629:29979128.4334 0.42857143       30      13 0.43333333 0.771428571
29978629:29979128.476  0.06666667       36       2 0.05555556 0.007092199
29978629:29979128.1774 1.00000000       38      37 0.97368421 1.000000000
29978629:29979128.4374 0.42857143       31      13 0.41935484 0.733333333
29978629:29979128.4364 0.42857143       30      13 0.43333333 0.740384615
29978629:29979128.1995 0.60000000       16       8 0.50000000 0.760479042
29978629:29979128.1775 1.00000000       17      17 1.00000000 1.000000000
29979129:29979628.176  0.05882353       43       4 0.09302326 0.057971014
29979129:29979628.98   0.04761905       30       2 0.06666667 0.037735849
29979129:29979628.117  0.05882353       35       2 0.05714286 0.000000000
29979129:29979628.1761 0.09523810       55       4 0.07272727 0.056410256
29979129:29979628.51   0.03125000       57       3 0.05263158 0.016216216
29979129:29979628.120  0.04545455       43       3 0.06976744 0.033783784
29979129:29979628.133  0.04545455       39       3 0.07692308 0.060606061
29979129:29979628.500  0.47826087       42      17 0.40476190 0.005102041
29979129:29979628.1762 0.16000000       50       5 0.10000000 0.055000000
29979129:29979628.981  0.05263158       36       2 0.05555556 0.039215686
29980129:29980628.381  0.04545455       40       2 0.05000000 0.046357616
29980129:29980628.377  0.47826087       43      12 0.27906977 0.141104294
29980129:29980628.429  0.04545455       40       2 0.05000000 0.053097345
29980129:29980628.3771 0.30769231       61       9 0.14754098 0.193103448
29980129:29980628.408  0.03448276       46       3 0.06521739 0.026785714
29980129:29980628.378  0.16666667       12       2 0.16666667 0.051282051
29980129:29980628.3772 0.35714286       32       6 0.18750000 0.172413793
29980129:29980628.3773 0.11111111       15       2 0.13333333 0.183246073
                          pErrorFwd    pErrorRev        pError    pStrand
29978629:29979128.199  1.846142e-32 2.907714e-22  6.129637e-53 0.58888219
29978629:29979128.433  9.973650e-28 4.328131e-12  1.529505e-37 0.34206698
29978629:29979128.177  1.000000e-51 1.000000e-45  1.000000e-96 1.00000000
29978629:29979128.437  1.281290e-21 6.126411e-12  1.178834e-31 0.33188888
29978629:29979128.436  1.281290e-21 4.328131e-12  6.686268e-32 0.50547558
29978629:29979128.1991 1.000000e-54 1.000000e-75 1.000000e-129 1.00000000
29978629:29979128.4331 1.000000e-36 1.000000e-27  1.000000e-63 1.00000000
29978629:29979128.1771 1.000000e-60 1.000000e-54 1.000000e-114 1.00000000
29978629:29979128.4371 1.000000e-36 1.000000e-27  1.000000e-63 1.00000000
29978629:29979128.4361 1.000000e-36 1.000000e-27  1.000000e-63 1.00000000
29978629:29979128.119  6.979035e-03 1.489545e-02  2.279418e-04 1.00000000
29978629:29979128.1992 1.000000e-30 1.000000e-12  1.000000e-42 1.00000000
29978629:29979128.4332 1.000000e-30 1.000000e-18  1.000000e-48 1.00000000
29978629:29979128.1772 1.000000e-24 1.000000e-18  1.000000e-42 1.00000000
29978629:29979128.4372 1.000000e-30 1.000000e-18  1.000000e-48 1.00000000
29978629:29979128.4362 1.000000e-30 1.000000e-18  1.000000e-48 1.00000000
29978629:29979128.1993 1.000000e-54 1.000000e-24  1.000000e-78 1.00000000
29978629:29979128.247  1.784781e-02 1.882997e-02  6.506564e-04 1.00000000
29978629:29979128.4333 1.598500e-44 1.000000e-30  2.597500e-74 1.00000000
29978629:29979128.1773 1.000000e-54 1.000000e-42  1.000000e-96 1.00000000
29978629:29979128.4373 5.584415e-37 1.000000e-30  2.592532e-66 0.77672232
29978629:29979128.4363 5.584415e-37 1.000000e-30  2.592532e-66 0.77672232
29978629:29979128.1994 2.674355e-30 3.494962e-28  4.030791e-57 1.00000000
29978629:29979128.4334 1.135023e-17 2.982471e-15  1.178834e-31 1.00000000
29978629:29979128.476  2.079132e-02 1.489545e-02  6.158952e-04 1.00000000
29978629:29979128.1774 2.497600e-71 1.000000e-39 3.796300e-110 1.00000000
29978629:29979128.4374 1.927851e-17 2.982471e-15  2.028329e-31 1.00000000
29978629:29979128.4364 1.135023e-17 2.982471e-15  1.178834e-31 1.00000000
29978629:29979128.1995 4.596949e-13 9.985006e-09  1.277877e-20 1.00000000
29978629:29979128.1775 1.000000e-33 1.000000e-18  1.000000e-51 1.00000000
29979129:29979628.176  2.555542e-06 1.686468e-02  1.196199e-07 1.00000000
29979129:29979628.98   8.964084e-03 2.079132e-02  4.269616e-04 0.53427419
29979129:29979628.117  1.784781e-02 1.686468e-02  5.820658e-04 1.00000000
29979129:29979628.1761 5.491700e-04 2.073579e-04  3.274258e-07 0.63917684
29979129:29979628.51   2.954377e-04 3.150892e-02  2.809973e-05 0.58340152
29979129:29979628.120  2.073579e-04 2.177053e-02  1.197649e-05 1.00000000
29979129:29979628.133  1.346471e-04 2.177053e-02  8.895669e-06 0.58127178
29979129:29979628.500  2.683125e-14 1.337280e-27  2.487169e-40 0.56843446
29979129:29979628.1762 2.470229e-02 1.243924e-08  2.040786e-09 0.35518709
29979129:29979628.981  1.686468e-02 1.882997e-02  6.158952e-04 1.00000000
29980129:29980628.381  1.784781e-02 2.177053e-02  7.605116e-04 1.00000000
29980129:29980628.377  1.981114e-02 1.337280e-27  1.490581e-26 0.01943211
29980129:29980628.429  1.784781e-02 2.177053e-02  7.605116e-04 1.00000000
29980129:29980628.3771 3.441149e-02 1.537469e-18  1.654870e-17 0.01230470
29980129:29980628.408  1.346471e-04 2.859763e-02  1.469858e-05 0.55140035
29980129:29980628.378  5.985020e-03 5.985020e-03  6.556148e-05 1.00000000
29980129:29980628.3772 1.784781e-02 1.987036e-12  8.862162e-13 0.17971218
29980129:29980628.3773 5.985020e-03 8.964084e-03  1.040941e-04 1.00000000
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>