A list with elements
Counts (a 4d integer array of size [1:12, 1:2, 1:k, 1:n]),
Coverage (a 3d integer array of size [1:2, 1:k, 1:n]),
Deletions (a 3d integer array of size [1:2, 1:k, 1:n]),
Reference (a 1d integer vector of size [1:n]) – see Details.
sampledata
A data.frame with k rows (one for each
sample) and columns Column and (Sample.
The tally file should contain this information as a group attribute, see getSampleData for an example.
samples
The samples on which variants should be called, by default all samples specified in sampledata are used
errorRate
The expected error rate of the sequencing technology that was used, for illumina this should be 1/1000
minSupport
minimal support required for a position to be considered variant
minAF
minimal allelic frequency for an allele at a position to be cosidered a variant
minStrandSupport
minimal per-strand support for a position to be considered variant
mergeDels
Boolean flag to specify that adjacent deletion calls should be
merged
aggregator
Aggregator function for merging statistics of adjacent deletion calls,
defaults to mean, which means that a deletion larger than 1bp
will be annotated with the means of the counts and coverages etc.
Details
data is a list of datasets which has to at least contain the
Counts and Coverages for variant calling respectively
Deletions for deletion calling (if Deletions is not present no deletion calls will be made).
This list will usually be generated by a call to the h5dapply function in which the tally
file, chromosome, datasets and regions within the datasets would be
specified. See h5dapply for specifics.
callVariantsSingle implements a simple single sample variant callign approach for SNVs and deletions (if Deletions is a dataset present in the data parameter. The function applies three essential filters to the provided data, requiring:
- minSupport total support for the variant at the position
- minStrandSupport support for the variant on each strand
- an allele freqeuncy of at least minAF (for pure diploid samples this can be set relatively high, e.g. 0.3, for calling potentially homozygous variants a value of 0.8 or higher might be used)
Calls are annotated with the p-Value of a binom.test of the present support and coverage given the error rate provided in the errorRate parameter, no filtering is done on this annotation.
Adjacent deletion calls are merged based in the value of the mergeDels parameter and their statistics are aggregated with the function supplied in the aggregator parameter.
Value
This function returns a data.frame containing annotated calls with the following slots:
Chrom
The chromosome the potential variant / deletion is on
Start
The starting position of the variant / deletion
End
The end position of the variant / deletions (equal to Start for SNVs and single basepair deletions)
Sample
The sample in which the variant was called
altAllele
The alternate allele for SNVs (deletions will have a "-" in that slot)
refAllele
The reference allele for SNVs (deletions will have the deleted sequence here as extracted from the Reference dataset, if the tally file contains a sparse representation of the reference, i.e. only positions with mismatches show a reference value the missing values are substituted with "N"'s. It is strongly suggested to write the whole reference into the tally file prior to deletion calling - see writeReference for details)
SupFwd
Support for the variant in the sample on the forward strand
SupRev
Support for the variant in the sample on the reverse strand
CovFwd
Coverage of the variant position in the sample on the forward strand
CovRev
Coverage of the variant position in the sample on the reverse strand
AF_Fwd
Allele frequency of the variant in the sample on the forward strand
AF_Rev
Allele frequency of the variant in the sample on the reverse strand
Support
Total Support of the variant - i.e. SupFwd + SupRev
Coverage
Total Coverage of the variant position - i.e. CovFwd + CovRev
AF
Total allele frequency of the variant, i.e. Support / Coverage
fBackground
Background frequency of the variant in all samples but the one the variant is called in
pErrorFwd
Probablity of the observed support and coverage given the error rate on the forward strand
pErrorRev
Probablity of the observed support and coverage given the error rate on the reverse strand
pError
Probablity of the observed support and coverage given the error rate on both strands combined
pError
Coverage of the variant position in the Control sample on the forward strand
pStrand
p-Value of a fisher.test on the contingency matrix matrix(c(CovFwd,CovRev,SupFwd,SupRev), nrow = 2) at this position - low values could indicate strand bias
Author(s)
Paul Pyl
Examples
library(h5vc) # loading library
tallyFile <- system.file( "extdata", "example.tally.hfs5", package = "h5vcData" )
sampleData <- getSampleData( tallyFile, "/ExampleStudy/16" )
position <- 29979629
windowsize <- 1000
vars <- h5dapply( # Calling Variants
filename = tallyFile,
group = "/ExampleStudy/16",
blocksize = 500,
FUN = callVariantsSingle,
sampledata = sampleData,
names = c("Coverages", "Counts", "Reference", "Deletions"),
range = c(position - windowsize, position + windowsize)
)
vars <- do.call( rbind, vars ) # merge the results from all blocks by row
vars # We did find a variant
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(h5vc)
Loading required package: grid
Loading required package: gridExtra
Loading required package: ggplot2
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/h5vc/callVariantsSingle.Rd_%03d_medium.png", width=480, height=480)
> ### Name: callVariantsSingle
> ### Title: Single sample variant calling
> ### Aliases: callVariantsSingle
>
> ### ** Examples
>
> library(h5vc) # loading library
> tallyFile <- system.file( "extdata", "example.tally.hfs5", package = "h5vcData" )
> sampleData <- getSampleData( tallyFile, "/ExampleStudy/16" )
> position <- 29979629
> windowsize <- 1000
> vars <- h5dapply( # Calling Variants
+ filename = tallyFile,
+ group = "/ExampleStudy/16",
+ blocksize = 500,
+ FUN = callVariantsSingle,
+ sampledata = sampleData,
+ names = c("Coverages", "Counts", "Reference", "Deletions"),
+ range = c(position - windowsize, position + windowsize)
+ )
> vars <- do.call( rbind, vars ) # merge the results from all blocks by row
> vars # We did find a variant
Sample Chrom Start End refAllele
29978629:29979128.199 PT8PrimaryDNA 16 29978827 29978827 G
29978629:29979128.433 PT8PrimaryDNA 16 29979061 29979061 C
29978629:29979128.177 PT8PrimaryDNA 16 29978805 29978805 A
29978629:29979128.437 PT8PrimaryDNA 16 29979065 29979065 A
29978629:29979128.436 PT8PrimaryDNA 16 29979064 29979064 C
29978629:29979128.1991 PT5PrimaryDNA 16 29978827 29978827 G
29978629:29979128.4331 PT5PrimaryDNA 16 29979061 29979061 C
29978629:29979128.1771 PT5PrimaryDNA 16 29978805 29978805 A
29978629:29979128.4371 PT5PrimaryDNA 16 29979065 29979065 A
29978629:29979128.4361 PT5PrimaryDNA 16 29979064 29979064 C
29978629:29979128.119 PT5RelapseDNA 16 29978747 29978747 C
29978629:29979128.1992 PT5RelapseDNA 16 29978827 29978827 G
29978629:29979128.4332 PT5RelapseDNA 16 29979061 29979061 C
29978629:29979128.1772 PT5RelapseDNA 16 29978805 29978805 A
29978629:29979128.4372 PT5RelapseDNA 16 29979065 29979065 A
29978629:29979128.4362 PT5RelapseDNA 16 29979064 29979064 C
29978629:29979128.1993 PT8EarlyStageDNA 16 29978827 29978827 G
29978629:29979128.247 PT8EarlyStageDNA 16 29978875 29978875 T
29978629:29979128.4333 PT8EarlyStageDNA 16 29979061 29979061 C
29978629:29979128.1773 PT8EarlyStageDNA 16 29978805 29978805 A
29978629:29979128.4373 PT8EarlyStageDNA 16 29979065 29979065 A
29978629:29979128.4363 PT8EarlyStageDNA 16 29979064 29979064 C
29978629:29979128.1994 PT5ControlDNA 16 29978827 29978827 G
29978629:29979128.4334 PT5ControlDNA 16 29979061 29979061 C
29978629:29979128.476 PT5ControlDNA 16 29979104 29979104 T
29978629:29979128.1774 PT5ControlDNA 16 29978805 29978805 A
29978629:29979128.4374 PT5ControlDNA 16 29979065 29979065 A
29978629:29979128.4364 PT5ControlDNA 16 29979064 29979064 C
29978629:29979128.1995 PT8ControlDNA 16 29978827 29978827 G
29978629:29979128.1775 PT8ControlDNA 16 29978805 29978805 A
29979129:29979628.176 PT8PrimaryDNA 16 29979304 29979304 A
29979129:29979628.98 PT8PrimaryDNA 16 29979226 29979226 T
29979129:29979628.117 PT5PrimaryDNA 16 29979245 29979245 C
29979129:29979628.1761 PT8EarlyStageDNA 16 29979304 29979304 A
29979129:29979628.51 PT8EarlyStageDNA 16 29979179 29979179 C
29979129:29979628.120 PT8EarlyStageDNA 16 29979248 29979248 A
29979129:29979628.133 PT5ControlDNA 16 29979261 29979261 G
29979129:29979628.500 PT5ControlDNA 16 29979628 29979628 G
29979129:29979628.1762 PT5ControlDNA 16 29979304 29979304 A
29979129:29979628.981 PT5ControlDNA 16 29979226 29979226 T
29980129:29980628.381 PT8PrimaryDNA 16 29980509 29980509 T
29980129:29980628.377 PT8PrimaryDNA 16 29980505 29980505 C
29980129:29980628.429 PT5PrimaryDNA 16 29980557 29980557 A
29980129:29980628.3771 PT5PrimaryDNA 16 29980505 29980505 C
29980129:29980628.408 PT5PrimaryDNA 16 29980536 29980536 G
29980129:29980628.378 PT5RelapseDNA 16 29980506 29980506 T
29980129:29980628.3772 PT5ControlDNA 16 29980505 29980505 C
29980129:29980628.3773 PT8ControlDNA 16 29980505 29980505 C
altAllele SupFwd SupRev CovFwd CovRev AF_Fwd
29978629:29979128.199 A 12 9 18 21 0.66666667
29978629:29979128.433 A 10 5 14 16 0.71428571
29978629:29979128.177 G 17 15 17 15 1.00000000
29978629:29979128.437 G 8 5 13 17 0.61538462
29978629:29979128.436 T 8 5 13 16 0.61538462
29978629:29979128.1991 A 18 25 18 25 1.00000000
29978629:29979128.4331 A 12 9 12 9 1.00000000
29978629:29979128.1771 G 20 18 20 18 1.00000000
29978629:29979128.4371 G 12 9 12 9 1.00000000
29978629:29979128.4361 T 12 9 12 9 1.00000000
29978629:29979128.119 A 1 1 7 15 0.14285714
29978629:29979128.1992 A 10 4 10 4 1.00000000
29978629:29979128.4332 A 10 6 10 6 1.00000000
29978629:29979128.1772 G 8 6 8 6 1.00000000
29978629:29979128.4372 G 10 6 10 6 1.00000000
29978629:29979128.4362 T 10 6 10 6 1.00000000
29978629:29979128.1993 A 18 8 18 8 1.00000000
29978629:29979128.247 A 1 1 18 19 0.05555556
29978629:29979128.4333 A 15 10 16 10 0.93750000
29978629:29979128.1773 G 18 14 18 14 1.00000000
29978629:29979128.4373 G 13 10 16 10 0.81250000
29978629:29979128.4363 T 13 10 16 10 0.81250000
29978629:29979128.1994 A 12 11 24 21 0.50000000
29978629:29979128.4334 A 7 6 16 14 0.43750000
29978629:29979128.476 C 1 1 21 15 0.04761905
29978629:29979128.1774 G 24 13 25 13 0.96000000
29978629:29979128.4374 G 7 6 17 14 0.41176471
29978629:29979128.4364 T 7 6 16 14 0.43750000
29978629:29979128.1995 A 5 3 11 5 0.45454545
29978629:29979128.1775 G 11 6 11 6 1.00000000
29979129:29979628.176 C 3 1 26 17 0.11538462
29979129:29979628.98 G 1 1 9 21 0.11111111
29979129:29979628.117 A 1 1 18 17 0.05555556
29979129:29979628.1761 C 2 2 34 21 0.05882353
29979129:29979628.51 G 2 1 25 32 0.08000000
29979129:29979628.120 - 2 1 21 22 0.09523810
29979129:29979628.133 A 2 1 17 22 0.11764706
29979129:29979628.500 A 6 11 19 23 0.31578947
29979129:29979628.1762 C 1 4 25 25 0.04000000
29979129:29979628.981 G 1 1 17 19 0.05882353
29980129:29980628.381 C 1 1 18 22 0.05555556
29980129:29980628.377 T 1 11 20 23 0.05000000
29980129:29980628.429 G 1 1 18 22 0.05555556
29980129:29980628.3771 T 1 8 35 26 0.02857143
29980129:29980628.408 T 2 1 17 29 0.11764706
29980129:29980628.378 C 1 1 6 6 0.16666667
29980129:29980628.3772 T 1 5 18 14 0.05555556
29980129:29980628.3773 T 1 1 6 9 0.16666667
AF_Rev Coverage Support AF fBackground
29978629:29979128.199 0.42857143 39 21 0.53846154 0.791666667
29978629:29979128.433 0.31250000 30 15 0.50000000 0.752380952
29978629:29979128.177 1.00000000 32 32 1.00000000 1.000000000
29978629:29979128.437 0.29411765 30 13 0.43333333 0.726415094
29978629:29979128.436 0.31250000 29 13 0.44827586 0.733333333
29978629:29979128.1991 1.00000000 43 43 1.00000000 0.657142857
29978629:29979128.4331 1.00000000 21 21 1.00000000 0.640350877
29978629:29979128.1771 1.00000000 38 38 1.00000000 1.000000000
29978629:29979128.4371 1.00000000 21 21 1.00000000 0.600000000
29978629:29979128.4361 1.00000000 21 21 1.00000000 0.610619469
29978629:29979128.119 0.06666667 22 2 0.09090909 0.000000000
29978629:29979128.1992 1.00000000 14 14 1.00000000 0.715976331
29978629:29979128.4332 1.00000000 16 16 1.00000000 0.655462185
29978629:29979128.1772 1.00000000 14 14 1.00000000 1.000000000
29978629:29979128.4372 1.00000000 16 16 1.00000000 0.616666667
29978629:29979128.4362 1.00000000 16 16 1.00000000 0.627118644
29978629:29979128.1993 1.00000000 26 26 1.00000000 0.694267516
29978629:29979128.247 0.05263158 37 2 0.05405405 0.012048193
29978629:29979128.4333 1.00000000 26 25 0.96153846 0.633027523
29978629:29979128.1773 1.00000000 32 32 1.00000000 1.000000000
29978629:29979128.4373 1.00000000 26 23 0.88461538 0.609090909
29978629:29979128.4363 1.00000000 26 23 0.88461538 0.620370370
29978629:29979128.1994 0.52380952 45 23 0.51111111 0.811594203
29978629:29979128.4334 0.42857143 30 13 0.43333333 0.771428571
29978629:29979128.476 0.06666667 36 2 0.05555556 0.007092199
29978629:29979128.1774 1.00000000 38 37 0.97368421 1.000000000
29978629:29979128.4374 0.42857143 31 13 0.41935484 0.733333333
29978629:29979128.4364 0.42857143 30 13 0.43333333 0.740384615
29978629:29979128.1995 0.60000000 16 8 0.50000000 0.760479042
29978629:29979128.1775 1.00000000 17 17 1.00000000 1.000000000
29979129:29979628.176 0.05882353 43 4 0.09302326 0.057971014
29979129:29979628.98 0.04761905 30 2 0.06666667 0.037735849
29979129:29979628.117 0.05882353 35 2 0.05714286 0.000000000
29979129:29979628.1761 0.09523810 55 4 0.07272727 0.056410256
29979129:29979628.51 0.03125000 57 3 0.05263158 0.016216216
29979129:29979628.120 0.04545455 43 3 0.06976744 0.033783784
29979129:29979628.133 0.04545455 39 3 0.07692308 0.060606061
29979129:29979628.500 0.47826087 42 17 0.40476190 0.005102041
29979129:29979628.1762 0.16000000 50 5 0.10000000 0.055000000
29979129:29979628.981 0.05263158 36 2 0.05555556 0.039215686
29980129:29980628.381 0.04545455 40 2 0.05000000 0.046357616
29980129:29980628.377 0.47826087 43 12 0.27906977 0.141104294
29980129:29980628.429 0.04545455 40 2 0.05000000 0.053097345
29980129:29980628.3771 0.30769231 61 9 0.14754098 0.193103448
29980129:29980628.408 0.03448276 46 3 0.06521739 0.026785714
29980129:29980628.378 0.16666667 12 2 0.16666667 0.051282051
29980129:29980628.3772 0.35714286 32 6 0.18750000 0.172413793
29980129:29980628.3773 0.11111111 15 2 0.13333333 0.183246073
pErrorFwd pErrorRev pError pStrand
29978629:29979128.199 1.846142e-32 2.907714e-22 6.129637e-53 0.58888219
29978629:29979128.433 9.973650e-28 4.328131e-12 1.529505e-37 0.34206698
29978629:29979128.177 1.000000e-51 1.000000e-45 1.000000e-96 1.00000000
29978629:29979128.437 1.281290e-21 6.126411e-12 1.178834e-31 0.33188888
29978629:29979128.436 1.281290e-21 4.328131e-12 6.686268e-32 0.50547558
29978629:29979128.1991 1.000000e-54 1.000000e-75 1.000000e-129 1.00000000
29978629:29979128.4331 1.000000e-36 1.000000e-27 1.000000e-63 1.00000000
29978629:29979128.1771 1.000000e-60 1.000000e-54 1.000000e-114 1.00000000
29978629:29979128.4371 1.000000e-36 1.000000e-27 1.000000e-63 1.00000000
29978629:29979128.4361 1.000000e-36 1.000000e-27 1.000000e-63 1.00000000
29978629:29979128.119 6.979035e-03 1.489545e-02 2.279418e-04 1.00000000
29978629:29979128.1992 1.000000e-30 1.000000e-12 1.000000e-42 1.00000000
29978629:29979128.4332 1.000000e-30 1.000000e-18 1.000000e-48 1.00000000
29978629:29979128.1772 1.000000e-24 1.000000e-18 1.000000e-42 1.00000000
29978629:29979128.4372 1.000000e-30 1.000000e-18 1.000000e-48 1.00000000
29978629:29979128.4362 1.000000e-30 1.000000e-18 1.000000e-48 1.00000000
29978629:29979128.1993 1.000000e-54 1.000000e-24 1.000000e-78 1.00000000
29978629:29979128.247 1.784781e-02 1.882997e-02 6.506564e-04 1.00000000
29978629:29979128.4333 1.598500e-44 1.000000e-30 2.597500e-74 1.00000000
29978629:29979128.1773 1.000000e-54 1.000000e-42 1.000000e-96 1.00000000
29978629:29979128.4373 5.584415e-37 1.000000e-30 2.592532e-66 0.77672232
29978629:29979128.4363 5.584415e-37 1.000000e-30 2.592532e-66 0.77672232
29978629:29979128.1994 2.674355e-30 3.494962e-28 4.030791e-57 1.00000000
29978629:29979128.4334 1.135023e-17 2.982471e-15 1.178834e-31 1.00000000
29978629:29979128.476 2.079132e-02 1.489545e-02 6.158952e-04 1.00000000
29978629:29979128.1774 2.497600e-71 1.000000e-39 3.796300e-110 1.00000000
29978629:29979128.4374 1.927851e-17 2.982471e-15 2.028329e-31 1.00000000
29978629:29979128.4364 1.135023e-17 2.982471e-15 1.178834e-31 1.00000000
29978629:29979128.1995 4.596949e-13 9.985006e-09 1.277877e-20 1.00000000
29978629:29979128.1775 1.000000e-33 1.000000e-18 1.000000e-51 1.00000000
29979129:29979628.176 2.555542e-06 1.686468e-02 1.196199e-07 1.00000000
29979129:29979628.98 8.964084e-03 2.079132e-02 4.269616e-04 0.53427419
29979129:29979628.117 1.784781e-02 1.686468e-02 5.820658e-04 1.00000000
29979129:29979628.1761 5.491700e-04 2.073579e-04 3.274258e-07 0.63917684
29979129:29979628.51 2.954377e-04 3.150892e-02 2.809973e-05 0.58340152
29979129:29979628.120 2.073579e-04 2.177053e-02 1.197649e-05 1.00000000
29979129:29979628.133 1.346471e-04 2.177053e-02 8.895669e-06 0.58127178
29979129:29979628.500 2.683125e-14 1.337280e-27 2.487169e-40 0.56843446
29979129:29979628.1762 2.470229e-02 1.243924e-08 2.040786e-09 0.35518709
29979129:29979628.981 1.686468e-02 1.882997e-02 6.158952e-04 1.00000000
29980129:29980628.381 1.784781e-02 2.177053e-02 7.605116e-04 1.00000000
29980129:29980628.377 1.981114e-02 1.337280e-27 1.490581e-26 0.01943211
29980129:29980628.429 1.784781e-02 2.177053e-02 7.605116e-04 1.00000000
29980129:29980628.3771 3.441149e-02 1.537469e-18 1.654870e-17 0.01230470
29980129:29980628.408 1.346471e-04 2.859763e-02 1.469858e-05 0.55140035
29980129:29980628.378 5.985020e-03 5.985020e-03 6.556148e-05 1.00000000
29980129:29980628.3772 1.784781e-02 1.987036e-12 8.862162e-13 0.17971218
29980129:29980628.3773 5.985020e-03 8.964084e-03 1.040941e-04 1.00000000
>
>
>
>
>
> dev.off()
null device
1
>