Last data update: 2014.03.03

filterVcfBasic {PureCN}    R Documentation

Basic VCF filter function

Description

Function to remove artifacts and low confidence/quality variant calls.

Usage

filterVcfBasic(vcf, tumor.id.in.vcf = NULL, use.somatic.status = TRUE, 
    snp.blacklist = NULL, af.range = c(0.03, 0.97), 
    contamination.cutoff = c(0.05, 0.075), coverage.cutoff = 20, 
    min.supporting.reads = 3, verbose = TRUE)

Arguments

vcf

CollapsedVCF object, read in with the readVcf function from the VariantAnnotation package.

tumor.id.in.vcf

The tumor id in the CollapsedVCF (optional).

use.somatic.status

If somatic status and germline data are available, use this information to remove non-heterozygous germline SNPs and germline SNPs with biased allelic fractions.

snp.blacklist

CSV file listing the ids of SNPs whose expected allelic fraction differs significantly from 0.5 in diploid genomes. Can be an array of lists. The function createSNPBlacklist can generate suitable blacklists.

af.range

Exclude SNPs with an allelic fraction smaller than the first value, or greater than or equal to the second value.

contamination.cutoff

Count SNPs in dbSNP with an allelic fraction smaller than the first value. If such SNPs are found on most chromosomes, remove all SNPs with an allelic fraction smaller than the second value.

coverage.cutoff

Minimum coverage in tumor. Variants with lower coverage are ignored.

min.supporting.reads

Minimum number of reads supporting the alternate (alt) allele.

verbose

Verbose output.

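The contamination.cutoff heuristic described above can be read as a three-step rule. The following is a minimal base-R sketch of that reading on a toy table, not PureCN's actual implementation. The data frame, its column names, and the interpretation of "most chromosomes" as "more than half" are assumptions made for illustration.

```r
# Toy SNP table; the column names are hypothetical, not PureCN internals.
snps <- data.frame(
    chr      = c("chr1", "chr1", "chr2", "chr2", "chr3", "chr3"),
    af       = c(0.04, 0.48, 0.03, 0.06, 0.51, 0.07),
    in.dbsnp = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE),
    stringsAsFactors = FALSE
)
contamination.cutoff <- c(0.05, 0.075)

# Step 1: find dbSNP entries with AF below the first value.
low.af <- snps$in.dbsnp & snps$af < contamination.cutoff[1]

# Step 2: fraction of chromosomes carrying at least one such SNP.
chr.hit <- unique(snps$chr[low.af])
fraction.hit <- length(chr.hit) / length(unique(snps$chr))

# Assumption: "most chromosomes" is taken to mean more than half.
contaminated <- fraction.hit > 0.5

# Step 3: if contamination is suspected, drop all SNPs below the second value.
if (contaminated) {
    snps <- snps[snps$af >= contamination.cutoff[2], ]
}
```

The two-value design separates detection from removal: the stricter first value decides whether low-AF dbSNP entries are widespread, and the looser second value sets how much is removed once they are.
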
Value

A list with elements

vcf

The filtered CollapsedVCF object.

flag

A flag (TRUE/FALSE) indicating whether problems were identified.

flag_comment

A comment describing the flagging.
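
As a sketch of how the returned list might be consumed, the helper below checks for the three documented elements and reports the flag. summarizeFilterResult is a hypothetical name, not a PureCN function, and the stand-in list only mimics the documented structure so that the snippet runs without the package installed.

```r
# Hypothetical helper (not part of PureCN): summarize the list
# returned by filterVcfBasic using its three documented elements.
summarizeFilterResult <- function(res) {
    stopifnot(all(c("vcf", "flag", "flag_comment") %in% names(res)))
    if (isTRUE(res$flag)) {
        paste("Flagged:", res$flag_comment)
    } else {
        "No problems flagged."
    }
}

# Stand-in result with a placeholder comment; a real call would
# return a CollapsedVCF object in the vcf element.
mock <- list(vcf = NULL, flag = TRUE, flag_comment = "example comment")
summary.text <- summarizeFilterResult(mock)
```

With PureCN installed, the same helper could be applied directly to the output of filterVcfBasic(vcf).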

Author(s)

Markus Riester

Examples

# This function is typically only called by runAbsoluteCN via the 
# fun.filterVcf and args.filterVcf arguments.
library(VariantAnnotation)    
vcf.file <- system.file("extdata", "example_vcf.vcf", package="PureCN")
vcf <- readVcf(vcf.file, "hg19")
vcf.filtered <- filterVcfBasic(vcf)        

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"

> library(PureCN)

> ### Name: filterVcfBasic
> ### Title: Basic VCF filter function
> ### Aliases: filterVcfBasic
> 
> ### ** Examples
> 
> # This function is typically only called by runAbsoluteCN via the 
> # fun.filterVcf and args.filterVcf arguments.
> library(VariantAnnotation)    
> vcf.file <- system.file("extdata", "example_vcf.vcf", package="PureCN")
> vcf <- readVcf(vcf.file, "hg19")
> vcf.filtered <- filterVcfBasic(vcf)        
Removing 0 non heterozygous (in matched normal) germline SNPs.
Removing 108 SNPs with AF < 0.03 or AF >= 0.97 or less than 3 supporting reads or depth < 20.
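
The second message in the transcript reports one combined filter: a SNP is removed if AF < 0.03, AF >= 0.97, it has fewer than 3 supporting reads, or depth < 20. The following is a minimal base-R sketch of that rule on a toy table. The data frame and its column names are hypothetical, and this is not PureCN's implementation, which operates on the CollapsedVCF object.

```r
# Toy variant table; the column names are hypothetical.
variants <- data.frame(
    id        = c("rs1", "rs2", "rs3", "rs4", "rs5"),
    af        = c(0.02, 0.45, 0.97, 0.50, 0.10),
    depth     = c(100, 150, 80, 15, 20),
    alt.reads = c(2, 68, 78, 8, 2),
    stringsAsFactors = FALSE
)

# Defaults taken from the Usage section.
af.range <- c(0.03, 0.97)
coverage.cutoff <- 20
min.supporting.reads <- 3

# Keep a SNP only if it passes every threshold; the boundaries
# follow the wording of the log message (AF < lower, AF >= upper).
keep <- variants$af >= af.range[1] &
    variants$af < af.range[2] &
    variants$depth >= coverage.cutoff &
    variants$alt.reads >= min.supporting.reads

filtered <- variants[keep, ]
n.removed <- sum(!keep)
```

In this toy table only rs2 passes all four checks; each of the other rows fails at least one threshold.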