Use ScanVcfParam() to create a parameter object influencing
which records and fields are imported from a VCF file. Record
parsing is based on genomic coordinates and requires a Tabix index
file. Individual VCF elements can be specified in the ‘fixed’,
‘info’, ‘geno’ and ‘samples’ arguments.
Usage
ScanVcfParam(fixed=character(), info=character(), geno=character(),
samples=character(), trimEmpty=TRUE, which, ...)
## Getters and Setters
vcfFixed(object)
vcfFixed(object) <- value
vcfInfo(object)
vcfInfo(object) <- value
vcfGeno(object)
vcfGeno(object) <- value
vcfSamples(object)
vcfSamples(object) <- value
vcfTrimEmpty(object)
vcfTrimEmpty(object) <- value
vcfWhich(object)
vcfWhich(object) <- value
Arguments
fixed
A character() vector of fixed fields to be returned. Possible
values are ALT, QUAL and FILTER. The CHROM, POS, ID and REF fields are
needed to create the GRanges of variant locations. Because these
are essential fields there is no option to request or omit them. If not
specified, all fields are returned; if fixed=NA only REF is
returned.
info
A character() vector naming the ‘INFO’ fields to return.
scanVcfHeader() returns a vector of available fields. If not
specified, all fields are returned; if info=NA no fields are returned.
geno
A character() vector naming the ‘GENO’ fields to return.
scanVcfHeader() returns a vector of available fields. If not specified,
all fields are returned; if geno=NA no fields are returned and
requests for specific samples are ignored.
samples
A character() vector of sample names to return.
samples(scanVcfHeader()) returns all possible names. If not
specified, data for all samples are returned; if either
samples=NA or geno=NA no fields are returned. Requests
for specific samples when geno=NA are ignored.
trimEmpty
A logical(1) indicating whether ‘GENO’ fields
with no values should be returned.
which
A GRanges describing the sequences and
ranges to be queried. Variants whose POS lies in the interval(s)
[start, end] are returned. If which is not specified all
ranges are returned.
object
An instance of class ScanVcfParam.
value
An instance of the corresponding slot, to be assigned to
object.
...
Arguments passed to methods.
Objects from the Class
Objects can be created by calls of the form ScanVcfParam().
Slots
which:
Object of class "RangesList" indicating
which reference sequence and coordinate variants must overlap.
fixed:
Object of class "character" indicating
fixed fields to be returned.
info:
Object of class "character" indicating
portions of ‘INFO’ to be returned.
geno:
Object of class "character" indicating
portions of ‘GENO’ to be returned.
samples:
Object of class "character" indicating
the samples to be returned.
trimEmpty:
Object of class "logical" indicating
whether empty ‘GENO’ fields are to be returned.
Functions and methods
See 'Usage' for details on invocation.
Constructor:
ScanVcfParam:
Returns a ScanVcfParam object.
The which argument to the constructor can be one of several types,
as documented above.
ScanVcfParam()
fl <- system.file("extdata", "structural.vcf", package="VariantAnnotation")
compressVcf <- bgzip(fl, tempfile())
idx <- indexTabix(compressVcf, "vcf")
tab <- TabixFile(compressVcf, idx)
## ---------------------------------------------------------------------
## 'which' argument
## ---------------------------------------------------------------------
## To subset on genomic coordinates, supply an object
## containing the ranges of interest. These ranges can
## be given directly to the 'param' argument or wrapped
## inside ScanVcfParam() as the 'which' argument.
## When using a list, the outer list names must correspond to valid
## chromosome names in the vcf file. In this example they are "1"
## and "2".
gr1 <- GRanges("1", IRanges(13219, 2827695, name="regionA"))
gr2 <- GRanges(rep("2", 2), IRanges(c(321680, 14477080),
c(321689, 14477090), name=c("regionB", "regionC")))
grl <- GRangesList("1"=gr1, "2"=gr2)
vcf <- readVcf(tab, "hg19", grl)
## Names of the ranges are in the 'paramRangeID' metadata column of the
## GRanges object returned by the rowRanges() accessor.
rowRanges(vcf)
## which can be used for subsetting the VCF object
vcf[rowRanges(vcf)$paramRangeID == "regionA"]
## When using ranges, the seqnames must correspond to valid
## chromosome names in the vcf file.
gr <- unlist(grl, use.names=FALSE)
vcf <- readVcf(tab, "hg19", gr)
## ---------------------------------------------------------------------
## 'fixed', 'info', 'geno' and 'samples' arguments
## ---------------------------------------------------------------------
## This param specifies the "GT" 'geno' field for a single sample
## and the subset of ranges in 'which'. All 'fixed' and 'info' fields
## will be returned.
ScanVcfParam(geno="GT", samples="NA00002", which=gr)
## Here two 'fixed' and one 'geno' field are specified
ScanVcfParam(fixed=c("ALT", "QUAL"), geno="GT", info=NA)
## Return only the 'fixed' fields
ScanVcfParam(geno=NA, info=NA)
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(VariantAnnotation)
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
rbind, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit
Loading required package: GenomeInfoDb
Loading required package: stats4
Loading required package: S4Vectors
Attaching package: 'S4Vectors'
The following objects are masked from 'package:base':
colMeans, colSums, expand.grid, rowMeans, rowSums
Loading required package: IRanges
Loading required package: GenomicRanges
Loading required package: SummarizedExperiment
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
Loading required package: Rsamtools
Loading required package: Biostrings
Loading required package: XVector
Attaching package: 'VariantAnnotation'
The following object is masked from 'package:base':
tabulate
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/VariantAnnotation/ScanVcfParam-class.Rd_%03d_medium.png", width=480, height=480)
> ### Name: ScanVcfParam-class
> ### Title: Parameters for scanning VCF files
> ### Aliases: ScanVcfParam ScanVcfParam-class ScanVcfParam,missing-method
> ### ScanVcfParam,ANY-method vcfFixed vcfFixed<- vcfInfo vcfInfo<- vcfGeno
> ### vcfGeno<- vcfSamples vcfSamples<- vcfTrimEmpty vcfTrimEmpty<-
> ### vcfWhich vcfWhich<-
> ### Keywords: classes
>
> ### ** Examples
>
> ScanVcfParam()
class: ScanVcfParam
vcfWhich: 0 elements
vcfFixed: character() [All]
vcfInfo:
vcfGeno:
vcfSamples:
>
> fl <- system.file("extdata", "structural.vcf", package="VariantAnnotation")
> compressVcf <- bgzip(fl, tempfile())
> idx <- indexTabix(compressVcf, "vcf")
> tab <- TabixFile(compressVcf, idx)
> ## ---------------------------------------------------------------------
> ## 'which' argument
> ## ---------------------------------------------------------------------
> ## To subset on genomic coordinates, supply an object
> ## containing the ranges of interest. These ranges can
> ## be given directly to the 'param' argument or wrapped
> ## inside ScanVcfParam() as the 'which' argument.
>
> ## When using a list, the outer list names must correspond to valid
> ## chromosome names in the vcf file. In this example they are "1"
> ## and "2".
> gr1 <- GRanges("1", IRanges(13219, 2827695, name="regionA"))
> gr2 <- GRanges(rep("2", 2), IRanges(c(321680, 14477080),
+ c(321689, 14477090), name=c("regionB", "regionC")))
> grl <- GRangesList("1"=gr1, "2"=gr2)
> vcf <- readVcf(tab, "hg19", grl)
>
> ## Names of the ranges are in the 'paramRangeID' metadata column of the
> ## GRanges object returned by the rowRanges() accessor.
> rowRanges(vcf)
GRanges object with 4 ranges and 5 metadata columns:
seqnames
<Rle>
1:13220_T/<DEL> 1
1:2827693_CCGTGGATGCGGGGACCCGCATCCCCTCTCCCTTCACAGCTGAGTGACCCACATCCCCTCTCCCCTCGCA/C 1
2:321682_T/<DEL> 2
2:14477084_C/<DEL:ME:ALU> 2
ranges
<IRanges>
1:13220_T/<DEL> [ 13220, 13220]
1:2827693_CCGTGGATGCGGGGACCCGCATCCCCTCTCCCTTCACAGCTGAGTGACCCACATCCCCTCTCCCCTCGCA/C [ 2827693, 2827762]
2:321682_T/<DEL> [ 321682, 321682]
2:14477084_C/<DEL:ME:ALU> [14477084, 14477084]
strand
<Rle>
1:13220_T/<DEL> *
1:2827693_CCGTGGATGCGGGGACCCGCATCCCCTCTCCCTTCACAGCTGAGTGACCCACATCCCCTCTCCCCTCGCA/C *
2:321682_T/<DEL> *
2:14477084_C/<DEL:ME:ALU> *
|
|
1:13220_T/<DEL> |
1:2827693_CCGTGGATGCGGGGACCCGCATCCCCTCTCCCTTCACAGCTGAGTGACCCACATCCCCTCTCCCCTCGCA/C |
2:321682_T/<DEL> |
2:14477084_C/<DEL:ME:ALU> |
paramRangeID
<factor>
1:13220_T/<DEL> regionA
1:2827693_CCGTGGATGCGGGGACCCGCATCCCCTCTCCCTTCACAGCTGAGTGACCCACATCCCCTCTCCCCTCGCA/C regionA
2:321682_T/<DEL> regionB
2:14477084_C/<DEL:ME:ALU> regionC
REF
<DNAStringSet>
1:13220_T/<DEL> T
1:2827693_CCGTGGATGCGGGGACCCGCATCCCCTCTCCCTTCACAGCTGAGTGACCCACATCCCCTCTCCCCTCGCA/C CCGTGGATGC...TCCCCTCGCA
2:321682_T/<DEL> T
2:14477084_C/<DEL:ME:ALU> C
ALT
<CharacterList>
1:13220_T/<DEL> <DEL>
1:2827693_CCGTGGATGCGGGGACCCGCATCCCCTCTCCCTTCACAGCTGAGTGACCCACATCCCCTCTCCCCTCGCA/C C
2:321682_T/<DEL> <DEL>
2:14477084_C/<DEL:ME:ALU> <DEL:ME:ALU>
QUAL
<numeric>
1:13220_T/<DEL> 6
1:2827693_CCGTGGATGCGGGGACCCGCATCCCCTCTCCCTTCACAGCTGAGTGACCCACATCCCCTCTCCCCTCGCA/C <NA>
2:321682_T/<DEL> 6
2:14477084_C/<DEL:ME:ALU> 12
FILTER
<character>
1:13220_T/<DEL> PASS
1:2827693_CCGTGGATGCGGGGACCCGCATCCCCTCTCCCTTCACAGCTGAGTGACCCACATCCCCTCTCCCCTCGCA/C PASS
2:321682_T/<DEL> PASS
2:14477084_C/<DEL:ME:ALU> PASS
-------
seqinfo: 2 sequences from hg19 genome; no seqlengths
>
> ## which can be used for subsetting the VCF object
> vcf[rowRanges(vcf)$paramRangeID == "regionA"]
class: CollapsedVCF
dim: 2 1
rowRanges(vcf):
GRanges with 5 metadata columns: paramRangeID, REF, ALT, QUAL, FILTER
info(vcf):
DataFrame with 10 columns: BKPTID, CIEND, CIPOS, END, HOMLEN, HOMSEQ, IMPR...
info(header(vcf)):
Number Type Description
BKPTID . String ID of the assembled alternate allele in the asse...
CIEND 2 Integer Confidence interval around END for imprecise var...
CIPOS 2 Integer Confidence interval around POS for imprecise var...
END 1 Integer End position of the variant described in this re...
HOMLEN . Integer Length of base pair identical micro-homology at ...
HOMSEQ . String Sequence of base pair identical micro-homology a...
IMPRECISE 0 Flag Imprecise structural variation
MEINFO 4 String Mobile element info of the form NAME,START,END,P...
SVLEN . Integer Difference in length between REF and ALT alleles
SVTYPE 1 String Type of structural variant
geno(vcf):
SimpleList of length 4: CN, CNQ, GQ, GT
geno(header(vcf)):
Number Type Description
CN 1 Integer Copy number genotype for imprecise events
CNQ 1 Float Copy number genotype quality for imprecise events
GQ 1 Float Genotype quality
GT 1 String Genotype
>
> ## When using ranges, the seqnames must correspond to valid
> ## chromosome names in the vcf file.
> gr <- unlist(grl, use.names=FALSE)
> vcf <- readVcf(tab, "hg19", gr)
>
> ## ---------------------------------------------------------------------
> ## 'fixed', 'info', 'geno' and 'samples' arguments
> ## ---------------------------------------------------------------------
> ## This param specifies the "GT" 'geno' field for a single sample
> ## and the subset of ranges in 'which'. All 'fixed' and 'info' fields
> ## will be returned.
> ScanVcfParam(geno="GT", samples="NA00002", which=gr)
class: ScanVcfParam
vcfWhich: 2 elements
vcfFixed: character() [All]
vcfInfo:
vcfGeno: GT
vcfSamples: NA00002
>
> ## Here two 'fixed' and one 'geno' field are specified
> ScanVcfParam(fixed=c("ALT", "QUAL"), geno="GT", info=NA)
class: ScanVcfParam
vcfWhich: 0 elements
vcfFixed: ALT, QUAL
vcfInfo: NA
vcfGeno: GT
vcfSamples:
>
> ## Return only the 'fixed' fields
> ScanVcfParam(geno=NA, info=NA)
class: ScanVcfParam
vcfWhich: 0 elements
vcfFixed: character() [All]
vcfInfo: NA
vcfGeno: NA
vcfSamples:
>
>
>
>
>
> dev.off()
null device
1
>