Last data update: 2014.03.03

R: User Constructor for class. Calls VCFData constructor:...
ReadVCFDataChunkR Documentation

User Constructor for class. Calls VCFData constructor: ReadVCFDataChunk is a wrapper for readVcfAsVRanges. It removes indels, GL chromosomes, and MULTI calls. It scans the header of the vcf file and adds in the following fields for analysis if present: AD, GT, DP, GQ. Looks for the "END" tag in the header and reads in file as gVCF if necessary. This is a multi core version of readVCFData. Note, input file must have been zipped and have a corresponding tabix file. It will drop all hom ref sites not in the admixture file but retain the counts of homref and multi in the VCF file. This means that a few of the metrics and the hom ref plot can no longer be calculated in VCFQAReport. If the metrics can no longer be calculated, it will not be output. Please note that if using a filter on the data (eg gq.filter) this will not be applied to the hom ref and total number of calls. The filter is applied in the VCFQAReport step and the metrics number of hom ref and total number of calls is calculated while reading in the file. When calling this function keep in mind the memory requirements. For example, if numcores=6, then when submitting the job you may request 12 Gb each core (72 Gb total). However the VCF in memory will need to fit back onto a single core or else R will not be able to allocate the memory. The given example here does not make sense to run as it includes only chromosome 22.

Description

User Constructor for class. Calls VCFData constructor: ReadVCFDataChunk is a wrapper for readVcfAsVRanges. It removes indels, GL chromosomes, and MULTI calls. It scans the header of the vcf file and adds in the following fields for analysis if present: AD, GT, DP, GQ. Looks for the "END" tag in the header and reads in file as gVCF if necessary. This is a multi core version of readVCFData. Note, input file must have been zipped and have a corresponding tabix file. It will drop all hom ref sites not in the admixture file but retain the counts of homref and multi in the VCF file. This means that a few of the metrics and the hom ref plot can no longer be calculated in VCFQAReport. If the metrics can no longer be calculated, it will not be output. Please note that if using a filter on the data (eg gq.filter) this will not be applied to the hom ref and total number of calls. The filter is applied in the VCFQAReport step and the metrics number of hom ref and total number of calls is calculated while reading in the file. When calling this function keep in mind the memory requirements. For example, if numcores=6, then when submitting the job you may request 12 Gb each core (72 Gb total). However the VCF in memory will need to fit back onto a single core or else R will not be able to allocate the memory. The given example here does not make sense to run as it includes only chromosome 22.

Usage

ReadVCFDataChunk(mydir, myfile, genome, admixture.ref, numcores)

Arguments

mydir

Directory of vcf file

myfile

Filename of vcf file (zipped)

genome

GRCh37 or GRCh38

admixture.ref

VRanges with MAF for superpopulations (EAS, AFR, EUR)

numcores

Number of cores to read in VCF (passed to bplapply)

Value

Object of type VCFData

Examples

vcffn <- system.file("ext-data", "chr22.GRCh38.vcf.gz", package="genotypeeval")
mydir <- paste(dirname(vcffn), "/", sep="")
myfile <-basename(vcffn)
svp <- ScanVcfParam(which=GRanges("22", IRanges(0,1e5)), geno="GT")
vcf <- ReadVCFData(mydir, myfile, "GRCh38")
admix.var <- getVR(vcf)[getVR(vcf)$GT %in% c("0|1", "1|0", "1|1"),][,1:2]
admix.var$EAS_AF <- ifelse(admix.var$GT %in% c("1|1"), 1, .5)
admix.var$AFR_AF<- 0
admix.var$EUR_AF<- 0
admix.hom <- getVR(vcf)[getVR(vcf)$GT %in% c("0|0"),][,1:2]
admix.hom$EAS_AF<- 0
admix.hom$AFR_AF<- 1
admix.hom$EUR_AF<- 1
admix.ref <- c(admix.var, admix.hom)
ReadVCFDataChunk(mydir, myfile, "GRCh38", admix.ref, numcores=2)

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(genotypeeval)
Loading required package: VariantAnnotation
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: GenomeInfoDb
Loading required package: stats4
Loading required package: S4Vectors

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomicRanges
Loading required package: SummarizedExperiment
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: Rsamtools
Loading required package: Biostrings
Loading required package: XVector

Attaching package: 'VariantAnnotation'

The following object is masked from 'package:base':

    tabulate

Warning message:
replacing previous import 'BiocGenerics::Position' by 'ggplot2::Position' when loading 'genotypeeval' 
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/genotypeeval/ReadVCFDataChunk.Rd_%03d_medium.png", width=480, height=480)
> ### Name: ReadVCFDataChunk
> ### Title: User Constructor for class.  Calls VCFData constructor:
> ###   ReadVCFDataChunk is a wrapper for readVcfAsVRanges.  It removes
> ###   indels, GL chromosomes, and MULTI calls. It scans the header of the
> ###   vcf file and adds in the following fields for analysis if present:
> ###   AD, GT, DP, GQ. Looks for the "END" tag in the header and reads in
> ###   file as gVCF if necessary. This is a multi core version of
> ###   readVCFData.  Note, input file must have been zipped and have a
> ###   corresponding tabix file.  It will drop all hom ref sites not in the
> ###   admixture file but retain the counts of homref and multi in the VCF
> ###   file.  This means that a few of the metrics and the hom ref plot can
> ###   no longer be calculated in VCFQAReport.  If the metrics can no longer
> ###   be calculated, it will not be output.  Please note that if using a
> ###   filter on the data (eg gq.filter) this will not be applied to the hom
> ###   ref and total number of calls.  The filter is applied in the
> ###   VCFQAReport step and the metrics number of hom ref and total number
> ###   of calls is calculated while reading in the file.  When calling this
> ###   function keep in mind the memory requirements.  For example, if
> ###   numcores=6, then when submitting the job you may request 12 Gb each
> ###   core (72 Gb total).  However the VCF in memory will need to fit back
> ###   onto a single core or else R will not be able to allocate the memory.
> ###   The given example here does not make sense to run as it includes only
> ###   chromosome 22.
> ### Aliases: ReadVCFDataChunk
> 
> ### ** Examples
> 
> vcffn <- system.file("ext-data", "chr22.GRCh38.vcf.gz", package="genotypeeval")
> mydir <- paste(dirname(vcffn), "/", sep="")
> myfile <-basename(vcffn)
> svp <- ScanVcfParam(which=GRanges("22", IRanges(0,1e5)), geno="GT")
> vcf <- ReadVCFData(mydir, myfile, "GRCh38")
Reading VCF ... 
Warning message:
In keepSeqlevels(.Object@vr, reg.chrs) :
  invalid seqlevels'1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', 'X', 'Y'were ignored
> admix.var <- getVR(vcf)[getVR(vcf)$GT %in% c("0|1", "1|0", "1|1"),][,1:2]
> admix.var$EAS_AF <- ifelse(admix.var$GT %in% c("1|1"), 1, .5)
> admix.var$AFR_AF<- 0
> admix.var$EUR_AF<- 0
> admix.hom <- getVR(vcf)[getVR(vcf)$GT %in% c("0|0"),][,1:2]
> admix.hom$EAS_AF<- 0
> admix.hom$AFR_AF<- 1
> admix.hom$EUR_AF<- 1
> admix.ref <- c(admix.var, admix.hom)
> ReadVCFDataChunk(mydir, myfile, "GRCh38", admix.ref, numcores=2)
Reading VCF ... 
An object of class "VCFData"
Slot "mydir":
[1] "/home/ddbj/local/lib64/R/library/genotypeeval/ext-data/"

Slot "myfile":
[1] "chr22.GRCh38.vcf.gz"

Slot "genome":
[1] "GRCh38"

Slot "vr":
VRanges object with 1243 ranges and 9 metadata columns:
         seqnames               ranges strand         ref              alt
            <Rle>            <IRanges>  <Rle> <character> <characterOrRle>
     [1]       22 [49932468, 49932468]      *           C                T
     [2]       22 [49938676, 49938676]      *           T                C
     [3]       22 [49943113, 49943113]      *           G                A
     [4]       22 [49952081, 49952081]      *           T                C
     [5]       22 [49954279, 49954279]      *           G                C
     ...      ...                  ...    ...         ...              ...
  [1239]       22 [50546170, 50546170]      *           G                T
  [1240]       22 [50546279, 50546279]      *           C                G
  [1241]       22 [50547446, 50547446]      *           T                C
  [1242]       22 [50552374, 50552374]      *           C                T
  [1243]       22 [50561252, 50561252]      *           A                G
             totalDepth       refDepth       altDepth   sampleNames
         <integerOrRle> <integerOrRle> <integerOrRle> <factorOrRle>
     [1]             32           <NA>           <NA>       HG00096
     [2]             28           <NA>           <NA>       HG00096
     [3]             22           <NA>           <NA>       HG00096
     [4]             20           <NA>           <NA>       HG00096
     [5]             35           <NA>           <NA>       HG00096
     ...            ...            ...            ...           ...
  [1239]             27           <NA>           <NA>       HG00101
  [1240]             36           <NA>           <NA>       HG00101
  [1241]             35           <NA>           <NA>       HG00101
  [1242]             31           <NA>           <NA>       HG00101
  [1243]             21           <NA>           <NA>       HG00101
         softFilterMatrix |      QUAL          GT        GQ   variant   var.bin
                 <matrix> | <numeric> <character> <numeric> <numeric> <numeric>
     [1]                  |       100         1|0        99         1         1
     [2]                  |       100         0|0        99         0         0
     [3]                  |       100         0|1        99         1         1
     [4]                  |       100         0|0        99         0         0
     [5]                  |       100         0|0        99         0         0
     ...              ... .       ...         ...       ...       ...       ...
  [1239]                  |       100         0|1        99         1         1
  [1240]                  |       100         0|1        99         1         1
  [1241]                  |       100         0|1        99         1         1
  [1242]                  |       100         0|1        99         1         1
  [1243]                  |       100         0|1        99         1         1
                            key     admix  n.homref     n.dup
                    <character> <logical> <integer> <integer>
     [1] 22:49932468:49932468:1      TRUE       348     38005
     [2] 22:49938676:49938676:0      TRUE       348     38005
     [3] 22:49943113:49943113:1      TRUE       348     38005
     [4] 22:49952081:49952081:0      TRUE       348     38005
     [5] 22:49954279:49954279:0      TRUE       348     38005
     ...                    ...       ...       ...       ...
  [1239] 22:50546170:50546170:1      TRUE       348     38005
  [1240] 22:50546279:50546279:1      TRUE       348     38005
  [1241] 22:50547446:50547446:1      TRUE       348     38005
  [1242] 22:50552374:50552374:1      TRUE       348     38005
  [1243] 22:50561252:50561252:1      TRUE       348     38005
  -------
  seqinfo: 1 sequence from GRCh38 genome; no seqlengths
  hardFilters: NULL

Slot "genoString":
[1] "GT" "DP" "GQ" "AD"

Slot "infoString":
[1] NA

Slot "n.dup":
[1] 38005

Slot "chunked":
[1] TRUE

Slot "n.homref":
[1] 348

Warning message:
In keepSeqlevels(vr, reg.chrs) :
  invalid seqlevels'1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11', '12', '13', '14', '15', '16', '17', '18', '19', '20', '21', 'X', 'Y'were ignored
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>