Last data update: 2014.03.03

applyTallies R Documentation

Preparing the results of tallyBAM for writing to an HDF5 tally file

Description

This function tallies a set of bam files and prepares the data for writing to an HDF5 tally file.

Usage

applyTallies(bamfiles, chrom, start, stop, q = 25, ncycles = 0, max.depth = 1000000, prepForHDF5 = TRUE, reference = NULL)

Arguments

bamfiles

A character vector of filenames of the bam files that should be tallied. Note that for writing to an HDF5 file the order of this vector must match the order of the Column field in the sampledata object that corresponds to the dataset; see setSampleData for details.

prepForHDF5

Boolean flag to specify whether the data shall be structured for compatibility with the HDF5 tally file format. See the details section of this manual page.

reference

A DNAString object containing the reference sequence corresponding to the region that is described in the counts array. If this is NULL, a consensus vote will be used to estimate the reference at any given position, which means that variants with an allele frequency (AF) >= 0.5 can no longer be detected.

chrom

Chromosome in which to tally

start

First position of the tally

stop

Last position of the tally

q

Quality cut-off for considering a base call (default 25)

ncycles

Number of sequencing cycles from the front and back of the read that should be considered unreliable; used for stratifying the nucleotide counts

max.depth

Only tally a position if fewer than this many reads overlap it; this can prevent long runtimes in unreliable regions
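
The note on the reference argument can be illustrated with a small base-R sketch. It is not h5vc's implementation: the counts matrix, the "true" reference and the helper alt_af() are made up for illustration, and the consensus is taken to be the most frequent nucleotide at each position. Under that assumption, a variant whose allele frequency exceeds 0.5 becomes the estimated reference itself, so the non-reference fraction can never rise above 0.5.

```r
# Toy nucleotide counts: rows are nucleotides, columns are positions.
# All values are invented for illustration.
counts <- matrix(c(30,  0,  0, 0,    # position 1: only A observed
                   12,  0, 18, 0,    # position 2: G at 60%, A at 40%
                    0, 25,  0, 5),   # position 3: mostly C, some T
                 nrow = 4,
                 dimnames = list(c("A", "C", "G", "T"), NULL))

# Hypothetical true reference for the three positions.
true_ref <- c("A", "A", "C")

# Assumed consensus vote: the most frequent nucleotide per position.
consensus <- rownames(counts)[apply(counts, 2, which.max)]

# Fraction of reads that do not support the given reference base.
alt_af <- function(counts, ref) {
  idx <- cbind(match(ref, rownames(counts)), seq_len(ncol(counts)))
  1 - counts[idx] / colSums(counts)
}

alt_af(counts, true_ref)   # position 2 shows a non-reference fraction of 0.6
alt_af(counts, consensus)  # position 2 drops to 0.4 because G is now "reference"
```

Passing a DNAString for the region, as in the Examples section, avoids this limitation.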

Details

This is a wrapper function for applying tallyBAM to a set of bam files specified in the bamfiles argument. If prepForHDF5 is FALSE, the result is equivalent to calling tallyBAM with lapply on the file names. Otherwise the resulting data structure has the same layout as the return value of h5readBlock and can be written to an HDF5 tally file directly. The order of samples along the sample dimension is the same as the order of the file names (i.e. the order of the bamfiles argument).
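
The relationship between the two return shapes can be sketched in base R without any bam files. The sketch below is only an illustration under stated assumptions: the per-sample arrays are random toy data, the file names are hypothetical, and the combined layout is assumed to be Nucleotide x Sample x Strand x Position. It shows how a list of per-file arrays can be stacked along a sample dimension so that the sample order follows the order of the file names.

```r
# Hypothetical file names; no files are read in this sketch.
files <- c("sampleA.bam", "sampleB.bam")

# Stand-in for a per-file tally: Nucleotide x Strand x Position toy array.
make_tally <- function(seed) {
  set.seed(seed)
  array(rpois(4 * 2 * 5, lambda = 3), dim = c(4, 2, 5),
        dimnames = list(c("A", "C", "G", "T"), c("+", "-"), NULL))
}

# The list shape: one 3D array per file, in the order of 'files'.
per_sample <- lapply(seq_along(files), make_tally)
names(per_sample) <- files

# Stack the arrays along a new trailing dimension, then move that
# dimension to the second position (assumed sample dimension).
stacked <- array(unlist(per_sample, use.names = FALSE),
                 dim = c(4, 2, 5, length(files)))
counts <- aperm(stacked, c(1, 4, 2, 3))
dimnames(counts) <- list(c("A", "C", "G", "T"), files, c("+", "-"), NULL)

dim(counts)  # 4 nucleotides, 2 samples, 2 strands, 5 positions
```

Because the sample dimension is filled in list order, reordering the file names reorders the samples, which is why the bamfiles vector has to match the sample data of the tally file.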

Value

A list with slots containing the Counts, Coverages, Deletions and Reference datasets for the given samples if prepForHDF5 is TRUE; otherwise a list of 3D arrays (Nucleotide x Strand x Position), one per bam file.

Author(s)

Paul Pyl

Examples

library(h5vc)
library(BSgenome.Hsapiens.UCSC.hg19)
files <- c("NRAS.AML.bam","NRAS.Control.bam")
bamFiles <- file.path( system.file("extdata", package = "h5vcData"), files)
chrom <- "1"
startpos <- 115247090
endpos <- 115259515
theData <- applyTallies( bamFiles, reference = Hsapiens[["chr1"]][startpos:endpos], chrom = chrom, start = startpos, stop = endpos, ncycles = 10 )
str(theData)
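
With ncycles = 10 the nucleotide counts are stratified by read position, and the str() output in the Results section shows 12 entries in the first dimension of Counts, starting with "A.front", "C.front", "G.front" and "T.front". The following base-R sketch shows one way to collapse such strata back into per-nucleotide counts. It works on a toy array rather than on theData, and the remaining labels (".middle" and ".end") as well as their ordering are assumptions, since the output only shows the first four.

```r
bases  <- c("A", "C", "G", "T")
strata <- c("front", "middle", "end")  # "middle" and "end" are assumed labels

# Labels with the nucleotide varying fastest: A.front, C.front, G.front, ...
labels <- as.vector(outer(bases, strata, paste, sep = "."))

# Toy stratified counts: (Nucleotide.Stratum) x Strand x Position.
toy <- array(seq_len(12 * 2 * 3), dim = c(12, 2, 3),
             dimnames = list(labels, c("+", "-"), NULL))

# Split the first dimension into Nucleotide x Stratum ...
by_stratum <- array(toy, dim = c(4, 3, 2, 3),
                    dimnames = list(bases, strata, c("+", "-"), NULL))

# ... and sum over the stratum dimension.
collapsed <- apply(by_stratum, c(1, 3, 4), sum)

dim(collapsed)  # Nucleotide x Strand x Position
```

The reshape relies on the labels being ordered with the nucleotide varying fastest; for real data the dimnames should be checked before applying it.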

Results

> library(h5vc)
Loading required package: grid
Loading required package: gridExtra
Loading required package: ggplot2
> ### Name: applyTallies
> ### Title: Preparing the results of tallyBAM for writing to an HDF5 tally
> ###   file
> ### Aliases: applyTallies
> 
> ### ** Examples
> 
> library(h5vc)
> library(BSgenome.Hsapiens.UCSC.hg19)
Loading required package: BSgenome
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following object is masked from 'package:gridExtra':

    combine

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
Loading required package: Biostrings
Loading required package: XVector
Loading required package: rtracklayer
> files <- c("NRAS.AML.bam","NRAS.Control.bam")
> bamFiles <- file.path( system.file("extdata", package = "h5vcData"), files)
> chrom = "1"
> startpos <- 115247090
> endpos <- 115259515
> theData <- applyTallies( bamFiles, reference = Hsapiens[["chr1"]][startpos:endpos], chr = chrom, start = startpos, stop = endpos, ncycles = 10 )
> str(theData)
List of 4
 $ Counts   : num [1:12, 1:2, 1, 1:2, 1:12426] 0 0 0 0 0 0 0 0 0 0 ...
  ..- attr(*, "dimnames")=List of 5
  .. ..$ : chr [1:12] "A.front" "C.front" "G.front" "T.front" ...
  .. ..$ : NULL
  .. ..$ : chr "/home/ddbj/local/lib64/R/library/h5vcData/extdata/NRAS.Control.bam"
  .. ..$ : chr [1:2] "+" "-"
  .. ..$ : NULL
 $ Coverages: num [1:2, 1, 1:2, 1:12426] 0 0 0 0 0 0 0 0 0 0 ...
  ..- attr(*, "dimnames")=List of 4
  .. ..$ : NULL
  .. ..$ : chr "/home/ddbj/local/lib64/R/library/h5vcData/extdata/NRAS.Control.bam"
  .. ..$ : chr [1:2] "+" "-"
  .. ..$ : NULL
 $ Deletions: num [1, 1:2, 1:2, 1:12426] 0 0 0 0 0 0 0 0 0 0 ...
  ..- attr(*, "dimnames")=List of 4
  .. ..$ : chr "/home/ddbj/local/lib64/R/library/h5vcData/extdata/NRAS.Control.bam"
  .. ..$ : NULL
  .. ..$ : chr [1:2] "+" "-"
  .. ..$ : NULL
 $ Reference: num [1:12426] 3 3 0 3 2 0 1 3 0 0 ...