Last data update: 2014.03.03

getAllSub    R Documentation

Identify all substitutions observed across genomic positions exhibiting a specified minimum coverage

Description

Extracts all substitutions observed at genomic positions that meet a user-defined minimum coverage and returns them as a count table. This function supports parallel computing.

Usage

getAllSub(sortedBam, minCov = 20, cores = 1)

Arguments

sortedBam

GRanges object containing aligned reads as returned by readSortedBam

minCov

An integer defining the minimum coverage required at a genomic position exhibiting a substitution. Genomic positions of coverage less than minCov are discarded. Default is 20 (see Details).

cores

An integer defining the number of cores to be used for parallel processing, if available. Default is 1.
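
The effect of minCov can be pictured with a small base R sketch. It does not call getAllSub: the data frame `positions` and its values are hypothetical, and the comparison assumes that positions with coverage exactly equal to minCov are kept, as the wording "less than minCov are discarded" suggests.

```r
# Hypothetical per-position coverage; not produced by wavClusteR.
positions <- data.frame(
  pos      = c(101L, 102L, 103L, 104L),
  coverage = c(5L, 20L, 19L, 42L)
)

minCov <- 20L

# Discard positions whose coverage is below minCov.
kept <- positions[positions$coverage >= minCov, ]
print(kept)
```

With these made-up values only the positions with coverage 20 and 42 remain.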

Details

The choice of the minimum coverage influences the variance of the relative substitution frequency estimates, which in turn affects the mixture model fit. A conservative value depending on the library size is recommended for a first analysis. Values smaller than 10 have not been tested and are therefore not recommended.
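
To illustrate why coverage matters, the sketch below assumes that the substitution count at a position is binomial with size equal to the coverage n and a true frequency p. This is an assumption made here for illustration, not a statement about the model used by the package. Under it, the standard error of count / coverage is sqrt(p * (1 - p) / n), so it shrinks as coverage grows.

```r
# Assumed true relative substitution frequency (hypothetical value).
p <- 0.3
coverage <- c(10, 20, 50, 100)

# Standard error of the estimate count / coverage under a binomial model.
se <- sqrt(p * (1 - p) / coverage)

se_table <- data.frame(coverage = coverage, se = round(se, 3))
print(se_table)
```

Doubling the coverage from 10 to 20 reduces the standard error by a factor of sqrt(2) under this assumption, which is one way to read the recommendation against very small minCov values.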

Value

A GRanges object containing a count table, where each range corresponds to a substitution. The metadata columns contain the following information:

substitutions

observed substitution, e.g. AT, i.e. A in the reference sequence and T in the mapped read.

coverage

strand-specific coverage.

count

number of strand-specific substitutions.
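
The coverage and count columns can be combined into a relative substitution frequency (count divided by coverage). The following is a minimal base R sketch that uses a plain data frame as a stand-in for the metadata columns. The values are copied from the first five rows of the example output shown under Results, and the column name `rsf` is introduced here for illustration only.

```r
# Stand-in for the metadata columns of the returned GRanges object.
counts <- data.frame(
  substitutions = c("TC", "TC", "TC", "TC", "TC"),
  coverage      = c(17, 17, 13, 10, 10),
  count         = c(2L, 12L, 1L, 1L, 6L)
)

# Relative substitution frequency: share of reads carrying the substitution.
counts$rsf <- counts$count / counts$coverage

# Number of positions per substitution type.
subs_table <- table(counts$substitutions)

print(counts)
print(subs_table)
```

On the actual GRanges result the same columns should be reachable with the usual GenomicRanges accessors, for example `countTable$count` and `countTable$coverage`.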

Author(s)

Federico Comoglio and Cem Sievers, with contributions from Martin Morgan

See Also

readSortedBam

Examples


filename <- system.file( "extdata", "example.bam", package = "wavClusteR" )
example <- readSortedBam(filename = filename)
countTable <- getAllSub( example, minCov = 10, cores = 1 )
countTable

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> library(wavClusteR)
Loading required package: GenomicRanges
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: Rsamtools
Loading required package: Biostrings
Loading required package: XVector
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/wavClusteR/getAllSub.Rd_%03d_medium.png", width=480, height=480)
> ### Name: getAllSub
> ### Title: Identify all substitutions observed across genomic positions
> ###   exhibiting a specified minimum coverage
> ### Aliases: getAllSub
> ### Keywords: core
> 
> ### ** Examples
> 
> 
> filename <- system.file( "extdata", "example.bam", package = "wavClusteR" )
> example <- readSortedBam(filename = filename)
> countTable <- getAllSub( example, minCov = 10, cores = 1 )
Loading required package: doMC
Loading required package: foreach
Loading required package: iterators
Considering substitutions, n = 497, processing in 1 chunks
   chunk #: 1
   considering the + strand
Computing local coverage at substitutions...
   considering the - strand
Computing local coverage at substitutions...
> countTable
GRanges object with 478 ranges and 3 metadata columns:
        seqnames               ranges strand | substitutions  coverage     count
           <Rle>            <IRanges>  <Rle> |   <character> <numeric> <integer>
    [1]     chrX [24001959, 24001959]      - |            TC        17         2
    [2]     chrX [24001973, 24001973]      - |            TC        17        12
    [3]     chrX [24001977, 24001977]      - |            TC        13         1
    [4]     chrX [24002046, 24002046]      - |            TC        10         1
    [5]     chrX [24002057, 24002057]      - |            TC        10         6
    ...      ...                  ...    ... .           ...       ...       ...
  [474]     chrX [24007076, 24007076]      - |            TC        17         8
  [475]     chrX [24007077, 24007077]      - |            TC        17         4
  [476]     chrX [24007078, 24007078]      - |            TC        17         1
  [477]     chrX [24023020, 24023020]      - |            AG        23         1
  [478]     chrX [24023028, 24023028]      - |            TC        23        23
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
> 
> dev.off()
null device 
          1 
>