Last data update: 2014.03.03

R: Make variable-width bins
variableWidthBinsR Documentation

Make variable-width bins

Description

Make variable-width bins based on a reference BAM file. This can be a simulated file (produced by simulateReads and aligned with your favourite aligner) or a real reference.

Usage

variableWidthBins(reads, binsizes, chromosomes = NULL)

Arguments

reads

A GRanges with reads. See bam2GRanges and bed2GRanges.

binsizes

A vector with binsizes. Resulting bins will be close to the specified binsizes.

chromosomes

A subset of chromosomes for which the bins are generated.

Details

Variable-width bins are produced by first binning the reference BAM file with fixed-width bins and selecting the desired number of reads per bin as the (non-zero) maximum of the histogram. A new set of bins is then generated such that every bin contains the desired number of reads.

Value

A list() of GRanges objects with variable-width bins.

Author(s)

Aaron Taudt

Examples

## Get an example BED file with single-cell-sequencing reads
bedfile <- system.file("extdata", "KK150311_VI_07.bam.bed.gz", package="AneuFinderData")
## Read the file into a GRanges object
reads <- bed2GRanges(bedfile, assembly='mm10', chromosomes=c(1:19,'X','Y'),
                    min.mapq=10, remove.duplicate.reads=TRUE)
## Make variable-width bins of size 500kb and 1Mb
bins <- variableWidthBins(reads, binsizes=c(5e5,1e6))
## Plot the distribution of binsizes
hist(width(bins[['1e+06']]), breaks=50)

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(AneuFinder)
Loading required package: GenomicRanges
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: cowplot
Loading required package: ggplot2

Attaching package: 'cowplot'

The following object is masked from 'package:ggplot2':

    ggsave

Loading required package: AneuFinderData
Loading AneuFinder
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/AneuFinder/variableWidthBins.Rd_%03d_medium.png", width=480, height=480)
> ### Name: variableWidthBins
> ### Title: Make variable-width bins
> ### Aliases: variableWidthBins
> 
> ### ** Examples
> 
> ## Get an example BED file with single-cell-sequencing reads
> bedfile <- system.file("extdata", "KK150311_VI_07.bam.bed.gz", package="AneuFinderData")
> ## Read the file into a GRanges object
> reads <- bed2GRanges(bedfile, assembly='mm10', chromosomes=c(1:19,'X','Y'),
+                     min.mapq=10, remove.duplicate.reads=TRUE)
Reading file KK150311_VI_07.bam.bed.gz ... 3.04s
Fetching chromosome lengths from UCSC ... 0.97s
Filtering reads ... 0.42s
> ## Make variable-width bins of size 500kb and 1Mb
> bins <- variableWidthBins(reads, binsizes=c(5e5,1e6))
Binning reads in fixed-width windows ... 3.32s
Making variable-width windows for bin size 5e+05 ... 1.15s
Making variable-width windows for bin size 1e+06 ... 0.96s
Warning messages:
1: In variableWidthBins(reads, binsizes = c(5e+05, 1e+06)) :
  The following chromosomes were skipped because they are smaller than binsize 5e+05: Y
2: In variableWidthBins(reads, binsizes = c(5e+05, 1e+06)) :
  The following chromosomes were skipped because they are smaller than binsize 1e+06: Y
> ## Plot the distribution of binsizes
> hist(width(bins[['1e+06']]), breaks=50)
> 
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>