R Graphical Manual


Last data update: 2014.03.03


getDistrs

R Documentation

Compute fragment start and fragment length distributions

Description

Compute the fragment start distribution using reads aligned to genes with only one annotated variant, and estimate the fragment length distribution using fragments aligned to long exons (>1000nt). Fragment length is defined as the distance between the start of the left-end read and the end of the right-end read.
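As a rough illustration of the fragment length definition above, the sketch below computes lengths from the `pos` and `mpos` fields of paired reads and tabulates them, similar in spirit to the `lenDis` slot. It assumes 1-based leftmost positions, the same read length `readLength` on both ends, and no splicing between the mates (as for fragments lying inside a single long exon). The helper `fragLengths` is hypothetical and is not part of casper; casper's own computation may use a different coordinate convention.

```r
# Hypothetical helper (not part of casper): fragment lengths for paired reads.
# Assumes 1-based leftmost positions, equal read length on both ends, and no
# splicing between the mates (e.g. both reads inside one long exon).
fragLengths <- function(pos, mpos, readLength) {
  left  <- pmin(pos, mpos)                   # start of the left-end read
  right <- pmax(pos, mpos) + readLength - 1  # last base of the right-end read
  right - left + 1                           # bases spanned by the fragment
}

pos  <- c(1000, 2000, 3050)
mpos <- c(1200, 2100, 3000)
len  <- fragLengths(pos, mpos, readLength = 75)
len          # [1] 275 175 125
table(len)   # number of fragments of each length, analogous to lenDis
```

Whether the length counts both end bases (the `+ 1` above) is a convention choice made for this sketch.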

Usage

getDistrs(DB, bam, pbam, islandid=NULL, verbose=FALSE, nreads=4*10^6,
readLength, min.gt.freq = NULL, tgroups=5, mc.cores=1)

Arguments

`DB`	Annotated genome. Object of class `knownGenome` as returned by `procGenome`.
`bam`	Aligned reads, as returned by `scanBam`. It must be a list with elements 'qname', 'rname', 'pos' and 'mpos'. Ignored when argument `pbam` is specified.
`pbam`	Processed BAM object of class `procBam`, as returned by function `procBam`. Arguments `bam` and `readLength` are ignored when `pbam` is specified.
`islandid`	Island IDs of the islands to be used in the read start distribution calculations (defaults to genes with only one annotated variant).
`verbose`	Set to `TRUE` to print progress information.
`nreads`	To speed up computations, only the first `nreads` reads are used to obtain the estimates. The default value of 4 million usually gives highly precise estimates.
`readLength`	Read length in bp, e.g. in a paired-end experiment where 75bp are sequenced on each end one would set `readLength=75`.
`min.gt.freq`	The target distributions cannot be estimated precisely for gene types that are very infrequent. Gene types with relative frequency below `min.gt.freq` are merged, e.g. `min.gt.freq=0.05` means that gene types each making up less than 5% of the genes in DB will be combined, and a single read start and length distribution will be estimated for all of them.
`tgroups`	As an alternative to `min.gt.freq`, you may specify the maximum number of distinct gene types to consider. A separate estimate will be obtained for each of the `tgroups` most frequent gene types, and all others will be combined.
`mc.cores`	Number of cores to use for parallel processing.
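The two merging rules described for `min.gt.freq` and `tgroups` can be sketched in base R as follows. This is an illustrative reading of the argument descriptions, not casper's code: how casper assigns a gene type to each gene is not described on this page, so the labels "A" to "E" are placeholders, and the helper names `mergeByFreq`, `mergeByRank` and the merged label `"other"` are hypothetical.

```r
# Hypothetical sketch (not casper's implementation) of the gene type merging
# rules described for min.gt.freq and tgroups.
# 'geneType' is a character vector with one gene type label per gene.

# Merge every gene type whose relative frequency is below min.gt.freq.
mergeByFreq <- function(geneType, min.gt.freq) {
  freq <- table(geneType) / length(geneType)  # relative frequency per type
  rare <- names(freq)[freq < min.gt.freq]
  ifelse(geneType %in% rare, "other", geneType)
}

# Keep the tgroups most frequent gene types and merge all the others.
mergeByRank <- function(geneType, tgroups) {
  counts <- sort(table(geneType), decreasing = TRUE)
  keep <- names(counts)[seq_len(min(tgroups, length(counts)))]
  ifelse(geneType %in% keep, geneType, "other")
}

gt <- c(rep("A", 50), rep("B", 30), rep("C", 15), rep("D", 3), rep("E", 2))
table(mergeByFreq(gt, min.gt.freq = 0.05))  # D and E (5% combined) merged
table(mergeByRank(gt, tgroups = 2))         # only A and B kept separate
```

With `min.gt.freq = 0.05`, types D (3%) and E (2%) fall below the threshold and share one group; with `tgroups = 2`, C, D and E are all merged.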

Value

An object of class `readDistrs` with slots:

`lenDis`	Table with the number of fragments of each length.
`stDis`	Cumulative distribution function (an object of type closure) for the relative start position.
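To illustrate what a cumulative distribution function stored as a closure looks like, the sketch below builds one with base R's `ecdf()` from start positions expressed relative to transcript length. The helper `relStart`, its arguments and the toy coordinates are hypothetical, and casper's actual `stDis` estimator may be computed differently (for example, smoothed rather than a step function).

```r
# Hypothetical sketch: a CDF closure for relative fragment start positions.
# Assumes each start is expressed as a fraction of transcript length in [0, 1];
# casper's own stDis estimator may differ.
relStart <- function(start, txStart, txLength) (start - txStart) / txLength

starts <- relStart(start    = c(110, 150, 190, 120, 300),
                   txStart  = c(100, 100, 100, 100, 200),
                   txLength = c(100, 100, 100, 100, 200))
stDisSketch <- ecdf(starts)  # empirical CDF, returned as an R function
is.function(stDisSketch)     # TRUE
stDisSketch(0.5)             # 0.8: share of fragments starting in the first half
```

Calling the closure with any relative position in [0, 1] returns the estimated proportion of fragments starting at or before that position.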

Author(s)

Camille Stephan-Otto Attolini, David Rossell

Examples

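data(K562.r1l1)
data(hg19DB)
#Discard fragments with insert size below 100bp
bam0 <- rmShortInserts(K562.r1l1, isizeMin=100)

#Estimate fragment start and length distributions (75bp reads)
distrs <- getDistrs(hg19DB,bam=bam0,readLength=75)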

#Fragment length distribution
plot(distrs,'fragLength')

#Fragment start distribution (relative to transcript length)
plot(distrs,'readSt')

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)


> library(casper)
Loading required package: Biobase
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: IRanges
Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: GenomicRanges
Loading required package: GenomeInfoDb
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/casper/getDistrs.Rd_%03d_medium.png", width=480, height=480)
> ### Name: getDistrs
> ### Title: Compute fragment start and fragment length distributions
> ### Aliases: getDistrs
> ### Keywords: stats
> 
> ### ** Examples
> 
> data(K562.r1l1)
> data(hg19DB)
> bam0 <- rmShortInserts(K562.r1l1, isizeMin=100)
> 
> distrs <- getDistrs(hg19DB,bam=bam0,readLength=75)
> 
> #Fragment length distribution
> plot(distrs,'fragLength')
> 
> #Fragment start distribution (relative to transcript length)
> plot(distrs,'readSt')
> 
> dev.off()
null device 
          1 
>