Last data update: 2014.03.03

estimate.mean.fraglen

R Documentation

Estimate summaries of the distribution of fragment lengths in a short-read experiment. The methods are designed for ChIP-Seq experiments and may not work well in data without peaks.

Description

estimate.mean.fraglen implements three methods for estimating mean fragment length. The other functions are related helper functions implementing various methods, but may be useful by themselves for diagnostic purposes. Many of these operations are potentially slow.

sparse.density is intended to be similar to density, but returns the results in a run-length encoded form. This is useful when long stretches of the range of the data have zero density.
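To illustrate why a run-length encoding helps when most of the range has zero density, here is a minimal sketch in base R. It uses base `rle()` as a stand-in for the "Rle" class that sparse.density returns, and a crude rectangular kernel rather than the package's kernels; the read positions and the grid are made-up values for illustration only.

```r
# Toy illustration with base R only; sparse.density itself returns an "Rle"
# object, which this sketch does not construct.
x <- c(105, 110, 112, 5020, 5025)   # hypothetical read start positions
grid <- 1:6000                      # hypothetical range to evaluate over

# Crude rectangular-kernel "density": number of reads within 50 bases
# of each grid position.
dens <- vapply(grid, function(g) sum(abs(x - g) <= 50), integer(1))

# Run-length encode: long stretches of zeros collapse into single runs.
enc <- rle(dens)
n_dense <- length(dens)         # values stored in the dense form
n_runs  <- length(enc$lengths)  # runs stored in the encoded form

# The encoding is lossless: decoding recovers the dense vector.
stopifnot(identical(inverse.rle(enc), dens))
```

Comparing `n_runs` with `n_dense` shows how much smaller the encoded form is when the data are concentrated in a few regions.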

Usage

estimate.mean.fraglen(x, method = c("SISSR", "coverage", "correlation"),
                      ...)

basesCovered(x, shift = seq(5, 300, 5), seqLen = 100, verbose = FALSE)

densityCorr(x, shift = seq(0, 500, 5), center = FALSE,
            width = seqLen *2L, seqLen=100L, maxDist = 500L, ...)

sparse.density(x, width = 50, kernel = "epanechnikov",
               from = start(rix)[1] - 10L,
               to = end(rix)[length(rix)] + 10L)

Arguments

`x`	For `estimate.mean.fraglen`, typically an AlignedRead or a GRanges object. Also supported, but deprecated because they carry no formal strand information, are a RangedData object (with a "strand" column) and a list-like object with elements `"+"` and `"-"` giving the locations of reads aligned to the positive and negative strands (the values should be integers denoting the location where the first sequenced base matched). Supported (but, again, deprecated) list types include RangesList, IntegerList and an ordinary R list. For `basesCovered` and `densityCorr`, a list with elements `"+"` and `"-"` giving the locations of reads aligned to the positive and negative strands (again as integers denoting the location where the first sequenced base matched). `densityCorr` also accepts `GRanges` input directly. For `sparse.density`, a numeric or integer vector for which the density is to be computed.
`method`	Character string giving the method to be used. `method = "SISSR"` implements the method described in Jothi et al. (see References below). `method = "correlation"` implements the method described in Kharchenko et al. (see References below), where the idea is to compute the density of tag start positions separately for each strand, and then determine the shift that maximizes the correlation between these two densities. `method = "coverage"` finds the shift that minimizes the number of bases covered by any read.
`shift`	Integer vector giving the shifts to be tried when optimizing. The current algorithm simply evaluates all supplied values and reports the one giving minimum coverage or maximum correlation.
`seqLen`	For the `"coverage"` method, the assumed length of each read when computing the coverage; typically the read length. It is added to the shift estimated by the `"coverage"` and `"correlation"` methods to obtain the actual fragment length.
`verbose`	Logical specifying whether progress information should be printed during execution.
`center`	For the `"correlation"` method, whether the calculations should incorporate centering by the mean density. The default is not to do so; as the density is zero over most of the genome, this slightly improves efficiency at negligible loss in accuracy.
`width`	Half-bandwidth used in the computation. This must be specified as an integer; data-driven rules are not supported.
`kernel`	A character string giving the density kernel.
`from, to`	The range over which the density is to be computed.
`maxDist`	If distance to nearest neighbor is more than this, the position is discarded. This removes isolated points, which are not very informative.
`...`	Extra arguments, passed on as appropriate to other functions.
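
The `"coverage"` idea described under `method`, `shift` and `seqLen` can be sketched in base R as follows. This is a toy illustration under stated assumptions, not the package's implementation: the function `bases_covered_toy` and the simulated reads are hypothetical, and the strand convention (a "+" read starting at p covers p to p + seqLen - 1; a "-" read whose first sequenced base is at q covers q - seqLen + 1 to q) is chosen for this sketch only. Only the "+" reads are shifted, so that the fragment length is the best shift plus the read length.

```r
# Toy sketch of the "coverage" idea in base R; not the chipseq implementation.
# For each candidate shift, move the "+" reads downstream by that amount and
# count how many distinct bases are covered by any read on either strand.
bases_covered_toy <- function(pos, neg, shift, seqLen) {
  neg_bases <- unlist(lapply(neg, function(q) {
    seq(q - seqLen + 1L, length.out = seqLen)
  }))
  vapply(shift, function(s) {
    pos_bases <- unlist(lapply(pos + s, function(p) {
      seq(p, length.out = seqLen)
    }))
    length(unique(c(pos_bases, neg_bases)))
  }, integer(1))
}

# Simulated data (assumption): one fragment of fixed length per site,
# sequenced once from each end, with sites far apart.
frag_len <- 150L
read_len <- 36L
pos <- seq(1000L, by = 1000L, length.out = 20L)  # "+" read starts
neg <- pos + frag_len - 1L                        # first base of "-" reads

# Evaluate every supplied shift and keep the one with minimum coverage.
shift <- seq(0L, 300L, by = 2L)
covered <- bases_covered_toy(pos, neg, shift, read_len)
best_shift <- shift[which.min(covered)]
est_frag_len <- best_shift + read_len
```

In this noiseless setup the shifted "+" reads land exactly on the "-" reads at the best shift, so `est_frag_len` recovers `frag_len`. Real data are noisier, and the coverage curve over `shift` is worth inspecting rather than trusting the minimum blindly.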

Details

For the correlation method, densities are computed only over the range covered by reads; that is, the beginning and end of chromosomes are excluded.
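
The strand cross-correlation idea behind the correlation method can be sketched in base R. This is a simplified illustration, not the package's code: it smooths per-base tag-start counts with a moving average instead of a kernel density estimate, uses `cor()` (which centers the data, unlike the default `center = FALSE`), and reports only the best shift. The helper `smooth_counts` and the simulated positions are hypothetical.

```r
# Toy sketch of the strand cross-correlation idea in base R; not the chipseq
# implementation, which works with kernel density estimates.
smooth_counts <- function(starts, n, w = 21L) {
  counts <- tabulate(starts, nbins = n)   # tag starts per base, 1..n
  sm <- as.numeric(stats::filter(counts, rep(1 / w, w), sides = 2))
  sm[is.na(sm)] <- 0                      # undefined edges of the moving average
  sm
}

# Simulated data (assumption): "-" tag starts sit a fixed distance
# downstream of the "+" tag starts.
n <- 12000L
gaps <- rep(c(137L, 291L, 83L, 412L, 59L), 8)
pos <- 500L + cumsum(gaps)    # hypothetical "+" tag starts
true_shift <- 120L
neg <- pos + true_shift       # hypothetical "-" tag starts

dpos <- smooth_counts(pos, n)
dneg <- smooth_counts(neg, n)

# Evaluate every supplied shift: correlate the "+" profile with the "-"
# profile moved back by s bases, and keep the shift with maximum correlation.
shift <- seq(0L, 300L, by = 5L)
corr <- vapply(shift, function(s) {
  cor(dpos[seq_len(n - s)], dneg[(s + 1L):n])
}, numeric(1))
best_shift <- shift[which.max(corr)]
```

Plotting `corr` against `shift` gives the kind of diagnostic curve that `densityCorr` is described as returning: the objective function evaluated at each supplied shift.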

Value

estimate.mean.fraglen gives an estimate of the mean fragment length.

basesCovered and densityCorr give a vector of the corresponding objective function evaluated at the supplied values of shift.

sparse.density returns an object of class "Rle".

Author(s)

Deepayan Sarkar, Michael Lawrence

References

R. Jothi, S. Cuddapah, A. Barski, K. Cui, and K. Zhao. Genome-wide identification of in vivo protein-DNA binding sites from ChIP-Seq data. Nucleic Acids Research, 36:5221–31, 2008.

P. V. Kharchenko, M. Y. Tolstorukov, and P. J. Park. Design and analysis of ChIP experiments for DNA-binding proteins. Nature Biotechnology, 26:1351–1359, 2008.

Examples

data(cstest)
estimate.mean.fraglen(cstest[["ctcf"]], method = "coverage")

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> library(chipseq)
> ### Name: estimate.mean.fraglen
> ### Title: Estimate summaries of the distribution of fragment lengths in a
> ###   short-read experiment.  The methods are designed for ChIP-Seq
> ###   experiments and may not work well in data without peaks.
> ### Aliases: estimate.mean.fraglen estimate.mean.fraglen,AlignedRead-method
> ###   estimate.mean.fraglen,GRanges-method basesCovered densityCorr
> ###   densityCorr,list densityCorr,GenomicRanges sparse.density
> ### Keywords: univar
> 
> ### ** Examples
> 
> data(cstest)
> estimate.mean.fraglen(cstest[["ctcf"]], method = "coverage")
chr10 chr11 chr12 
  150   140   160 