estimate.mean.fraglen
R Documentation
Estimate summaries of the distribution of fragment lengths in a
short-read experiment. The methods are designed for ChIP-Seq
experiments and may not work well in data without peaks.
Description
estimate.mean.fraglen implements three methods for estimating
mean fragment length. The other functions are helper functions
used by these methods, but may also be useful on their own for
diagnostic purposes. Many of these operations are potentially
slow.
sparse.density is intended to be similar to
density, but returns the results in a run-length encoded
form. This is useful when long stretches of the range of the data
have zero density.
Arguments
x
For estimate.mean.fraglen, typically an AlignedRead or a GRanges
object. Also supported but deprecated, because they carry no
formal strand information: a RangedData object (with a "strand"
column), or a list-like object with elements "+" and "-"
giving the locations of reads aligned to the positive and negative
strands (the values should be integers denoting the location where
the first sequenced base matched). Supported (but again deprecated)
list types include RangesList, IntegerList, and an ordinary R
list.
For basesCovered and densityCorr, a list with elements
"+" and "-" giving the locations of reads aligned to the
positive and negative strands (the values should be integers denoting
the location where the first sequenced base matched). densityCorr
also accepts GRanges input directly.
For sparse.density, a numeric or integer vector for which the
density is to be computed.
method
Character string giving the method to be used.
method = "SISSR" implements the method described in Jothi et
al. (see References below). method = "correlation" implements
the method described in Kharchenko et al. (see References below),
where the idea is to compute the density of tag start positions
separately for each strand, and then determine the shift
that maximizes the correlation between these two densities.
method = "coverage" finds the shift that minimizes the
number of bases covered by any read.
shift
Integer vector giving the shifts to be tried when
optimizing. The current algorithm simply evaluates all supplied
values and reports the one giving minimum coverage or maximum
correlation.
seqLen
For the "coverage" method, the assumed length of
each read for computing the coverage. Typically the read
length. This is added to the shift estimated by "coverage"
and "correlation" to obtain the actual fragment
length.
verbose
Logical specifying whether progress information should
be printed during execution.
center
For the "correlation" method, whether the
calculations should incorporate centering by the mean density. The
default is not to do so; as the density is zero over most of the
genome, this slightly improves efficiency at negligible loss in
accuracy.
width
Half-bandwidth used in the computation. This must
be specified as an integer; data-driven rules are not supported.
kernel
A character string giving the density kernel.
from, to
Range over which the density is to be
computed.
maxDist
If the distance to the nearest neighbor is more than this, the
position is discarded. This removes isolated points, which are not
very informative.
...
Extra arguments, passed on as appropriate to other
functions.
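As an illustration of how method = "coverage", shift and seqLen
interact, the following base-R sketch evaluates every supplied shift on
toy data and keeps the one that covers the fewest bases. It is not the
package's implementation: bases_covered() and the simulated reads are
hypothetical, and the read geometry (a minus-strand position marks the
first sequenced base of a read that extends towards lower coordinates)
is an assumption made for the example.

```r
# Toy data (assumed): three 150 bp fragments at each of 20 sites,
# sequenced from both ends with 36 bp reads.
frag_len <- 150L
seq_len  <- 36L
frag_start <- rep(seq(1000L, by = 500L, length.out = 20L), each = 3L) +
  c(0L, 5L, 10L)
reads <- list("+" = frag_start,                    # first base of plus reads
              "-" = frag_start + frag_len - 1L)    # first base of minus reads

# Hypothetical helper: number of distinct bases covered by any read after
# moving the minus-strand reads towards the plus-strand reads by each shift.
bases_covered <- function(x, shift, seqLen) {
  plus_start  <- x[["+"]]
  minus_start <- x[["-"]] - seqLen + 1L  # leftmost base of each minus read
  vapply(shift, function(s) {
    starts  <- c(plus_start, minus_start - s)
    covered <- unlist(lapply(starts,
                             function(st) seq.int(st, length.out = seqLen)))
    length(unique(covered))
  }, integer(1))
}

shift        <- seq(0L, 200L, by = 2L)
n_covered    <- bases_covered(reads, shift, seq_len)
best_shift   <- shift[which.min(n_covered)]  # shift with minimum coverage
est_frag_len <- best_shift + seq_len         # seqLen is added to the shift
```

In this toy setting the reads from both strands coincide exactly at the
best shift, so the minimum is sharp; with real data the objective is
noisier and the grid of shifts limits the resolution of the estimate.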
Details
For the correlation method, the range over which densities are
computed covers only the range of the reads; that is, the beginning
and end of chromosomes are excluded.
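The idea behind the correlation method can be sketched in base R as
follows. This is a simplified illustration, not the package's code:
smooth_counts() and density_corr() are hypothetical helpers, the
rectangular kernel is an arbitrary choice, and the toy reads are
constructed so that minus-strand positions lie a known 120 bases
downstream of the plus-strand ones. As described above, densities are
computed only over the range of the reads (plus padding for the shifts).

```r
# Toy data (assumed): tags at 15 sites, minus strand offset by 120 bases.
true_offset <- 120L
site_pos <- rep(seq(1000L, by = 700L, length.out = 15L), each = 3L) +
  c(0L, 4L, 9L)
reads <- list("+" = site_pos, "-" = site_pos + true_offset)

# Hypothetical helper: per-base tag counts on [from, to], smoothed with a
# rectangular kernel of half-bandwidth `width`.
smooth_counts <- function(pos, from, to, width) {
  counts <- tabulate(pos - from + 1L, nbins = to - from + 1L)
  sm <- stats::filter(counts, rep(1, 2L * width + 1L), sides = 2)
  sm[is.na(sm)] <- 0   # edges lie in the zero-density padding
  as.numeric(sm)
}

# Hypothetical helper: correlation between the plus-strand density and the
# minus-strand density moved back by each candidate shift.
density_corr <- function(x, shift, width = 50L, center = FALSE) {
  rng  <- range(unlist(x))
  pad  <- max(shift) + 2L * width
  from <- rng[1] - pad
  to   <- rng[2] + pad
  d_plus <- smooth_counts(x[["+"]], from, to, width)
  vapply(shift, function(s) {
    d_minus <- smooth_counts(x[["-"]] - s, from, to, width)
    if (center) {
      cor(d_plus, d_minus)   # centered by the mean density
    } else {
      sum(d_plus * d_minus) / sqrt(sum(d_plus^2) * sum(d_minus^2))
    }
  }, numeric(1))
}

shift      <- seq(0L, 200L, by = 5L)
corr       <- density_corr(reads, shift)
best_shift <- shift[which.max(corr)]   # shift with maximum correlation
```

Setting center = TRUE in the sketch switches to the ordinary (centered)
correlation; on this toy data both variants select the same shift.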
Value
estimate.mean.fraglen gives an estimate of the mean fragment
length.
basesCovered and densityCorr give a vector of the
corresponding objective function evaluated at the supplied values of
shift.
sparse.density returns an object of class "Rle".
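To see why a run-length encoded result is useful when the density is
zero over long stretches, the sketch below uses base R's rle() as a
stand-in for the "Rle" class; the density values are made up for the
example and sparse.density itself is not called.

```r
# Toy density (assumed values): two small bumps separated by long zero runs.
dens <- c(rep(0, 1000), c(0.1, 0.3, 0.1), rep(0, 5000),
          c(0.2, 0.2), rep(0, 2000))

enc <- rle(dens)                 # run-length encoding: values and lengths
n_dense <- length(dens)          # number of stored values in dense form
n_runs  <- length(enc$lengths)   # number of runs in encoded form

# The encoding is lossless: expanding it recovers the original vector.
roundtrip_ok <- identical(inverse.rle(enc), dens)
```

The encoded form stores only one value and one length per run, so its
size depends on the number of runs rather than on the length of the
range over which the density is evaluated.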
Author(s)
Deepayan Sarkar, Michael Lawrence
References
R. Jothi, S. Cuddapah, A. Barski, K. Cui, and K. Zhao. Genome-wide
identification of in vivo protein-DNA binding sites from ChIP-Seq
data. Nucleic Acids Research, 36:5221–5231, 2008.
P. V. Kharchenko, M. Y. Tolstorukov, and P. J. Park. Design and
analysis of ChIP experiments for DNA-binding proteins. Nature
Biotechnology, 26:1351–1359, 2008.
Examples
> library(chipseq)
> data(cstest)
> estimate.mean.fraglen(cstest[["ctcf"]], method = "coverage")
chr10 chr11 chr12
  150   140   160