segmentSeq-package
R Documentation
Segmentation of the genome based on multiple samples of high-throughput
sequencing data.
Description
The segmentSeq package takes multiple samples of high-throughput
sequencing data (together with replicate information) and identifies
regions of the genome that have a (reproducibly) high density of tags
aligning to them. The package was developed to identify small RNA
precursors from small RNA sequencing data, but may also be useful in
some mRNA-Seq and ChIP-Seq applications.
Details
Package: segmentSeq
Type: Package
Version: 0.0.2
Date: 2010-01-20
License: GPL-3
LazyLoad: yes
Depends: baySeq, ShortRead
To use the package, we first construct an alignmentData object from a
set of alignment files. We use either the readGeneric function to read
text files or the readBAM function to read BAM files.
We then use the processAD function to identify all potential subsegments
of the data and the number of tags that align to each subsegment. Next,
we use either a heuristic or an empirical Bayesian approach to segment
the genome into ‘loci’ and ‘null’ regions. Finally, we can acquire
posterior likelihoods for each set of replicates. These indicate whether
a region is likely to be a locus or a null in that replicate group.
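The two steps above can be pictured with a small base-R sketch. It is a hypothetical simplification and not segmentSeq's implementation. The names `candidateSegments()` and `callLoci()` are illustrative only. Two further points are assumptions:

- The sketch assumes that a candidate may not cross a tag-free stretch longer than `gap`, which is a guess at what processAD's `gap` argument controls.
- The fixed `minCount` threshold is a stand-in for the package's heuristic and empirical Bayesian methods.

```r
# Hypothetical illustration of candidate subsegments; NOT segmentSeq's code.
# Assumption: a candidate runs from some tag start to some tag end and may
# not cross a tag-free stretch longer than `gap`.
candidateSegments <- function(tagStart, tagEnd, gap) {
  ord <- order(tagStart, tagEnd)
  s <- tagStart[ord]
  e <- tagEnd[ord]
  n <- length(s)
  # Rightmost coordinate covered by any tag seen so far.
  covered <- cummax(e)
  # Start a new block wherever the tag-free stretch before a tag exceeds `gap`.
  block <- cumsum(c(TRUE, s[-1] - covered[-n] > gap))
  segs <- lapply(unique(block), function(b) {
    bs <- s[block == b]
    be <- e[block == b]
    # Every (start, end) pair within a block, with start <= end, is a candidate.
    pairs <- expand.grid(start = sort(unique(bs)), end = sort(unique(be)))
    pairs <- pairs[pairs$start <= pairs$end, , drop = FALSE]
    # Count the tags lying wholly inside each candidate; drop empty ones.
    pairs$count <- vapply(seq_len(nrow(pairs)), function(k)
      sum(bs >= pairs$start[k] & be <= pairs$end[k]), integer(1))
    pairs[pairs$count > 0, , drop = FALSE]
  })
  res <- do.call(rbind, segs)
  rownames(res) <- NULL
  res
}

# Hypothetical rule, NOT the package's heuristic or empirical Bayes method:
# a segment is a 'locus' in a replicate group when every library in that
# group has at least `minCount` tags in it, and 'null' otherwise.
callLoci <- function(counts, replicates, minCount = 5) {
  groups <- unique(replicates)
  calls <- vapply(groups, function(g)
    apply(counts[, replicates == g, drop = FALSE] >= minCount, 1, all),
    logical(nrow(counts)))
  calls <- matrix(calls, nrow = nrow(counts),
                  dimnames = list(NULL, paste0("group", groups)))
  ifelse(calls, "locus", "null")
}

# Three tags on one chromosome; the third lies more than `gap` bases away.
segs <- candidateSegments(tagStart = c(1, 10, 500), tagEnd = c(20, 30, 520),
                          gap = 100)
print(segs)  # candidates [1,20], [1,30], [10,30] and [500,520]

# Made-up counts for three segments in four libraries (two replicate groups).
counts <- matrix(c(10, 12, 0, 1,
                    0,  2, 8, 9,
                    6,  7, 6, 7), nrow = 3, byrow = TRUE)
calls <- callLoci(counts, replicates = c(1, 1, 2, 2))
print(calls)
```

Because the first two tags overlap, they fall in one block and are combined into several nested candidates. The distant third tag forms a block of its own, so no candidate bridges the gap.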
The resulting segmentation is designed to be used by the baySeq package,
so that differential expression analyses can be carried out on the
discovered loci.
The package can optionally use the 'snow' package to parallelise
computationally intensive functions. This is highly recommended for
large data sets.
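The cluster object is created and released outside segmentSeq and then handed to functions through their `cl` argument. The sketch below makes a few assumptions:

- It uses the 'parallel' package that ships with R, which incorporates snow's cluster functions, rather than 'snow' itself.
- It assumes such a cluster is accepted by `cl`.
- The processAD call is commented out because it needs segmentSeq and the `alignData` object from the Examples.
- The per-chromosome sum is a hypothetical stand-in workload.

```r
# Sketch only: assumes a 'parallel' cluster is acceptable wherever a
# snow cluster is expected by segmentSeq's `cl` argument.
library(parallel)

cl <- makeCluster(2)  # two worker processes; adjust to the available cores

# With segmentSeq loaded and 'alignData' built as in the Examples:
# sD <- processAD(alignData, gap = 100, cl = cl)

# Hypothetical stand-in workload: per-chromosome tag totals on the workers.
tagsByChr <- list(Chr1 = c(3, 5, 2), Chr2 = c(7, 1))
totals <- parLapply(cl, tagsByChr, sum)

stopCluster(cl)  # release the worker processes when finished
print(unlist(totals))
```

Passing `cl = NULL`, as in the Examples, runs everything in the current R session.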
See the vignette for more details.
Author(s)
Thomas J. Hardcastle
Maintainer: Thomas J. Hardcastle <tjh48@cam.ac.uk>
References
Hardcastle, T.J., Kelly, K.A. and Baulcombe, D.C. (2011). Identifying
small RNA loci from high-throughput sequencing data. In press.
See Also
baySeq
Examples
library(segmentSeq)
# Define the chromosome lengths for the genome of interest.
chrlens <- c(2e6, 1e6)
# Define the files containing sample information.
datadir <- system.file("extdata", package = "segmentSeq")
libfiles <- c("SL9.txt", "SL10.txt", "SL26.txt", "SL32.txt")
# Establish the library names and replicate structure.
libnames <- c("SL9", "SL10", "SL26", "SL32")
replicates <- c(1, 1, 2, 2)
# Read the files to produce an 'alignmentData' object.
alignData <- readGeneric(file = libfiles, dir = datadir,
                         replicates = replicates, libnames = libnames,
                         chrs = c(">Chr1", ">Chr2"), chrlens = chrlens)
# Process the alignmentData object to produce a 'segData' object.
sD <- processAD(alignData, gap = 100, cl = NULL)
Results
> library(segmentSeq)
> # Define the chromosome lengths for the genome of interest.
>
> chrlens <- c(2e6, 1e6)
>
> # Define the files containing sample information.
>
> datadir <- system.file("extdata", package = "segmentSeq")
> libfiles <- c("SL9.txt", "SL10.txt", "SL26.txt", "SL32.txt")
>
> # Establish the library names and replicate structure.
>
> libnames <- c("SL9", "SL10", "SL26", "SL32")
> replicates <- c(1,1,2,2)
>
> # Process the files to produce an 'alignmentData' object.
>
> alignData <- readGeneric(file = libfiles, dir = datadir, replicates =
+ replicates, libnames = libnames, chrs = c(">Chr1", ">Chr2"), chrlens =
+ chrlens)
Reading files........done!
Analysing tags...........done!
>
> # Process the alignmentData object to produce a 'segData' object.
>
> sD <- processAD(alignData, gap = 100, cl = NULL)
Chromosome: >Chr1
Finding start-stop co-ordinates...done!
1292 candidate loci found.
Chromosome: >Chr2
Finding start-stop co-ordinates...done!
13099 candidate loci found.