Last data update: 2014.03.03

R: Process reads from High-Troughtput Sequencing experiments
processReadsR Documentation

Process reads from High-Troughtput Sequencing experiments

Description

This method allows the processment of NGS nucleosome reads from different sources and a basic manipulation of them. The tasks includes the correction of strand-specific single-end reads and the trimming of reads to a given length.

Usage

## S4 method for signature 'AlignedRead'
processReads(data, type = "single", fragmentLen, trim, ...)
## S4 method for signature 'RangedData'
processReads(data, type = "single", fragmentLen, trim, ...)

Arguments

data

Sequence reads objects, probably imported using other packages as ShortRead. Allowed object types are AlignedRead and RangedData with a strand attribute.

type

Describes the type of reads. Values allowed are single for single-ended reads and paired for paired-ended.

fragmentLen

Expected original length of the sequenced fragments. See details.

trim

Length to trim the reads (or extend them if trim > read length)

...

Other parameters passed to fragmentLenDetect if no fixed fragmentLen is given.

Details

This function reads a AlignedRead or a RangedData object containing the position, length and strand of the sequence reads.

It allows the processment of both paired and single ended reads. In the case of single end reads this function corrects the strand-specific mapping by shifting plus strand reads and minus strand reads towards a middle position where both strands are overlaped. This is done by accounting the expected fragment length (fragmentLen).

For paired end reads, mononucleosomal reads could extend more than expected length due to mapping issues or experimental conditions. In this case, the fragmentLen variable sets the threshold from which reads longer than it should be ignored.

If no value is supplied for fragmentLen it will be calculated automatically (increasing the computing time) using fragmentLenDetect with default parameters. Performance can be increased by tunning fragmentLenDetect parameteres in a separated call and passing its result as fragmentLen parameter.

In some cases, could be useful trim the reads to a shorter length to improve the detection of nucleosome dyads, easing its detection and automatic positioning. The parameter trim allows the selection of how many nucleotides select from each read.

A special case for single-ended data is setting the trim to the same value as fragmentLen, so the reads will be extended strand-wise towards the 3' direction, creating an artificial map comparable with paired-ended data. The same but opposite can be performed with paired-end data, setting a trim value equal to the read length from paired ended, so paired-ended data will look like single-ended..

Value

RangedData containing the aligned/trimmed individual reads

Note

IMPORTANT: this information is only used to correct possible strand-specific mapping, this package doesn't link the two ends of paired reads.

Author(s)

Oscar Flores oflores@mmb.pcb.ub.es

See Also

AlignedRead, RangedData, fragmentLenDetect

Examples

	
	#Load data	
	data(nucleosome_htseq)
	
	#Process nucleosome reads, select only those shorter than 200bp
	pr1 = processReads(nucleosome_htseq, fragmentLen=200)
	
	#Now process them, but picking only the 40 bases surrounding the dyad
	pr2 = processReads(nucleosome_htseq, fragmentLen=200, trim=40)

	#Compare the results:
	par(mfrow=c(2,1), mar=c(3,4,1,1))
	plot(as.vector(coverage(pr1)[["chr1"]]), type="l", ylab="coverage (original)")
	plot(as.vector(coverage(pr2)[["chr1"]]), type="l", ylab="coverage (trimmed)")


Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(nucleR)
Loading required package: ShortRead
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: BiocParallel
Loading required package: Biostrings
Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: XVector
Loading required package: Rsamtools
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
Loading required package: GenomicAlignments
Loading required package: SummarizedExperiment
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/nucleR/processReads.Rd_%03d_medium.png", width=480, height=480)
> ### Name: processReads
> ### Title: Process reads from High-Troughtput Sequencing experiments
> ### Aliases: processReads processReads,AlignedRead-method
> ###   processReads,RangedData-method
> ### Keywords: manip
> 
> ### ** Examples
> 
> 	
> 	#Load data	
> 	data(nucleosome_htseq)
> 	
> 	#Process nucleosome reads, select only those shorter than 200bp
> 	pr1 = processReads(nucleosome_htseq, fragmentLen=200)
> 	
> 	#Now process them, but picking only the 40 bases surrounding the dyad
> 	pr2 = processReads(nucleosome_htseq, fragmentLen=200, trim=40)
> 
> 	#Compare the results:
> 	par(mfrow=c(2,1), mar=c(3,4,1,1))
> 	plot(as.vector(coverage(pr1)[["chr1"]]), type="l", ylab="coverage (original)")
> 	plot(as.vector(coverage(pr2)[["chr1"]]), type="l", ylab="coverage (trimmed)")
> 
> 
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>