Last data update: 2014.03.03

checkRestrictionEnzymeSequence    R Documentation

Remove invalid 4C-seq reads from a SAM file

Description

Basic4Cseq offers filter functions for invalid 4C-seq reads. This function removes 4C-seq reads from a provided Sequence Alignment/Map (SAM) file that show mismatches in the restriction enzyme sequence.

Usage

checkRestrictionEnzymeSequence(firstCutter, inputFileName, outputFileName = "output.sam", keepOnlyUniqueReads = TRUE, writeStatistics = TRUE)

Arguments

firstCutter

First restriction enzyme sequence of the 4C-seq experiment

inputFileName

Name of the input SAM file that contains aligned reads for the 4C-seq experiment

outputFileName

Name of the output SAM file that is created to store the filtered 4C-seq reads

keepOnlyUniqueReads

If TRUE, delete non-unique reads. Information in the SAM flag field is used to determine whether a read is unique or not.

writeStatistics

If TRUE, write statistics (e.g. the number of unique reads) to a text file
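The keepOnlyUniqueReads argument relies on the SAM flag field. As a rough illustration of how such a flag can be decoded in base R, the sketch below tests individual flag bits. The bit values (4 for unmapped, 16 for reverse strand, 256 for secondary alignment) come from the SAM specification. The helper isPrimaryMapped is hypothetical, and which bits Basic4Cseq actually inspects is an assumption here, not something this page states.

```r
# Standard SAM flag bits (values defined by the SAM specification)
FLAG_UNMAPPED  <- 4L    # read is unmapped
FLAG_REVERSE   <- 16L   # read maps to the reverse strand
FLAG_SECONDARY <- 256L  # alignment is a secondary (not primary) alignment

# Hypothetical helper, not part of Basic4Cseq:
# TRUE for alignments that are mapped and not flagged as secondary.
isPrimaryMapped <- function(flag) {
  bitwAnd(flag, FLAG_UNMAPPED) == 0L & bitwAnd(flag, FLAG_SECONDARY) == 0L
}

flags <- c(0L, 16L, 4L, 256L, 272L)
primary <- isPrimaryMapped(flags)
reverse <- bitwAnd(flags, FLAG_REVERSE) != 0L
```

Some aligners report multi-mapping reads through the mapping quality or optional tags rather than through the flag, so a flag-based test of this kind may not catch every non-unique read.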

Details

Valid 4C-seq reads start at a primary restriction site and continue with its downstream sequence, so any mismatch in the restriction enzyme sequence of a read indicates that the read may be invalid or misaligned. The mapping information of the restriction enzyme sequence bases of a read (if present) can be used for filtering purposes. checkRestrictionEnzymeSequence tests the first bases of a read (4 or 6 bp, depending on the length of the first restriction enzyme sequence) for mismatches. Reads with mismatches in the restriction enzyme sequence are deleted, and the filtered data are then written to a new SAM file. The function does not yet differentiate between blind and nonblind fragments, but it removes potential misalignments that may overlap with valid fragment ends and distort the true 4C-seq signal.
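The prefix test described above can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not the package's implementation: startsWithCutter is a hypothetical helper, the comparison is a plain case-insensitive string match on the read sequence, and strand handling is ignored (in a SAM file, the sequence of a reverse-strand read is stored reverse-complemented, so its restriction site would sit at the other end).

```r
# Hypothetical helper, not part of Basic4Cseq:
# TRUE if a read sequence begins with the restriction enzyme sequence.
# The number of bases compared follows the cutter length (e.g. 4 or 6 bp).
startsWithCutter <- function(readSeq, firstCutter) {
  cutter <- toupper(firstCutter)
  prefix <- toupper(substr(readSeq, 1, nchar(cutter)))
  prefix == cutter
}

# Toy read sequences; the second has a mismatch at position 6
reads <- c("AAGCTTGGCATTC", "AAGCTAGGCATTC", "aagcttACGT")
keep <- startsWithCutter(reads, "aagctt")

# Reads with a mismatch in the restriction enzyme sequence are dropped
filtered <- reads[keep]
```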

Value

A SAM file containing the filtered valid 4C-seq reads

Note

The function can only be used if the restriction enzyme sequence is still present in the reads, i.e. it has not been trimmed or otherwise removed.
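One way to check this precondition before filtering is to look at how many reads in the SAM file still begin with the restriction enzyme sequence. The sketch below does this in base R on a small made-up SAM snippet; the records are illustrative only, all reads are on the forward strand, and the variable names are assumptions. For a real file, the lines could be read with readLines() instead.

```r
# Made-up SAM content for illustration: one header line and two alignments
samLines <- c(
  "@HD\tVN:1.0\tSO:unsorted",
  "read1\t0\tchr1\t100\t37\t10M\t*\t0\t0\tAAGCTTGGCA\tIIIIIIIIII",
  "read2\t0\tchr1\t200\t37\t10M\t*\t0\t0\tGGCATTCAGT\tIIIIIIIIII"
)

# Drop header lines (they start with "@") and split records into fields
records <- samLines[!startsWith(samLines, "@")]
fields <- strsplit(records, "\t", fixed = TRUE)

# Field 10 of a SAM record is the read sequence (SEQ)
seqs <- vapply(fields, function(f) f[10], character(1))

# Fraction of reads that still begin with the restriction enzyme sequence
cutter <- "AAGCTT"
fractionWithSite <- mean(startsWith(toupper(seqs), cutter))
```

A fraction close to zero would suggest that the restriction enzyme sequence was trimmed before alignment, in which case the filter is not applicable.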

Author(s)

Carolin Walter

Examples

  if(interactive()) {
    file <- system.file("extdata", "fetalLiverCutter.sam", package="Basic4Cseq")
    checkRestrictionEnzymeSequence("aagctt", file)
  }

Results



> library(Basic4Cseq)
Loading required package: Biostrings
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: XVector
Loading required package: GenomicAlignments
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
Loading required package: SummarizedExperiment
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: Rsamtools
Loading required package: caTools

Attaching package: 'caTools'

The following object is masked from 'package:IRanges':

    runmean

The following object is masked from 'package:S4Vectors':

    runmean

> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/Basic4Cseq/checkRestrictionEnzymeSequence.Rd_%03d_medium.png", width=480, height=480)
> ### Name: checkRestrictionEnzymeSequence
> ### Title: Remove invalid 4C-seq reads from a SAM file
> ### Aliases: checkRestrictionEnzymeSequence
> ###   checkRestrictionEnzymeSequence,character,character-method
> ### Keywords: checkRestrictionEnzymeSequence
> 
> ### ** Examples
> 
> #  if(interactive()) {
>     file <- system.file("extdata", "fetalLiverCutter.sam", package="Basic4Cseq")
>     checkRestrictionEnzymeSequence("aagctt", file)
> #  }
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>