R Graphical Manual

Browse All

Last data update: 2014.03.03

R: availableClipModes

availableClipModes

R Documentation

availableClipModes

Description

Clip modes are used to store "views" on the sequence object. They are often used to identify adapter sequences and low-quality "ends" which will be trimmed before further analysis. Storing clipping information instead of clipped sequences is useful for avoiding loss of data while maintaining information about the appropriate nucleotides for down-stream analysis. Each clip mode defines a set of left and right clip points, one set for each read. Clip points are typically included in the SFF file, are generated by the sequence provider and are loaded into the appropriate IRanges object when the SFF file is loaded via readSff. The vendor-generated clip points are not always desireable however, so accomidations for custom clip points are also provided.

Usage

  availableClipModes(object)

Arguments

object

An object of class SffReads or SffReadsQ.

Details

availableClipModes produces a list of clip modes supported by the rSFFreader package and the object being passed to it. These can include:

adapter:: defined in the SFF file, and meant to indicate positions of adapter sequence
quality:: defined in the SFF file, and meant to indicate positions of low-quality regions of the sequence
full:: uses the "interior" of quality and adapter and is the most conservative, equivalant to Roche clip points
raw:: no clipping is applied and full length reads are returned
custom:: clip points set by the user as an IRanges object. (see examples below)

Functions are provided for setting clip mode as well as extracting and setting clip points of each type from a SffReads or SffReadsQ object. The functions for getting/setting clip points all work in the same way and an example is provided in the examples section below. The functions include:

clipMode: gets/sets the adapter clip mode
adapterClip: get/set the adapter clip points as an IRanges object
customClip: get/set the custom clip points as an IRanges object
fullClip: get/set the full clip points as an IRanges object
qualityClip: get/set the quality clip points as an IRanges object
rawClip: get/set the raw clip points as an IRanges object

Currently available clipModes returned by availableClipModes is dependant on the which clipping slots (qualityIR,adapterIR, and customIR) are set (length != 0).

Author(s)

Matt Settles <msettles@uidaho.edu>

Examples

## Load in an example dataset:
sff <- loadIonSampleData()

## Get a list of available clip modes:
availableClipModes(sff)

## Check the current clipMode. It should default to "full":
clipMode(sff)

## full clipping is the most conservative, resulting in shorter reads
hist(width(sff))
summary(width(sff))

## These reads should also be free of adapters although the first base looks suspect:
alphabetByCycle(DNAStringSet(substr(sread(sff), 1,15)), alphabet=c("A","C","T","G"))

cols <- c("green","blue","black","red","darkgrey")
leg <-  c("A","C","T","G","N")
matplot(t(alphabetByCycle(DNAStringSet(substr(sread(sff), 1,15)),
    alphabet=c("A","C","T","G"))), type="l", lty=1, col=cols)
legend("topright", col=cols, legend=leg, pch=18, cex=.8)

## Compare this to unclipped reads using "raw" mode:
clipMode(sff) <- "raw"
hist(width(sff),breaks=500,col="grey",xlab="Read Length",main="Raw Read Length")

alphabetByCycle(DNAStringSet(substr(sread(sff), 1,15)), alphabet=c("A","C","T","G"))

cols <- c("green","blue","black","red","darkgrey")
leg <-  c("A","C","T","G","N")
matplot(t(alphabetByCycle(DNAStringSet(substr(sread(sff), 1,15)),
    alphabet=c("A","C","T","G"))), type="l", lty=1, col=cols)
legend("topright", col=cols, legend=leg, pch=18, cex=.8)

## Extract clip points for further analysis:
full.clippoints <- fullClip(sff)

raw.clippoints <- rawClip(sff)

table(start(full.clippoints))
table(start(raw.clippoints))

par(mfrow=c(1,2))
hist(end(full.clippoints))
hist(end(raw.clippoints))

par(mfrow=c(1,1))
## determine how much was trimmed from each read by clipping
barplot(table(end(raw.clippoints) - end(full.clippoints)))

## Custom clip points can also be set using an IRanges object:
customClip(sff) <- IRanges(start = 1, end = 4)
clipMode(sff) <- "custom"
table(counts=as.character(sread(sff)))

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(rSFFreader)
Loading required package: ShortRead
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: BiocParallel
Loading required package: Biostrings
Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: XVector
Loading required package: Rsamtools
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
Loading required package: GenomicAlignments
Loading required package: SummarizedExperiment
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/rSFFreader/availableClipModes.Rd_%03d_medium.png", width=480, height=480)
> ### Name: availableClipModes
> ### Title: availableClipModes
> ### Aliases: availableClipModes
> 
> ### ** Examples
> 
> ## Load in an example dataset:
> sff <- loadIonSampleData()
Total number of reads to be read: 1000
reading header for sff file:/home/ddbj/local/lib64/R/library/rSFFreader/extdata/SmallTorrentTest.sff
reading file:/home/ddbj/local/lib64/R/library/rSFFreader/extdata/SmallTorrentTest.sff
> 
> ## Get a list of available clip modes:
> availableClipModes(sff)
[1] "full"    "quality" "adapter" "raw"    
> 
> ## Check the current clipMode. It should default to "full":
> clipMode(sff)
[1] "full"
> 
> ## full clipping is the most conservative, resulting in shorter reads
> hist(width(sff))
> summary(width(sff))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
    5.0   140.0   266.0   215.3   297.0   353.0 
> 
> ## These reads should also be free of adapters although the first base looks suspect:
> alphabetByCycle(DNAStringSet(substr(sread(sff), 1,15)), alphabet=c("A","C","T","G"))
        cycle
alphabet [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
       A  269  287  256  217  250  261  257  253  223   248   220   222   238
       C  232  214  260  268  240  250  230  255  246   241   259   249   250
       T  197  225  271  240  261  231  225  238  260   244   245   251   254
       G  302  274  213  275  249  250  274  237  251   245   250   248   224
        cycle
alphabet [,14] [,15]
       A   240   254
       C   222   234
       T   255   240
       G   245   230
> 
> cols <- c("green","blue","black","red","darkgrey")
> leg <-  c("A","C","T","G","N")
> matplot(t(alphabetByCycle(DNAStringSet(substr(sread(sff), 1,15)),
+     alphabet=c("A","C","T","G"))), type="l", lty=1, col=cols)
> legend("topright", col=cols, legend=leg, pch=18, cex=.8)
> 
> ## Compare this to unclipped reads using "raw" mode:
> clipMode(sff) <- "raw"
> hist(width(sff),breaks=500,col="grey",xlab="Read Length",main="Raw Read Length")
> 
> alphabetByCycle(DNAStringSet(substr(sread(sff), 1,15)), alphabet=c("A","C","T","G"))
        cycle
alphabet [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
       A    0    0 1000    0  269  287  256  217  250   262   258   260   229
       C    0 1000    0    0  232  214  260  268  240   251   235   260   250
       T 1000    0    0    0  197  225  271  240  261   235   229   241   265
       G    0    0    0 1000  302  274  213  275  249   252   278   239   256
        cycle
alphabet [,14] [,15]
       A   252   231
       C   250   266
       T   249   247
       G   249   256
> 
> cols <- c("green","blue","black","red","darkgrey")
> leg <-  c("A","C","T","G","N")
> matplot(t(alphabetByCycle(DNAStringSet(substr(sread(sff), 1,15)),
+     alphabet=c("A","C","T","G"))), type="l", lty=1, col=cols)
> legend("topright", col=cols, legend=leg, pch=18, cex=.8)
> 
> ## Extract clip points for further analysis:
> full.clippoints <- fullClip(sff)
> 
> raw.clippoints <- rawClip(sff)
> 
> table(start(full.clippoints))

   5 
1000 
> table(start(raw.clippoints))

   1 
1000 
> 
> par(mfrow=c(1,2))
> hist(end(full.clippoints))
> hist(end(raw.clippoints))
> 
> par(mfrow=c(1,1))
> ## determine how much was trimmed from each read by clipping
> barplot(table(end(raw.clippoints) - end(full.clippoints)))
> 
> ## Custom clip points can also be set using an IRanges object:
> customClip(sff) <- IRanges(start = 1, end = 4)
> clipMode(sff) <- "custom"
> table(counts=as.character(sread(sff)))
counts
TCAG 
1000 
> 
> 
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>