Clip modes are used to store "views" on the sequence object. They are often used to identify adapter sequences and low-quality "ends" which will
be trimmed before further analysis. Storing clipping information instead of clipped sequences is useful for avoiding loss of data while maintaining
information about the appropriate nucleotides for down-stream analysis. Each clip mode defines a set of left and right clip
points, one set for each read. Clip points are typically included in the SFF file, are generated by the sequence provider and are
loaded into the appropriate IRanges object when the SFF file is loaded via readSff. The vendor-generated clip points are not always desireable however, so accomidations for custom clip points are also provided.
Usage
availableClipModes(object)
Arguments
object
An object of class SffReads or SffReadsQ.
Details
availableClipModes produces a list of clip modes supported by the rSFFreader package and the object being passed to it. These can include:
adapter:
defined in the SFF file, and meant to indicate positions of adapter sequence
quality:
defined in the SFF file, and meant to indicate positions of low-quality regions of the sequence
full:
uses the "interior" of quality and adapter and is the most conservative, equivalant to Roche clip points
raw:
no clipping is applied and full length reads are returned
custom:
clip points set by the user as an IRanges object. (see examples below)
Functions are provided for setting clip mode as well as extracting and setting clip points of each type from a
SffReads or SffReadsQ object. The functions for getting/setting clip points
all work in the same way and an example is provided in the examples section below.
The functions include:
clipMode
gets/sets the adapter clip mode
adapterClip
get/set the adapter clip points as an IRanges object
customClip
get/set the custom clip points as an IRanges object
fullClip
get/set the full clip points as an IRanges object
qualityClip
get/set the quality clip points as an IRanges object
rawClip
get/set the raw clip points as an IRanges object
Currently available clipModes returned by availableClipModes is dependant on the which clipping slots
(qualityIR,adapterIR, and customIR) are set (length != 0).
Author(s)
Matt Settles <msettles@uidaho.edu>
Examples
## Load in an example dataset:
sff <- loadIonSampleData()
## Get a list of available clip modes:
availableClipModes(sff)
## Check the current clipMode. It should default to "full":
clipMode(sff)
## full clipping is the most conservative, resulting in shorter reads
hist(width(sff))
summary(width(sff))
## These reads should also be free of adapters although the first base looks suspect:
alphabetByCycle(DNAStringSet(substr(sread(sff), 1,15)), alphabet=c("A","C","T","G"))
cols <- c("green","blue","black","red","darkgrey")
leg <- c("A","C","T","G","N")
matplot(t(alphabetByCycle(DNAStringSet(substr(sread(sff), 1,15)),
alphabet=c("A","C","T","G"))), type="l", lty=1, col=cols)
legend("topright", col=cols, legend=leg, pch=18, cex=.8)
## Compare this to unclipped reads using "raw" mode:
clipMode(sff) <- "raw"
hist(width(sff),breaks=500,col="grey",xlab="Read Length",main="Raw Read Length")
alphabetByCycle(DNAStringSet(substr(sread(sff), 1,15)), alphabet=c("A","C","T","G"))
cols <- c("green","blue","black","red","darkgrey")
leg <- c("A","C","T","G","N")
matplot(t(alphabetByCycle(DNAStringSet(substr(sread(sff), 1,15)),
alphabet=c("A","C","T","G"))), type="l", lty=1, col=cols)
legend("topright", col=cols, legend=leg, pch=18, cex=.8)
## Extract clip points for further analysis:
full.clippoints <- fullClip(sff)
raw.clippoints <- rawClip(sff)
table(start(full.clippoints))
table(start(raw.clippoints))
par(mfrow=c(1,2))
hist(end(full.clippoints))
hist(end(raw.clippoints))
par(mfrow=c(1,1))
## determine how much was trimmed from each read by clipping
barplot(table(end(raw.clippoints) - end(full.clippoints)))
## Custom clip points can also be set using an IRanges object:
customClip(sff) <- IRanges(start = 1, end = 4)
clipMode(sff) <- "custom"
table(counts=as.character(sread(sff)))
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(rSFFreader)
Loading required package: ShortRead
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
rbind, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit
Loading required package: BiocParallel
Loading required package: Biostrings
Loading required package: S4Vectors
Loading required package: stats4
Attaching package: 'S4Vectors'
The following objects are masked from 'package:base':
colMeans, colSums, expand.grid, rowMeans, rowSums
Loading required package: IRanges
Loading required package: XVector
Loading required package: Rsamtools
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
Loading required package: GenomicAlignments
Loading required package: SummarizedExperiment
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/rSFFreader/availableClipModes.Rd_%03d_medium.png", width=480, height=480)
> ### Name: availableClipModes
> ### Title: availableClipModes
> ### Aliases: availableClipModes
>
> ### ** Examples
>
> ## Load in an example dataset:
> sff <- loadIonSampleData()
Total number of reads to be read: 1000
reading header for sff file:/home/ddbj/local/lib64/R/library/rSFFreader/extdata/SmallTorrentTest.sff
reading file:/home/ddbj/local/lib64/R/library/rSFFreader/extdata/SmallTorrentTest.sff
>
> ## Get a list of available clip modes:
> availableClipModes(sff)
[1] "full" "quality" "adapter" "raw"
>
> ## Check the current clipMode. It should default to "full":
> clipMode(sff)
[1] "full"
>
> ## full clipping is the most conservative, resulting in shorter reads
> hist(width(sff))
> summary(width(sff))
Min. 1st Qu. Median Mean 3rd Qu. Max.
5.0 140.0 266.0 215.3 297.0 353.0
>
> ## These reads should also be free of adapters although the first base looks suspect:
> alphabetByCycle(DNAStringSet(substr(sread(sff), 1,15)), alphabet=c("A","C","T","G"))
cycle
alphabet [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
A 269 287 256 217 250 261 257 253 223 248 220 222 238
C 232 214 260 268 240 250 230 255 246 241 259 249 250
T 197 225 271 240 261 231 225 238 260 244 245 251 254
G 302 274 213 275 249 250 274 237 251 245 250 248 224
cycle
alphabet [,14] [,15]
A 240 254
C 222 234
T 255 240
G 245 230
>
> cols <- c("green","blue","black","red","darkgrey")
> leg <- c("A","C","T","G","N")
> matplot(t(alphabetByCycle(DNAStringSet(substr(sread(sff), 1,15)),
+ alphabet=c("A","C","T","G"))), type="l", lty=1, col=cols)
> legend("topright", col=cols, legend=leg, pch=18, cex=.8)
>
> ## Compare this to unclipped reads using "raw" mode:
> clipMode(sff) <- "raw"
> hist(width(sff),breaks=500,col="grey",xlab="Read Length",main="Raw Read Length")
>
> alphabetByCycle(DNAStringSet(substr(sread(sff), 1,15)), alphabet=c("A","C","T","G"))
cycle
alphabet [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12] [,13]
A 0 0 1000 0 269 287 256 217 250 262 258 260 229
C 0 1000 0 0 232 214 260 268 240 251 235 260 250
T 1000 0 0 0 197 225 271 240 261 235 229 241 265
G 0 0 0 1000 302 274 213 275 249 252 278 239 256
cycle
alphabet [,14] [,15]
A 252 231
C 250 266
T 249 247
G 249 256
>
> cols <- c("green","blue","black","red","darkgrey")
> leg <- c("A","C","T","G","N")
> matplot(t(alphabetByCycle(DNAStringSet(substr(sread(sff), 1,15)),
+ alphabet=c("A","C","T","G"))), type="l", lty=1, col=cols)
> legend("topright", col=cols, legend=leg, pch=18, cex=.8)
>
> ## Extract clip points for further analysis:
> full.clippoints <- fullClip(sff)
>
> raw.clippoints <- rawClip(sff)
>
> table(start(full.clippoints))
5
1000
> table(start(raw.clippoints))
1
1000
>
> par(mfrow=c(1,2))
> hist(end(full.clippoints))
> hist(end(raw.clippoints))
>
> par(mfrow=c(1,1))
> ## determine how much was trimmed from each read by clipping
> barplot(table(end(raw.clippoints) - end(full.clippoints)))
>
> ## Custom clip points can also be set using an IRanges object:
> customClip(sff) <- IRanges(start = 1, end = 4)
> clipMode(sff) <- "custom"
> table(counts=as.character(sread(sff)))
counts
TCAG
1000
>
>
>
>
>
>
>
> dev.off()
null device
1
>