Last data update: 2014.03.03

R: Function to select fragments according to their size.
size.selectR Documentation

Function to select fragments according to their size.

Description

Given a vector of sequences representing DNA fragments digested by one or several restriction enzymes, the function will return fragments within a specified size range which will simulate the size selection step typical of ddRAD, RESTseq and ezRAD methods.

Usage

size.select(sequences, min.size, max.size, graph = TRUE, verbose = TRUE)

Arguments

sequences

a vector of DNA sequences representing DNA fragments after digestion, typically the output of the insilico.digest or adapt.select functions.

min.size

minimum fragment size.

max.size

maximum fragment size.

graph

if TRUE (the default) the function returns a histogram of distribution of fragment size (in grey) and selected fragments within the specified size range (in red). This may be useful to further adjust the selected size windows to increase or decrease the targeted number of loci. If FALSE, the histogram is not plotted.

verbose

if TRUE (the default) the function returns the number of loci selected. If FALSE, the function is silent.

Details

Size selection is usually performed after adaptator ligation in real life, but as adaptators are not simulated here (because they are specific to the sequencing platform and the protocol used) the user should remember to account for the adaptator length when comparing size selection in the lab and in silico. For instance, size selection of 210-260 in silico correspond to size selection of 300-350 in the lab for adaptators total length of 90bp.

Value

A vector of DNA fragment sequences.

Author(s)

Olivier Lepais

References

Lepais O & Weir JT. 2014. SimRAD: an R package for simulation-based prediction of the number of loci expected in RADseq and similar genotyping by sequencing approaches. Molecular Ecology Resources, 14, 1314-1321. DOI: 10.1111/1755-0998.12273.

Peterson et al. 2012. Double Digest RADseq: an inexpensive method for de novo SNP discovery and genotyping in model and non-model species. PLoS ONE 7: e37135. doi:10.1371/journal.pone.0037135

Stolle & Moritz 2013. RESTseq - Efficient benchtop population genomics with RESTriction fragment SEQuencing. PLoS ONE 8: e63960. doi:10.1371/journal.pone.0063960

Toonen et al. 2013. ezRAD: a simplified method for genomic genotyping in non-model organisms. PeerJ 1:e203 http://dx.doi.org/10.7717/peerj.203

See Also

adapt.select, exclude.seqsite.

Examples

### Example: a double digestion (ddRAD)
# simulating some sequence:
simseq <-  sim.DNAseq(size=1000000, GCfreq=0.433)

#Restriction Enzyme 1
#TaqI
cs_5p1 <- "T"
cs_3p1 <- "CGA"
#Restriction Enzyme 2
#MseI #
cs_5p2 <- "T"
cs_3p2 <- "TAA"

simseq.dig <- insilico.digest(simseq, cs_5p1, cs_3p1, cs_5p2, cs_3p2, verbose=TRUE)
simseq.sel <- adapt.select(simseq.dig, type="AB+BA", cs_5p1, cs_3p1, cs_5p2, cs_3p2)

# wide size selection (200-270):
wid.simseq <- size.select(simseq.sel,  min.size = 200, max.size = 270, graph=TRUE, verbose=TRUE)

# narrow size selection (210-260):
nar.simseq <- size.select(simseq.sel,  min.size = 210, max.size = 260, graph=TRUE, verbose=TRUE)

#the resulting fragment characteristics can be further examined:
boxplot(list(width(simseq.sel), width(wid.simseq), width(nar.simseq)), names=c("All fragments",
        "Wide size selection", "Narrow size selection"), ylab="Locus size (bp)")



Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(SimRAD)
Loading required package: Biostrings
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: XVector
Loading required package: ShortRead
Loading required package: BiocParallel
Loading required package: Rsamtools
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
Loading required package: GenomicAlignments
Loading required package: SummarizedExperiment
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: zlibbioc
> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/SimRAD/size.select.Rd_%03d_medium.png", width=480, height=480)
> ### Name: size.select
> ### Title: Function to select fragments according to their size.
> ### Aliases: size.select
> ### Keywords: double digestion restriction enzyme library construction
> ###   adaptator ligation fragment selection restriction exclusion RESTseq
> ###   RAD GBS ddRAD ezRAD
> 
> ### ** Examples
> 
> ### Example: a double digestion (ddRAD)
> # simulating some sequence:
> simseq <-  sim.DNAseq(size=1000000, GCfreq=0.433)
> 
> #Restriction Enzyme 1
> #TaqI
> cs_5p1 <- "T"
> cs_3p1 <- "CGA"
> #Restriction Enzyme 2
> #MseI #
> cs_5p2 <- "T"
> cs_3p2 <- "TAA"
> 
> simseq.dig <- insilico.digest(simseq, cs_5p1, cs_3p1, cs_5p2, cs_3p2, verbose=TRUE)
Number of restriction sites for the first enzyme: 3716
Number of restriction sites for the second enzyme: 6373
Number of type AB and BA fragments:10088
Number of type AA fragments:3716
Number of type BB fragments:6372
Warning messages:
1: In NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append) :
  subscript is an array, passing it thru as.vector() first
2: In NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append) :
  subscript is an array, passing it thru as.vector() first
3: In NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append) :
  subscript is an array, passing it thru as.vector() first
4: In NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append) :
  subscript is an array, passing it thru as.vector() first
> simseq.sel <- adapt.select(simseq.dig, type="AB+BA", cs_5p1, cs_3p1, cs_5p2, cs_3p2)
Warning messages:
1: In NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append) :
  subscript is an array, passing it thru as.vector() first
2: In NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append) :
  subscript is an array, passing it thru as.vector() first
> 
> # wide size selection (200-270):
> wid.simseq <- size.select(simseq.sel,  min.size = 200, max.size = 270, graph=TRUE, verbose=TRUE)
644 fragments between 200 and 270 bp 
> 
> # narrow size selection (210-260):
> nar.simseq <- size.select(simseq.sel,  min.size = 210, max.size = 260, graph=TRUE, verbose=TRUE)
451 fragments between 210 and 260 bp 
> 
> #the resulting fragment characteristics can be further examined:
> boxplot(list(width(simseq.sel), width(wid.simseq), width(nar.simseq)), names=c("All fragments",
+         "Wide size selection", "Narrow size selection"), ylab="Locus size (bp)")
> 
> 
> 
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>