Last data update: 2014.03.03

R: Function to digest DNA sequence using one or two restriction...
insilico.digestR Documentation

Function to digest DNA sequence using one or two restriction enzyme(s).

Description

Perform an in silico digestion of a DNA sequence using one or two restriction enzyme(s).

Usage

insilico.digest(DNAseq, cut_site_5prime1, cut_site_3prime1, 
                        cut_site_5prime2="NULL", cut_site_3prime2="NULL",  
                        cut_site_5prime3="NULL", cut_site_3prime3="NULL", 
                        cut_site_5prime4="NULL", cut_site_3prime4="NULL", 
                        verbose = TRUE)

Arguments

DNAseq

A DNA sequence (character string) as output for instance, but not exclusively, by sim.DNAseq or ref.DNAseq functions.

cut_site_5prime1

5 prime side (left side) of the recognized restriction site of the restriction enzyme 1.

cut_site_3prime1

3 prime side (right side) of the recognized restriction site of the restriction enzyme 1.

cut_site_5prime2

5 prime side (left side) of the recognized restriction site of the restriction enzyme 2 (optional).

cut_site_3prime2

3 prime side (right side) of the recognized restriction site of the restriction enzyme 2 (optional).

cut_site_5prime3

5 prime side (left side) of the recognized restriction site of the restriction enzyme 3 (optional).

cut_site_3prime3

3 prime side (right side) of the recognized restriction site of the restriction enzyme 3 (optional).

cut_site_5prime4

5 prime side (left side) of the recognized restriction site of the restriction enzyme 4 (optional).

cut_site_3prime4

3 prime side (right side) of the recognized restriction site of the restriction enzyme 4 (optional).

verbose

If TRUE (the default), returns the number of restriction sites (if only one restriction enzyme is used) or the number of restriction sites for the two enzymes and the number of fragments flanked by either restriction site of the enzyme 1 or 2 and flanked by restriction fragments of two different enzymes. FALSE makes the function silent to be used in a loop.

Details

Restriction site with alternative bases such as ApeKI used for GBS by Elshire et al. (2011) can be simulated by specifying the two alternative recognition motifs of the enzyme as parameter for enzyme 1 and enzyme 2 (see example below).

Value

A vector of DNA fragments resulting from the digestion.

Author(s)

Olivier Lepais and Jason Weir

References

Lepais O & Weir JT. 2014. SimRAD: an R package for simulation-based prediction of the number of loci expected in RADseq and similar genotyping by sequencing approaches. Molecular Ecology Resources, 14, 1314-1321. DOI: 10.1111/1755-0998.12273.

Baird et al. 2008. Rapid SNP discovery and genetic mapping using sequenced RAD markers. PLoS ONE 3: e3376. doi:10.1371/journal.pone.0003376

Elshire et al. 2011. A robust, simple Genotyping-By-Sequencing (GBS) approach for high diversity species. PLoS ONE 6: e19379. doi:10.1371/journal.pone.0019379

Peterson et al. 2012. Double Digest RADseq: an inexpensive method for de novo SNP discovery and genotyping in model and non-model species. PLoS ONE 7: e37135. doi:10.1371/journal.pone.0037135

Poland et al. 2012. Development of high-density genetic maps for barley and wheat using a novel two-enzyme Genotyping-By-Sequencing approach. PLoS ONE 7: e32253. doi:10.1371/journal.pone.0032253

Stolle & Moritz 2013. RESTseq - Efficient benchtop population genomics with RESTriction fragment SEQuencing. PLoS ONE 8: e63960. doi:10.1371/journal.pone.0063960

Toonen et al. 2013. ezRAD: a simplified method for genomic genotyping in non-model organisms. PeerJ 1:e203 http://dx.doi.org/10.7717/peerj.203

See Also

sim.DNAseq, ref.DNAseq.

Examples


### Example 1: a single digestion (RAD)
simseq <-  sim.DNAseq(size=1000000, GCfreq=0.51)
#Restriction Enzyme 1
#SbfI
cs_5p1 <- "CCTGCA"
cs_3p1 <- "GG"
simseq.dig <- insilico.digest(simseq, cs_5p1, cs_3p1, verbose=TRUE)


### Example 2: a single digestion (GBS)
simseq <-  sim.DNAseq(size=1000000, GCfreq=0.51)
#Restriction Enzyme 1
# ApeKI : G|CWGC  which is equivalent of either G|CAGC or  G|CTGC
cs_5p1 <- "G"
cs_3p1 <- "CAGC"
cs_5p2 <- "G"
cs_3p2 <- "CTGC"
simseq.dig <- insilico.digest(simseq, cs_5p1, cs_3p1, cs_5p1, cs_3p1, verbose=TRUE)


### Example 3: a double digestion (ddRAD)
# simulating some sequence:
simseq <-  sim.DNAseq(size=1000000, GCfreq=0.433)
#Restriction Enzyme 1
#PstI
cs_5p1 <- "CTGCA"
cs_3p1 <- "G"
#Restriction Enzyme 2
#MseI #
cs_5p2 <- "T"
cs_3p2 <- "TAA"
simseq.dig <- insilico.digest(simseq, cs_5p1, cs_3p1, cs_5p2, cs_3p2, verbose=TRUE)




Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(SimRAD)
Loading required package: Biostrings
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: XVector
Loading required package: ShortRead
Loading required package: BiocParallel
Loading required package: Rsamtools
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
Loading required package: GenomicAlignments
Loading required package: SummarizedExperiment
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: zlibbioc
> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/SimRAD/insilico.digest.Rd_%03d_medium.png", width=480, height=480)
> ### Name: insilico.digest
> ### Title: Function to digest DNA sequence using one or two restriction
> ###   enzyme(s).
> ### Aliases: insilico.digest
> ### Keywords: double digestion restriction enzyme library construction
> ###   adaptator ligation fragment selection restriction exclusion RESTseq
> ###   RAD GBS ddRAD ezRAD
> 
> ### ** Examples
> 
> 
> ### Example 1: a single digestion (RAD)
> simseq <-  sim.DNAseq(size=1000000, GCfreq=0.51)
> #Restriction Enzyme 1
> #SbfI
> cs_5p1 <- "CCTGCA"
> cs_3p1 <- "GG"
> simseq.dig <- insilico.digest(simseq, cs_5p1, cs_3p1, verbose=TRUE)
Number of restriction sites: 15
> 
> 
> ### Example 2: a single digestion (GBS)
> simseq <-  sim.DNAseq(size=1000000, GCfreq=0.51)
> #Restriction Enzyme 1
> # ApeKI : G|CWGC  which is equivalent of either G|CAGC or  G|CTGC
> cs_5p1 <- "G"
> cs_3p1 <- "CAGC"
> cs_5p2 <- "G"
> cs_3p2 <- "CTGC"
> simseq.dig <- insilico.digest(simseq, cs_5p1, cs_3p1, cs_5p1, cs_3p1, verbose=TRUE)
Number of restriction sites for the first enzyme: 1004
Number of restriction sites for the second enzyme: 1004
Number of type AB and BA fragments:2006
Number of type AA fragments:1003
Number of type BB fragments:1003
Warning messages:
1: In NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append) :
  subscript is an array, passing it thru as.vector() first
2: In NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append) :
  subscript is an array, passing it thru as.vector() first
3: In NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append) :
  subscript is an array, passing it thru as.vector() first
4: In NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append) :
  subscript is an array, passing it thru as.vector() first
> 
> 
> ### Example 3: a double digestion (ddRAD)
> # simulating some sequence:
> simseq <-  sim.DNAseq(size=1000000, GCfreq=0.433)
> #Restriction Enzyme 1
> #PstI
> cs_5p1 <- "CTGCA"
> cs_3p1 <- "G"
> #Restriction Enzyme 2
> #MseI #
> cs_5p2 <- "T"
> cs_3p2 <- "TAA"
> simseq.dig <- insilico.digest(simseq, cs_5p1, cs_3p1, cs_5p2, cs_3p2, verbose=TRUE)
Number of restriction sites for the first enzyme: 180
Number of restriction sites for the second enzyme: 6376
Number of type AB and BA fragments:350
Number of type AA fragments:5
Number of type BB fragments:6200
Warning messages:
1: In NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append) :
  subscript is an array, passing it thru as.vector() first
2: In NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append) :
  subscript is an array, passing it thru as.vector() first
3: In NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append) :
  subscript is an array, passing it thru as.vector() first
4: In NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append) :
  subscript is an array, passing it thru as.vector() first
> 
> 
> 
> 
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>