Last data update: 2014.03.03

R: Function to select fragments according to the flanking...
adapt.selectR Documentation

Function to select fragments according to the flanking restriction sites.

Description

Given a vector of sequences representing DNA fragments digested by two restriction enzymes (RE1 and RE2), the function will return fragments flanked by either the same restriction size (E1-E1 fragments typically for RESTseq method) or different restriction sites (RE1 and RE2, for ezRAD and ddRAD methods).

Usage

adapt.select(sequences, type = "AB+BA", cut_site_5prime1, cut_site_3prime1, 
             cut_site_5prime2, cut_site_3prime2)

Arguments

sequences

a vector of DNA sequences representing DNA fragments after digestion by two restriction enzymes, typically the output of the insilico.digest function.

type

type of fragments to return. Options are "AA": fragments flanked by two restriction enzymes 1 sites, "AB+BA": the default, fragments flanked by two different restriction enzyme sites, "BB": fragments flanked by two restriction enzymes 2 sites. For flexibility, "AB" and "BA" options are also available and will return directional digested fragments but such fragments are of no use in current GBS methods as yet.

cut_site_5prime1

5 prime side (left side) of the recognized restriction site of the restriction enzyme 1.

cut_site_3prime1

3 prime side (right side) of the recognized restriction site of the restriction enzyme 1.

cut_site_5prime2

5 prime side (left side) of the recognized restriction site of the restriction enzyme 2.

cut_site_3prime2

3 prime side (right side) of the recognized restriction site of the restriction enzyme 2.

Details

This function will be generally needed when simulating double digest ddRAD methods where fragments flanked by two different restriction site are selected during library construction (type="AB+BA").

In addition, when simulating RESTseq method, type = "AA" can be used to exclude fragments that contained the restriction site of enzyme 2, as an alternative or in complement to subsequent DNA fragment exclusion by additional restriction site (see function exclude.seqsite).

Value

A vector of DNA fragment sequences.

Author(s)

Olivier Lepais

References

Lepais O & Weir JT. 2014. SimRAD: an R package for simulation-based prediction of the number of loci expected in RADseq and similar genotyping by sequencing approaches. Molecular Ecology Resources, 14, 1314-1321. DOI: 10.1111/1755-0998.12273.

Peterson et al. 2012. Double Digest RADseq: an inexpensive method for de novo SNP discovery and genotyping in model and non-model species. PLoS ONE 7: e37135. doi:10.1371/journal.pone.0037135

Stolle & Moritz 2013. RESTseq - Efficient benchtop population genomics with RESTriction fragment SEQuencing. PLoS ONE 8: e63960. doi:10.1371/journal.pone.0063960

See Also

insilico.digest, exclude.seqsite, size.select.

Examples

# simulating some sequence:
simseq <-  sim.DNAseq(size=10000, GCfreq=0.433)

#Restriction Enzyme 1
#PstI
cs_5p1 <- "CTGCA"
cs_3p1 <- "G"

#Restriction Enzyme 2
#MseI
cs_5p2 <- "T"
cs_3p2 <- "TAA"

# double digestion:
simseq.dig <- insilico.digest(simseq, cs_5p1, cs_3p1, cs_5p2, cs_3p2, verbose=TRUE)
#selection of AB type fragments
simseq.selected <- adapt.select(simseq.dig, type="AB+BA", cs_5p1, cs_3p1, cs_5p2, cs_3p2)
# number of selected fragments:
length(simseq.selected)

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(SimRAD)
Loading required package: Biostrings
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: XVector
Loading required package: ShortRead
Loading required package: BiocParallel
Loading required package: Rsamtools
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
Loading required package: GenomicAlignments
Loading required package: SummarizedExperiment
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: zlibbioc
> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/SimRAD/adapt.select.Rd_%03d_medium.png", width=480, height=480)
> ### Name: adapt.select
> ### Title: Function to select fragments according to the flanking
> ###   restriction sites.
> ### Aliases: adapt.select
> ### Keywords: double digestion restriction enzyme library construction
> ###   adaptator ligation fragment selection ddRAD RESTseq
> 
> ### ** Examples
> 
> # simulating some sequence:
> simseq <-  sim.DNAseq(size=10000, GCfreq=0.433)
> 
> #Restriction Enzyme 1
> #PstI
> cs_5p1 <- "CTGCA"
> cs_3p1 <- "G"
> 
> #Restriction Enzyme 2
> #MseI
> cs_5p2 <- "T"
> cs_3p2 <- "TAA"
> 
> # double digestion:
> simseq.dig <- insilico.digest(simseq, cs_5p1, cs_3p1, cs_5p2, cs_3p2, verbose=TRUE)
Number of restriction sites for the first enzyme: 2
Number of restriction sites for the second enzyme: 72
Number of type AB and BA fragments:5
Number of type AA fragments:0
Number of type BB fragments:69
Warning messages:
1: In NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append) :
  subscript is an array, passing it thru as.vector() first
2: In NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append) :
  subscript is an array, passing it thru as.vector() first
3: In NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append) :
  subscript is an array, passing it thru as.vector() first
4: In NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append) :
  subscript is an array, passing it thru as.vector() first
> #selection of AB type fragments
> simseq.selected <- adapt.select(simseq.dig, type="AB+BA", cs_5p1, cs_3p1, cs_5p2, cs_3p2)
Warning messages:
1: In NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append) :
  subscript is an array, passing it thru as.vector() first
2: In NSBS(i, x, exact = exact, upperBoundIsStrict = !allow.append) :
  subscript is an array, passing it thru as.vector() first
> # number of selected fragments:
> length(simseq.selected)
[1] 5
> 
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>