Last data update: 2014.03.03

createVirtualFragmentLibrary    R Documentation

Create a virtual fragment library from a provided genome and two restriction enzymes

Description

Basic4Cseq can create virtual fragment libraries from any BSgenome package or DNAString object. Two restriction enzymes have to be specified to cut the DNA; the read length is needed to check fragment ends of the corresponding length for uniqueness. Filter options (minimum and maximum size) are provided at the fragment level and at the fragment end level.

Usage

createVirtualFragmentLibrary(chosenGenome, firstCutter, secondCutter, readLength, onlyNonBlind = TRUE, useOnlyIndex = FALSE, minSize = 0, maxSize = -1, minFragEndSize = 0, maxFragEndSize = 10000000, useAllData = TRUE, chromosomeName = "chr1", libraryName = "default")

Arguments

chosenGenome

The genome that is to be digested in silico with the provided enzymes; can be an instance of BSgenome or DNAString

firstCutter

First of two restriction enzymes

secondCutter

Second of two restriction enzymes

readLength

Read length for the experiment

onlyNonBlind

Variable that is TRUE (default) if only non-blind fragments are considered (i.e. all blind fragments are removed)

useOnlyIndex

Convenience option to adapt the annotation style of the chromosomes ("chr1", ..., "chrY" or "1", ..., "Y"); the parameter has to be set to match the BAM file in question

minSize

Filter option to remove fragments below a certain size (in bp)

maxSize

Filter option to remove fragments above a certain size (in bp)

minFragEndSize

Filter option to remove fragment ends below a certain size (in bp)

maxFragEndSize

Filter option to remove fragment ends above a certain size (in bp)

useAllData

Variable that indicates if all data of a BSgenome package is to be used. If FALSE, chromosome names including a "_" are removed, reducing the set of chromosomes to (1 ... 19, X, Y, MT) for the mouse genome or (1 ... 22, X, Y, MT) for the human genome

chromosomeName

Chromosome name for the virtual fragment library if a DNAString object is used instead of a BSgenome object.

libraryName

Name of the file the created virtual fragment library is written to. By default, the file is called "fragments_firstCutter_secondCutter.csv". The fragment data is returned as a data frame if and only if an empty character string is chosen as libraryName.
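The effect of the useAllData and useOnlyIndex arguments on chromosome names can be illustrated with plain character vectors. The following is a minimal base R sketch based only on the argument descriptions above; the sequence names are hypothetical, and the package's actual implementation may differ (the Details section mentions chromosome lengths, for instance).

```r
# Hypothetical sequence names in the style of a BSgenome package.
seq_names <- c("chr1", "chr2", "chrX", "chrY",
               "chr1_GL456210_random", "chrUn_GL456239")

# Analogue of useAllData = FALSE as described above:
# drop every sequence whose name contains an underscore.
main_chromosomes <- seq_names[!grepl("_", seq_names, fixed = TRUE)]

# Analogue of the index-only annotation style ("1", ..., "Y"):
# strip the leading "chr" prefix.
index_only <- sub("^chr", "", main_chromosomes)

print(main_chromosomes)
print(index_only)
```

Whichever style is chosen, the chromosome names in the library have to match those used in the BAM file.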

Details

  • readLength is needed when creating the virtual fragment library to differentiate between unique and non-unique fragment ends. While two fragments can be unique, their respective ends may be repetitive if only the first few bases are considered. For 4C-seq data, reads can only map to the start (or end, respectively) of a 4C-seq fragment; the remaining part of the fragment is not covered. The length of a fragment end that has to be checked for uniqueness therefore depends on the read length of the experiment.

  • useAllData uses the lengths of the chromosomes to identify relevant ones, based on the current BSgenome packages for mm10 or hg19, and may therefore produce undesirable results for smaller genomes with different chromosome lengths (e.g. all chromosomes are discarded).

  • The length of a fragment influences the expected read count of a 4C-seq fragment. By default, Basic4Cseq uses the experiment's read length as the minimum fragment end size and places virtually no limit on the maximum fragment end size.
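The point about fragment ends being repetitive even when whole fragments are unique can be shown with a small base R sketch. The fragment sequences and variable names are hypothetical, and only the start of each fragment is checked here; this illustrates the idea and is not the check Basic4Cseq performs internally.

```r
read_length <- 5

# Hypothetical fragment sequences: all three differ as a whole,
# but the first two start with the same five bases.
fragment_seqs <- c(f1 = "catgaacccgtac",
                   f2 = "catgaatttgtac",
                   f3 = "catgccaaagtac")

# Only the first read_length bases of a fragment end can be covered by a read.
end_seqs <- substr(fragment_seqs, 1, read_length)

# An end counts as unique only if no other fragment end shares its sequence.
is_unique_end <- !(duplicated(end_seqs) | duplicated(end_seqs, fromLast = TRUE))
names(is_unique_end) <- names(fragment_seqs)

print(is_unique_end)
```

With a longer read length the ends of f1 and f2 would become distinguishable, which is why the read length has to be known when the library is built.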

Value

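A tab-separated file with the specified virtual fragment library (containing fragment position, length, presence of the second restriction enzyme, and uniqueness of the fragment ends). If an empty character string is chosen as libraryName, the fragment data is returned as a data frame instead.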
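To make the listed fields more concrete, the following base R sketch digests a toy sequence with the two cutter motifs used in the Examples section and derives fragment position, length, and presence of the second cutter. All names are hypothetical. It assumes that a fragment spans from one first-cutter site to the next (including both recognition sequences) and that a fragment without a second-cutter site is "blind"; the actual coordinates produced by Basic4Cseq may be defined differently.

```r
# Toy sequence assembled from labelled pieces so the site positions are easy to follow.
genome <- paste0("aa", "catg", "tttt", "gtac", "tt", "catg",
                 "cccccc", "catg", "aa", "gtac", "gg")
first_cutter <- "catg"
second_cutter <- "gtac"

# Start positions of every occurrence of a motif (fixed string match).
find_sites <- function(motif, sequence) {
  hits <- gregexpr(motif, sequence, fixed = TRUE)[[1]]
  if (hits[1] == -1) integer(0) else as.integer(hits)
}

first_sites <- find_sites(first_cutter, genome)
second_sites <- find_sites(second_cutter, genome)

# Assumption: a fragment runs from one first-cutter site to the end of the next one.
fragments <- data.frame(
  start = head(first_sites, -1),
  end = tail(first_sites, -1) + nchar(first_cutter) - 1
)
fragments$length <- fragments$end - fragments$start + 1

# Assumption: a fragment is non-blind if a second-cutter site lies fully inside it.
fragments$hasSecondCutter <- vapply(seq_len(nrow(fragments)), function(i) {
  any(second_sites >= fragments$start[i] &
        second_sites + nchar(second_cutter) - 1 <= fragments$end[i])
}, logical(1))

print(fragments)
```

In this toy case the first fragment contains a second-cutter site and the second does not, so a filter analogous to onlyNonBlind = TRUE would keep only the first.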

Note

  • It is strongly recommended to preprocess and store the virtual fragment library if a number of experiments with the same restriction enzyme combination, read length and underlying genome have to be analyzed.

  • Processing one of the larger BSgenome packages takes some time and storage space.

  • If an empty character string is given as the library name for the virtual fragment library, the fragment data is returned as a data frame. If the library name "default" is chosen, the tab-separated file is named "fragments_firstCutter_secondCutter" (with the actual cutter sequences substituted).
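Storing a library once and reloading it for later analyses might look like the following base R sketch. The data frame is a hypothetical stand-in with made-up column names, and whether the file written by Basic4Cseq contains a header line is an assumption; only the tab-separated format is taken from this page.

```r
# Hypothetical stand-in for a stored library; real column names may differ.
toy_library <- data.frame(
  chromosomeName = c("chr1", "chr1"),
  fragmentStart = c(3L, 17L),
  fragmentLength = c(18L, 14L)
)

# Write the table tab-separated, as described for the library file.
library_file <- file.path(tempdir(), "fragments_toy.csv")
write.table(toy_library, library_file, sep = "\t",
            quote = FALSE, row.names = FALSE)

# Reload it for a later experiment with the same enzymes, read length and genome.
reloaded <- read.table(library_file, header = TRUE, sep = "\t",
                       stringsAsFactors = FALSE)
print(reloaded)
```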

Author(s)

Carolin Walter

Examples

  if(interactive()) {
    library(BSgenome.Ecoli.NCBI.20080805)
    fragmentData = createVirtualFragmentLibrary(chosenGenome = Ecoli$NC_002655,
        firstCutter = "catg", secondCutter = "gtac", readLength = 30,
        onlyNonBlind = TRUE, chromosomeName = "NC_002655",
        libraryName = "fragments_Ecoli.csv")
  }

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> library(Basic4Cseq)
Loading required package: Biostrings
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: XVector
Loading required package: GenomicAlignments
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
Loading required package: SummarizedExperiment
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: Rsamtools
Loading required package: caTools

Attaching package: 'caTools'

The following object is masked from 'package:IRanges':

    runmean

The following object is masked from 'package:S4Vectors':

    runmean

> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/Basic4Cseq/createVirtualFragmentLibrary.Rd_%03d_medium.png", width=480, height=480)
> ### Name: createVirtualFragmentLibrary
> ### Title: Create a virtual fragment library from a provided genome and two
> ###   restriction enzymes
> ### Aliases: createVirtualFragmentLibrary
> ###   createVirtualFragmentLibrary,BSgenome,character,character,numeric-method
> ###   createVirtualFragmentLibrary,DNAString,character,character,numeric-method
> ### Keywords: createVirtualFragmentLibrary
> 
> ### ** Examples
> 
> #  if(interactive()) {
>     library(BSgenome.Ecoli.NCBI.20080805)
Loading required package: BSgenome
Loading required package: rtracklayer
>     fragmentData = createVirtualFragmentLibrary(chosenGenome = Ecoli$NC_002655, firstCutter = "catg", secondCutter = "gtac", readLength = 30,  onlyNonBlind = TRUE, chromosomeName = "NC_002655", libraryName = "fragments_Ecoli.csv")
> #  }
> dev.off()
null device 
          1 
>