Last data update: 2014.03.03

R: Export a BSgenome object as a FASTA or twoBit file
export-methodsR Documentation

Export a BSgenome object as a FASTA or twoBit file

Description

export methods for BSgenome objects.

NOTE: The export generic function and most of its methods are defined and documented in the rtracklayer package. This man page only documents the 2 export methods define in the BSgenome package.

Usage

## S4 method for signature 'BSgenome,FastaFile,ANY'
export(object, con, format, compress=FALSE, compression_level=NA, verbose=TRUE)
## S4 method for signature 'BSgenome,TwoBitFile,ANY'
export(object, con, format, ...)

Arguments

object

The BSgenome object to export.

con

A FastaFile or TwoBitFile object.

Alternatively con can be a single string containing the path to a FASTA or twoBit file, in which case either the file extension or the format argument needs to be "fasta", "twoBit", or "2bit". Also note that in this case, the export method that is called is either the method with signature c("ANY", "character", "missing") or the method with signature c("ANY", "character", "character"), both defined in the rtracklayer package. If object is a BSgenome object and the file extension or the format argument is "fasta", "twoBit", or "2bit", then the flow eventually reaches one of 2 methods documented here.

format

If not missing, should be "fasta", "twoBit", or "2bit" (case insensitive for "twoBit" and "2bit").

compress, compression_level

Forwarded to writeXStringSet. See ?writeXStringSet for the details.

verbose

Whether or not the function should display progress. TRUE by default.

...

Extra arguments. The method for TwoBitFile objects forwards them to bsapply.

Author(s)

Michael Lawrence

See Also

  • BSgenome objects.

  • The export generic, and FastaFile and TwoBitFile objects in the rtracklayer package.

Examples

library(BSgenome.Celegans.UCSC.ce2)
genome <- BSgenome.Celegans.UCSC.ce2

## Export as FASTA file.
out1_file <- file.path(tempdir(), "Celegans.fasta")
export(genome, out1_file)

## Export as twoBit file.
out2_file <- file.path(tempdir(), "Celegans.2bit")
export(genome, out2_file)

## Sanity checks:
dna0 <- DNAStringSet(as.list(genome))

system.time(dna1 <- import(out1_file))
stopifnot(identical(names(dna0), names(dna1)) && all(dna0 == dna1))

system.time(dna2 <- import(out2_file))  # importing twoBit is 10-20x
                                        # faster than importing non
                                        # compressed FASTA 
stopifnot(identical(names(dna0), names(dna2)) && all(dna0 == dna2))

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(BSgenome)
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
Loading required package: Biostrings
Loading required package: XVector
Loading required package: rtracklayer
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/BSgenome/export-methods.Rd_%03d_medium.png", width=480, height=480)
> ### Name: export-methods
> ### Title: Export a BSgenome object as a FASTA or twoBit file
> ### Aliases: export-methods writeBSgenomeToFasta
> ###   export,BSgenome,FastaFile,ANY-method writeBSgenomeToTwobit
> ###   export,BSgenome,TwoBitFile,ANY-method
> ### Keywords: methods utilities
> 
> ### ** Examples
> 
> library(BSgenome.Celegans.UCSC.ce2)
> genome <- BSgenome.Celegans.UCSC.ce2
> 
> ## Export as FASTA file.
> out1_file <- file.path(tempdir(), "Celegans.fasta")
> export(genome, out1_file)
writing chrI sequence to file ... OK
writing chrII sequence to file ... OK
writing chrIII sequence to file ... OK
writing chrIV sequence to file ... OK
writing chrV sequence to file ... OK
writing chrX sequence to file ... OK
writing chrM sequence to file ... OK
> 
> ## Export as twoBit file.
> out2_file <- file.path(tempdir(), "Celegans.2bit")
> export(genome, out2_file)
> 
> ## Sanity checks:
> dna0 <- DNAStringSet(as.list(genome))
> 
> system.time(dna1 <- import(out1_file))
   user  system elapsed 
  0.532   0.120   0.651 
> stopifnot(identical(names(dna0), names(dna1)) && all(dna0 == dna1))
> 
> system.time(dna2 <- import(out2_file))  # importing twoBit is 10-20x
   user  system elapsed 
  0.412   0.088   0.499 
>                                         # faster than importing non
>                                         # compressed FASTA 
> stopifnot(identical(names(dna0), names(dna2)) && all(dna0 == dna2))
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>