Last data update: 2014.03.03

R: Manipulate indexed fasta files.
FaFileR Documentation

Manipulate indexed fasta files.

Description

Use FaFile() to create a reference to an indexed fasta file. The reference remains open across calls to methods, avoiding costly index re-loading.

FaFileList() provides a convenient way of managing a list of FaFile instances.

Usage


## Constructors

FaFile(file, index=sprintf("%s.fai", file), ...)
FaFileList(...)

## Opening / closing

## S3 method for class 'FaFile'
open(con, ...)
## S3 method for class 'FaFile'
close(con, ...)

## accessors; also path(), index()

## S4 method for signature 'FaFile'
isOpen(con, rw="")

## actions

## S4 method for signature 'FaFile'
indexFa(file, ...)

## S4 method for signature 'FaFile'
scanFaIndex(file, ...)
## S4 method for signature 'FaFileList'
scanFaIndex(file, ..., as=c("GRangesList", "GRanges"))

## S4 method for signature 'FaFile'
seqinfo(x)

## S4 method for signature 'FaFile'
countFa(file, ...)

## S4 method for signature 'FaFile,GRanges'
scanFa(file, param, ...,
    as=c("DNAStringSet", "RNAStringSet", "AAStringSet"))
## S4 method for signature 'FaFile,RangesList'
scanFa(file, param, ...,
    as=c("DNAStringSet", "RNAStringSet", "AAStringSet"))
## S4 method for signature 'FaFile,missing'
scanFa(file, param, ...,
    as=c("DNAStringSet", "RNAStringSet", "AAStringSet"))

## S4 method for signature 'FaFile'
getSeq(x, param, ...)
## S4 method for signature 'FaFileList'
getSeq(x, param, ...)

Arguments

con, x

An instance of FaFile or (for getSeq) FaFileList.

file, index

A character(1) vector of the fasta or fasta index file path (for FaFile), or an instance of class FaFile or FaFileList (for scanFaIndex, getSeq).

param

An optional GRanges or RangesList instance to select reads (and sub-sequences) for input. See Methods, below.

...

Additional arguments.

  • For FaFileList, this can either be a single character vector of paths to BAM files, or several instances of FaFile objects.

  • For scanFa,FaFile,missing-method this can include arguemnts to readDNAStringSet / readRNAStringSet / readAAStringSet when param is ‘missing’.

rw

Mode of file; ignored.

as

A character(1) vector indicating the type of object to return.

  • For scanFaIndex, default GRangesList, with index information from each file is returned as an element of the list.

  • For scanFa, default DNAStringSet.

GRangesList, index information is collapsed across files into the unique index elements.

Objects from the Class

Objects are created by calls of the form FaFile().

Fields

The FaFile class inherits fields from the RsamtoolsFile class.

Functions and methods

FaFileList inherits methods from RsamtoolsFileList and SimpleList.

Opening / closing:

open.FaFile

Opens the (local or remote) path and index files. Returns a FaFile instance.

close.FaFile

Closes the FaFile con; returning (invisibly) the updated FaFile. The instance may be re-opened with open.FaFile.

Accessors:

path

Returns a character(1) vector of the fasta path name.

index

Returns a character(1) vector of fasta index name (minus the '.fai' extension).

Methods:

indexFa

Visit the path in path(file) and create an index file (with the extension ‘.fai’).

scanFaIndex

Read the sequence names and and widths of recorded in an indexed fasta file, returning the information as a GRanges object.

seqinfo

Consult the index file for defined sequences (seqlevels()) and lengths (seqlengths()).

countFa

Return the number of records in the fasta file.

scanFa

Return the sequences indicated by param as a DNAStringSet instance. seqnames(param) selects the sequences to return; start(param) and end{param} define the (1-based) region of the sequence to return. Values of end(param) greater than the width of the sequence cause an error; use seqlengths(FaFile(file)) to discover sequence widths. When param is missing, all records are selected. When length(param)==0 no records are selected.

getSeq

Returns the sequences indicated by param from the indexed fasta file(s) of file.

For the FaFile method, the return type is a DNAStringSet. The getSeq,FaFile and scanFa,FaFile,GRanges methods differ in that getSeq will reverse complement sequences selected from the minus strand.

For the FaFileList method, the param argument must be a GRangesList of the same length as file, creating a one-to-one mapping between the ith element of file and the ith element of param; the return type is a SimpleList of DNAStringSet instances, with elements of the list in the same order as the input elements.

show

Compactly display the object.

Author(s)

Martin Morgan

Examples


fl <- system.file("extdata", "ce2dict1.fa", package="Rsamtools",
                  mustWork=TRUE)
fa <- open(FaFile(fl))                   # open
countFa(fa)
(idx <- scanFaIndex(fa))
(dna <- scanFa(fa, param=idx[1:2]))
ranges(idx) <- narrow(ranges(idx), -10)  # last 10 nucleotides
(dna <- scanFa(fa, param=idx[1:2]))

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(Rsamtools)
Loading required package: GenomeInfoDb
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomicRanges
Loading required package: Biostrings
Loading required package: XVector
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/Rsamtools/FaFile-class.Rd_%03d_medium.png", width=480, height=480)
> ### Name: FaFile
> ### Title: Manipulate indexed fasta files.
> ### Aliases: FaFile-class FaFileList-class FaFile FaFileList open.FaFile
> ###   'close.FaFile ' isOpen,FaFile-method indexFa,FaFile-method
> ###   scanFaIndex,FaFile-method scanFaIndex,FaFileList-method
> ###   seqinfo,FaFile-method countFa,FaFile-method
> ###   scanFa,FaFile,GRanges-method scanFa,FaFile,RangesList-method
> ###   scanFa,FaFile,missing-method getSeq,FaFile-method
> ###   getSeq,FaFileList-method
> ### Keywords: classes
> 
> ### ** Examples
> 
> 
> fl <- system.file("extdata", "ce2dict1.fa", package="Rsamtools",
+                   mustWork=TRUE)
> fa <- open(FaFile(fl))                   # open
> countFa(fa)
[1] 5
> (idx <- scanFaIndex(fa))
GRanges object with 5 ranges and 0 metadata columns:
       seqnames    ranges strand
          <Rle> <IRanges>  <Rle>
  [1] pattern01   [1, 18]      *
  [2] pattern02   [1, 25]      *
  [3] pattern03   [1, 24]      *
  [4] pattern04   [1, 24]      *
  [5] pattern05   [1, 25]      *
  -------
  seqinfo: 5 sequences from an unspecified genome
> (dna <- scanFa(fa, param=idx[1:2]))
  A DNAStringSet instance of length 2
    width seq                                               names               
[1]    18 GCGAAACTAGGAGAGGCT                                pattern01
[2]    25 CTGTTAGCTAATTTTAAAAATAAAT                         pattern02
> ranges(idx) <- narrow(ranges(idx), -10)  # last 10 nucleotides
> (dna <- scanFa(fa, param=idx[1:2]))
  A DNAStringSet instance of length 2
    width seq                                               names               
[1]    10 AGGAGAGGCT                                        pattern01
[2]    10 AAAAATAAAT                                        pattern02
> 
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>