Use BamFile() to create a reference to a BAM file (and
optionally its index). The reference remains open across calls to
methods, avoiding costly index re-loading.
BamFileList() provides a convenient way of managing a list of
BamFile instances.
Usage
## Constructors
BamFile(file, index=file, ..., yieldSize=NA_integer_, obeyQname=FALSE,
asMates=FALSE, qnamePrefixEnd=NA, qnameSuffixStart=NA)
BamFileList(..., yieldSize=NA_integer_, obeyQname=FALSE, asMates=FALSE,
qnamePrefixEnd=NA, qnameSuffixStart=NA)
## Opening / closing
## S3 method for class 'BamFile'
open(con, ...)
## S3 method for class 'BamFile'
close(con, ...)
## accessors; also path(), index(), yieldSize()
## S4 method for signature 'BamFile'
isOpen(con, rw="")
## S4 method for signature 'BamFile'
isIncomplete(con)
## S4 method for signature 'BamFile'
obeyQname(object, ...)
obeyQname(object, ...) <- value
## S4 method for signature 'BamFile'
asMates(object, ...)
asMates(object, ...) <- value
## S4 method for signature 'BamFile'
qnamePrefixEnd(object, ...)
qnamePrefixEnd(object, ...) <- value
## S4 method for signature 'BamFile'
qnameSuffixStart(object, ...)
qnameSuffixStart(object, ...) <- value
## actions
## S4 method for signature 'BamFile'
scanBamHeader(files, ..., what=c("targets", "text"))
## S4 method for signature 'BamFile'
seqinfo(x)
## S4 method for signature 'BamFileList'
seqinfo(x)
## S4 method for signature 'BamFile'
filterBam(file, destination, index=file, ...,
filter=FilterRules(), indexDestination=TRUE,
param=ScanBamParam(what=scanBamWhat()))
## S4 method for signature 'BamFile'
indexBam(files, ...)
## S4 method for signature 'BamFile'
sortBam(file, destination, ..., byQname=FALSE, maxMemory=512)
## S4 method for signature 'BamFileList'
mergeBam(files, destination, ...)
## reading
## S4 method for signature 'BamFile'
scanBam(file, index=file, ..., param=ScanBamParam(what=scanBamWhat()))
## counting
## S4 method for signature 'BamFile'
countBam(file, index=file, ..., param=ScanBamParam())
## S4 method for signature 'BamFileList'
countBam(file, index=file, ..., param=ScanBamParam())
## S4 method for signature 'BamFile'
quickBamFlagSummary(file, ..., param=ScanBamParam(), main.groups.only=FALSE)
Arguments
...
Additional arguments.
For BamFileList, this can either be a single character vector
of paths to BAM files, or several instances of BamFile
objects. When a character vector of paths, a second named argument
‘index’ can be a character() vector of length equal to
the first argument specifying the paths to the index files, or
character() to indicate that no index file is available. See
BamFile.
con
An instance of BamFile.
x, object, file, files
A character vector of BAM file paths
(for BamFile) or a BamFile instance (for other
methods).
index
character(1); the BAM index file path (for
BamFile); ignored for all other methods on this page.
yieldSize
Number of records to yield each time the file
is read from with scanBam. See ‘Fields’
section for details.
asMates
Logical indicating if records should be paired
as mates. See ‘Fields’ section for details.
qnamePrefixEnd
Single character (or NA) marking the
end of the qname prefix. When specified, all characters prior to
and including the qnamePrefixEnd are removed from the qname.
If the prefix is not found in the qname the qname is not trimmed.
Currently only implemented for mate-pairing (i.e., when
asMates=TRUE in a BamFile.
qnameSuffixStart
Single character (or NA) marking the
start of the qname suffix. When specified, all characters following
and including the qnameSuffixStart are removed from the qname.
If the suffix is not found in the qname the qname is not trimmmed.
Currently only implemented for mate-pairing (i.e., when
asMates=TRUE in a BamFile.
obeyQname
Logical indicating if the BAM file is sorted
by qname. In Bioconductor > 2.12 paired-end files do
not need to be sorted by qname. Instead use
asMates=TRUE for reading paired-end data. See
‘Fields’ section for details.
value
Logical value for setting asMates and
obeyQname in a BamFile instance.
what
For scanBamHeader, a character vector specifying
that either or both of c("targets", "text") are to be
extracted from the header; see scanBam for
additional detail.
filter
A FilterRules instance. Functions in the
FilterRules instance should expect a single DataFrame
argument representing all information specified by
param. Each function must return a logical vector,
usually of length equal to the number of rows of the
DataFrame. Return values are used to include (when
TRUE) corresponding records in the filtered BAM file.
destination
character(1) file path to write filtered reads to.
indexDestination
logical(1) indicating whether the destination
file should also be indexed.
byQname, maxMemory
See sortBam.
param
An optional ScanBamParam instance to
further influence scanning, counting, or filtering.
rw
Mode of file; ignored.
main.groups.only
See quickBamFlagSummary.
Objects from the Class
Objects are created by calls of the form BamFile().
Fields
The BamFile class inherits fields from the
RsamtoolsFile class and has fields:
yieldSize:
Number of records to yield each time the file is
read from using scanBam or, when length(bamWhich())
!= 0, a threshold which yields records in complete ranges whose
sum first exceeds yieldSize. Setting yieldSize on a
BamFileList does not alter existing yield sizes set on the
individual BamFile instances.
asMates:
A logical indicating if the records should be
returned as mated pairs. When TRUEscanBam attempts
to mate (pair) the records and returns two additional fields
groupid and mate_status. groupid is an integer
vector of unique group ids; mate_status is a factor with level
mated for records successfully paired by the algorithm,
ambiguous for records that are possibly mates but cannot be
assigned unambiguously, or unmated for reads that did not
have valid mates.
Mate criteria:
Bit 0x40 and 0x80: Segments are a pair of first/last OR
neither segment is marked first/last
Bit 0x100: Both segments are secondary OR both not secondary
Bit 0x10 and 0x20: Segments are on opposite strands
Flags, tags and ranges may be specified in the ScanBamParam
for fine tuning of results.
obeyQname:
A logical(0) indicating if the file was sorted by
qname. In Bioconductor > 2.12 paired-end files do not need to be
sorted by qname. Instead set asMates=TRUE in the
BamFile when using the readGAlignmentsList
function from the GenomicAlignments package.
Functions and methods
BamFileList inherits additional methods from
RsamtoolsFileList and SimpleList.
Opening / closing:
open.BamFile
Opens the (local or remote) path and
index (if bamIndex is not character(0)),
files. Returns a BamFile instance.
close.BamFile
Closes the BamFilecon; returning
(invisibly) the updated BamFile. The instance may be
re-opened with open.BamFile.
isOpen
Tests whether the BamFilecon has been
opened for reading.
isIncomplete
Tests whether the BamFilecon is
niether closed nor at the end of the file.
Accessors:
path
Returns a character(1) vector of BAM path names.
index
Returns a character(0) or character(1) vector of BAM
index path names.
yieldSize, yieldSize<-
Return or set an integer(1) vector
indicating yield size.
obeyQname, obeyQname<-
Return or set a logical(0)
indicating if the file was sorted by qname.
asMates, asMates<-
Return or set a logical(0)
indicating if the records should be returned as mated pairs.
Methods:
scanBamHeader
Visit the path in path(file), returning
the information contained in the file header; see
scanBamHeader.
seqinfo, seqnames, seqlength
Visit the path in
path(file), returning a Seqinfo,
character, or named integer vector containing information on the
anmes and / or lengths of each sequence. Seqnames are ordered
as they appear in the file.
scanBam
Visit the path in path(file), returning the
result of scanBam applied to the specified path.
countBam
Visit the path(s) in path(file), returning
the result of countBam applied to the specified
path.
filterBam
Visit the path in path(file), returning the
result of filterBam applied to the specified
path. A single file can be filtered to one or several
destinations, as described in filterBam.
indexBam
Visit the path in path(file), returning
the result of indexBam applied to the specified
path.
sortBam
Visit the path in path(file), returning the
result of sortBam applied to the specified path.
mergeBam
Merge several BAM files into a single BAM file. See
mergeBam for details; additional arguments supported
by mergeBam,character-method are also available for
BamFileList.
show
Compactly display the object.
Author(s)
Martin Morgan and Marc Carlson
See Also
The readGAlignments,
readGAlignmentPairs,
and readGAlignmentsList
functions defined in the GenomicAlignments package.
summarizeOverlaps and
findSpliceOverlaps-methods in the
GenomicAlignments package for methods that work on a
BamFile and BamFileList objects.
Examples
##
## BamFile options.
##
fl <- system.file("extdata", "ex1.bam", package="Rsamtools")
bf <- BamFile(fl)
bf
## When 'asMates=TRUE' scanBam() reads the data in as
## pairs. See 'asMates' above for details of the pairing
## algorithm.
asMates(bf) <- TRUE
## When 'yieldSize' is set, scanBam() will iterate
## through the file in chunks.
yieldSize(bf) <- 500
## Some applications append a filename (e.g., NCBI Sequence Read
## Archive (SRA) toolkit) or allele identifier to the sequence qname.
## This may result in a unique qname for each record which presents a
## problem when mating paired-end reads (identical qnames is one
## criteria for paired-end mating). 'qnamePrefixEnd' and
## 'qnameSuffixStart' can be used to trim an unwanted prefix or suffix.
qnamePrefixEnd(bf) <- "/"
qnameSuffixStart(bf) <- "."
##
## Reading Bam files.
##
fl <- system.file("extdata", "ex1.bam", package="Rsamtools",
mustWork=TRUE)
(bf <- BamFile(fl))
head(seqlengths(bf)) # sequences and lengths in BAM file
if (require(RNAseqData.HNRNPC.bam.chr14)) {
bfl <- BamFileList(RNAseqData.HNRNPC.bam.chr14_BAMFILES)
bfl
bfl[1:2] # subset
bfl[[1]] # select first element -- BamFile
## merged across BAM files
seqinfo(bfl)
head(seqlengths(bfl))
}
length(scanBam(fl)[[1]][[1]]) # all records
bf <- open(BamFile(fl)) # implicit index
bf
identical(scanBam(bf), scanBam(fl))
close(bf)
## Use 'yieldSize' to iterate through a file in chunks.
bf <- open(BamFile(fl, yieldSize=1000))
while (nrec <- length(scanBam(bf)[[1]][[1]]))
cat("records:", nrec, "\n")
close(bf)
## Repeatedly visit multiple ranges in the BamFile.
rng <- GRanges(c("seq1", "seq2"), IRanges(1, c(1575, 1584)))
bf <- open(BamFile(fl))
sapply(seq_len(length(rng)), function(i, bamFile, rng) {
param <- ScanBamParam(which=rng[i], what="seq")
bam <- scanBam(bamFile, param=param)[[1]]
alphabetFrequency(bam[["seq"]], baseOnly=TRUE, collapse=TRUE)
}, bf, rng)
close(bf)
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(Rsamtools)
Loading required package: GenomeInfoDb
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
rbind, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit
Loading required package: S4Vectors
Attaching package: 'S4Vectors'
The following objects are masked from 'package:base':
colMeans, colSums, expand.grid, rowMeans, rowSums
Loading required package: IRanges
Loading required package: GenomicRanges
Loading required package: Biostrings
Loading required package: XVector
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/Rsamtools/BamFile-class.Rd_%03d_medium.png", width=480, height=480)
> ### Name: BamFile
> ### Title: Maintain and use BAM files
> ### Aliases: BamFile-class BamFileList-class BamFile BamFileList
> ### open.BamFile close.BamFile isOpen,BamFile-method
> ### isIncomplete,BamFile-method scanBamHeader,BamFile-method
> ### seqinfo,BamFile-method seqinfo,BamFileList-method obeyQname
> ### obeyQname<- obeyQname,BamFile-method obeyQname<-,BamFile-method
> ### obeyQname,BamFileList-method obeyQname<-,BamFileList-method asMates
> ### asMates<- asMates,BamFile-method asMates<-,BamFile-method
> ### asMates,BamFileList-method asMates<-,BamFileList-method
> ### qnamePrefixEnd qnamePrefixEnd<- qnamePrefixEnd,BamFile-method
> ### qnamePrefixEnd<-,BamFile-method qnamePrefixEnd,BamFileList-method
> ### qnamePrefixEnd<-,BamFileList-method qnameSuffixStart
> ### qnameSuffixStart<- qnameSuffixStart,BamFile-method
> ### qnameSuffixStart<-,BamFile-method qnameSuffixStart,BamFileList-method
> ### qnameSuffixStart<-,BamFileList-method scanBam,BamFile-method
> ### countBam,BamFile-method countBam,BamFileList-method
> ### filterBam,BamFile-method indexBam,BamFile-method
> ### sortBam,BamFile-method mergeBam,BamFileList-method
> ### quickBamFlagSummary,BamFile-method show,BamFile-method
> ### show,BamFileList-method
> ### Keywords: classes
>
> ### ** Examples
>
>
> ##
> ## BamFile options.
> ##
>
> fl <- system.file("extdata", "ex1.bam", package="Rsamtools")
> bf <- BamFile(fl)
> bf
class: BamFile
path: /home/ddbj/local/lib64/R/library/Rsamtools/extdata/ex1.bam
index: /home/ddbj/local/lib64/R/library/Rsamtools/extdata/ex1.bam.bai
isOpen: FALSE
yieldSize: NA
obeyQname: FALSE
asMates: FALSE
qnamePrefixEnd: NA
qnameSuffixStart: NA
>
> ## When 'asMates=TRUE' scanBam() reads the data in as
> ## pairs. See 'asMates' above for details of the pairing
> ## algorithm.
> asMates(bf) <- TRUE
>
> ## When 'yieldSize' is set, scanBam() will iterate
> ## through the file in chunks.
> yieldSize(bf) <- 500
>
> ## Some applications append a filename (e.g., NCBI Sequence Read
> ## Archive (SRA) toolkit) or allele identifier to the sequence qname.
> ## This may result in a unique qname for each record which presents a
> ## problem when mating paired-end reads (identical qnames is one
> ## criteria for paired-end mating). 'qnamePrefixEnd' and
> ## 'qnameSuffixStart' can be used to trim an unwanted prefix or suffix.
> qnamePrefixEnd(bf) <- "/"
> qnameSuffixStart(bf) <- "."
>
> ##
> ## Reading Bam files.
> ##
>
> fl <- system.file("extdata", "ex1.bam", package="Rsamtools",
+ mustWork=TRUE)
> (bf <- BamFile(fl))
class: BamFile
path: /home/ddbj/local/lib64/R/library/Rsamtools/extdata/ex1.bam
index: /home/ddbj/local/lib64/R/library/Rsamtools/extdata/ex1.bam.bai
isOpen: FALSE
yieldSize: NA
obeyQname: FALSE
asMates: FALSE
qnamePrefixEnd: NA
qnameSuffixStart: NA
> head(seqlengths(bf)) # sequences and lengths in BAM file
seq1 seq2
1575 1584
>
> if (require(RNAseqData.HNRNPC.bam.chr14)) {
+ bfl <- BamFileList(RNAseqData.HNRNPC.bam.chr14_BAMFILES)
+ bfl
+ bfl[1:2] # subset
+ bfl[[1]] # select first element -- BamFile
+ ## merged across BAM files
+ seqinfo(bfl)
+ head(seqlengths(bfl))
+ }
Loading required package: RNAseqData.HNRNPC.bam.chr14
chr1 chr10 chr11
249250621 135534747 135006516
chr11_gl000202_random chr12 chr13
40103 133851895 115169878
>
>
> length(scanBam(fl)[[1]][[1]]) # all records
[1] 3307
>
> bf <- open(BamFile(fl)) # implicit index
> bf
class: BamFile
path: /home/ddbj/local/lib64/R/library/Rsamtools/extdata/ex1.bam
index: /home/ddbj/local/lib64/R/library/Rsamtools/extdata/ex1.bam.bai
isOpen: TRUE
yieldSize: NA
obeyQname: FALSE
asMates: FALSE
qnamePrefixEnd: NA
qnameSuffixStart: NA
> identical(scanBam(bf), scanBam(fl))
[1] TRUE
> close(bf)
>
> ## Use 'yieldSize' to iterate through a file in chunks.
> bf <- open(BamFile(fl, yieldSize=1000))
> while (nrec <- length(scanBam(bf)[[1]][[1]]))
+ cat("records:", nrec, "\n")
records: 1000
records: 1000
records: 1000
records: 307
> close(bf)
>
> ## Repeatedly visit multiple ranges in the BamFile.
> rng <- GRanges(c("seq1", "seq2"), IRanges(1, c(1575, 1584)))
> bf <- open(BamFile(fl))
> sapply(seq_len(length(rng)), function(i, bamFile, rng) {
+ param <- ScanBamParam(which=rng[i], what="seq")
+ bam <- scanBam(bamFile, param=param)[[1]]
+ alphabetFrequency(bam[["seq"]], baseOnly=TRUE, collapse=TRUE)
+ }, bf, rng)
[,1] [,2]
A 12843 0
C 13142 0
G 10833 0
T 16021 0
other 12 0
> close(bf)
>
>
>
>
>
>
> dev.off()
null device
1
>