Contains summarized quality-related data on FASTQ files.
Objects from the Class
Objects can be created by calls of the form fastqq("test.fq").
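
A minimal sketch of object creation, assuming the seqTools package is installed. The FASTQ content, the temporary file, and the values passed for k and probeLabel are made up for illustration; the k and probeLabel arguments are assumed to match the slots of the same name.

```r
library(seqTools)

# Write a tiny, made-up FASTQ file (two reads of length 8) to a temporary path.
fq_file <- tempfile(fileext = ".fq")
writeLines(c(
  "@read1",
  "ACGTACGT",
  "+",
  "IIIIIIII",
  "@read2",
  "GGCCTTAA",
  "+",
  "IIIIIIII"
), fq_file)

# Collect quality data; k = 2 and the label "demo" are illustrative values.
fq <- fastqq(fq_file, k = 2, probeLabel = "demo")
class(fq)
```

Several files can be passed as a character vector, in which case the per-file slots hold one entry (or one column) per file.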
Slots
filenames:
"character": Vector of FASTQ file names.
probeLabel:
"character": Vector of probe labels.
nFiles:
"integer": Length of filenames.
k:
"integer": Length of counted DNA k-mers.
maxSeqLen:
"integer": Maximum sequence length found in
FASTQ files. Determines the number of rows in the 'seqLenCount' matrix and
the number of columns in the 'nac' and 'phred' slots.
kmer:
"matrix": Matrix containing DNA k-mer counts.
firstKmer:
"matrix": Matrix containing counts of
incipient DNA k-mers (k-mers at the start of reads).
nReads:
"integer": Vector containing the number of reads
per file.
seqLenCount:
"matrix": Matrix containing counts of
read lengths.
gcContent:
"matrix": Matrix containing GC content
(in percent).
nN:
"integer": Vector containing the number of N
nucleotide entries per file.
nac:
"list": Contains per-position nucleotide alphabet
frequency counts.
phred:
"list": Contains per-position phred count
tables (one per FASTQ file).
seqLen:
"matrix": Contains minimum and maximum
sequence length (one column per file).
collectTime:
"list": Contains start and end time of
FASTQ reading as 'POSIXct'.
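
The k slot fixes the size of the k-mer count matrices: there is one row per possible DNA k-mer, that is 4^k rows. The following base R sketch enumerates all k-mers for a small k to show how quickly that number grows. It does not use the package, and the lexicographic ordering shown here is an assumption, not a statement about the row order used in the kmer slot.

```r
# Enumerate all DNA k-mers for a small k (illustrative; base R only).
k <- 2L
bases <- c("A", "C", "G", "T")

# expand.grid() builds every combination of k bases, with the first column
# varying fastest. Reversing the column order before pasting therefore
# yields the k-mers in lexicographic order.
grid <- expand.grid(rep(list(bases), k), stringsAsFactors = FALSE)
kmers <- do.call(paste0, unname(as.list(grid[rev(seq_len(k))])))

length(kmers)  # equals 4^k
head(kmers)
```

Because the row count is 4^k, each increment of k multiplies the number of rows by four, which is worth keeping in mind when choosing k for many files.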
Methods
The following methods are defined for class Fastqq:
Basic accessors:
getK
signature(object="Fastqq"): Returns k-value
(length of DNA k-mers) as integer.
kmerCount
signature(object="Fastqq"): Returns
matrix with 4^k rows and nFiles columns. For each
k-mer and FASTQ-file, the absolute count value of the k-mer in the
FASTQ file is given.
nFiles
signature(object="Fastqq"): Returns the number of
files from which data has been collected as integer.
nNnucs
signature(object="Fastqq"): Returns integer
vector of length nFiles. For each FASTQ file, the absolute
number of contained 'N' nucleotide entries is given.
nReads
signature(object="Fastqq"): Returns number of
reads in each FASTQ file as integer.
fileNames
signature(object="Fastqq"): Returns the
names of FASTQ files from which data has been collected as
character.
maxSeqLen
signature(object="Fastqq"): Returns maximum
sequence length which has been found in all FASTQ files as
integer.
seqLenCount
signature(object="Fastqq"): Returns a matrix
which tabulates the read length counts for all FASTQ files.
gcContent
signature(object="Fastqq",i="numeric"):
Returns integer vector of length 100 which contains
absolute read count numbers for each percentage of GC-content.
i is the index of the FASTQ file for which the values
are returned. The GC content values for all files together can
be obtained using gcContentMatrix.
nucFreq
signature(object="Fastqq",i="integer"):
Returns matrix which contains the absolute nucleotide
count values for each nucleotide and read position. i is
the index of the FASTQ file for which the values are returned.
seqLen
signature(object="Fastqq"): Returns matrix
with two rows and nFiles columns. For each file the minimum
and maximum read length is given.
kmerCount
signature(object="Fastqq"): Returns a
matrix with 4^k rows and nFiles columns. Each entry
gives the absolute count of the k-mer (given as row name) in each
file (given as column name).
phred
signature(object="Fastqq",i="integer"): Returns
a matrix with 93 rows and maxSeqLen columns. The
matrix gives the absolute counts of each phred value for each
sequence position. i is the index of the FASTQ file for
which the values are returned.
phredQuantiles
signature(object="Fastqq",
quantiles="numeric", i="integer"): Returns a data.frame.
The data.frame has one row for each given quantile and
maxSeqLen columns. Each value gives the quantile (given by
row name) of the phred values at the sequence position (given by
column name). For the quantiles argument, a numeric vector
with values in [0,1] must be given. For the i argument, a
single integer value must be given which denotes the index of the
FASTQ file from which values are returned (value must be in
{1,...,nFiles}).
probeLabel
signature(object="Fastqq"): Returns
character vector which contains the probeLabel
entries for given Fastqq object.
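
The accessors listed above might be used as in the following sketch. It assumes the seqTools package is installed; the FASTQ content, the temporary file, and the argument values are made up for illustration. The file index is passed as 1L so that it satisfies both the "numeric" and the "integer" signatures given above.

```r
library(seqTools)

# Hypothetical input: a small FASTQ file written to a temporary path.
fq_file <- tempfile(fileext = ".fq")
writeLines(c("@read1", "ACGTACGT", "+", "IIIIIIII",
             "@read2", "GGCCTTAA", "+", "IIIIIIII"), fq_file)
fq <- fastqq(fq_file, k = 2, probeLabel = "demo")

# Accessors that describe the whole object.
getK(fq)
nFiles(fq)
nReads(fq)
nNnucs(fq)
fileNames(fq)
probeLabel(fq)
maxSeqLen(fq)
seqLen(fq)
dim(kmerCount(fq))
dim(seqLenCount(fq))

# Accessors that return values for a single file, selected by index i.
gc_counts  <- gcContent(fq, 1L)
nuc_counts <- nucFreq(fq, 1L)
phred_tab  <- phred(fq, 1L)
phred_q    <- phredQuantiles(fq, c(0.25, 0.5, 0.75), 1L)
```

No output is shown because the values depend on the input files.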
Author(s)
Wolfgang Kaisers
References
Cock PJA, Fields CJ, Goto N, Heuer ML, Rice PM
The Sanger FASTQ file format for sequences with quality scores, and
the Solexa/Illumina FASTQ variants.
Nucleic Acids Research 2010 Vol.38 No.6 1767-1771