Last data update: 2014.03.03

Fastqq-class

R Documentation

Class `"Fastqq"`

Description

Contains summary data on the quality of FASTQ files.

Objects from the Class

Objects can be created by calls of the form fastqq("test.fq").
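
As a rough illustration of that call, the sketch below writes a tiny two-read FASTQ file with base R and passes it to fastqq. The read names, sequences and quality strings are made up for the example, and the call is only attempted when seqTools is installed. The arguments k and probeLabel are the ones used in the Examples section of this page.

```r
# Write a tiny two-read FASTQ file (4 lines per record) to a temporary path.
# The reads and quality strings are invented for illustration only.
fq_path <- tempfile(fileext = ".fq")
writeLines(c(
  "@read1", "ACGTACGTAC", "+", "IIIIIIIIII",
  "@read2", "TTGCAACGGA", "+", "IIIIIHHHHH"
), fq_path)

# Only call into seqTools when the package is available.
if (requireNamespace("seqTools", quietly = TRUE)) {
  fq <- seqTools::fastqq(fq_path, k = 4, probeLabel = "demo")
  print(seqTools::nReads(fq))
}
```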

Slots

filenames: "character". Vector of FASTQ file names.
probeLabel: "character". Vector of probe labels.
nFiles: "integer". Length of filenames.
k: "integer". Length of counted DNA k-mers.
maxSeqLen: "integer". Maximum sequence length found in the FASTQ files. Determines the number of rows in the 'seqLenCount' matrix and the number of columns in the 'nac' and 'phred' slots.
kmer: "matrix". Matrix containing DNA k-mer counts.
firstKmer: "matrix". Matrix containing counts of incipient DNA k-mers (k-mers at the start of a read).
nReads: "integer". Vector containing the number of reads per file.
seqLenCount: "matrix". Matrix containing counts of read lengths.
gcContent: "matrix". Matrix containing GC content (in percent).
nN: "integer". Vector containing the number of N nucleotide entries per file.
nac: "list". Contains the alphabet frequencies counted per position.
phred: "list". Contains per-position phred count tables (one per FASTQ file).
seqLen: "matrix". Contains the minimum and maximum sequence length (one column per file).
collectTime: "list". Contains the start and end time of FASTQ reading as 'POSIXct'.
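
The k and kmer slots are linked: according to the kmerCount method described on this page, the k-mer matrix has 4^k rows, one per possible DNA k-mer. The base R sketch below enumerates those k-mers for a given k. The helper name all_kmers is hypothetical, and the ordering (last position varying fastest) is chosen to match the row names in the example output (AAAA, AAAC, AAAG, ...). It is not taken from the package's internals.

```r
# Hypothetical helper: enumerate all DNA k-mers of length k.
all_kmers <- function(k) {
  bases <- c("A", "C", "G", "T")
  # expand.grid varies its first column fastest, so the columns are
  # reversed to make the last position vary fastest.
  grid <- expand.grid(rep(list(bases), k), stringsAsFactors = FALSE)
  unname(apply(grid[, rev(seq_len(k)), drop = FALSE], 1, paste0, collapse = ""))
}

kmers <- all_kmers(4)
length(kmers)    # 4^4 = 256 k-mers, i.e. 256 matrix rows for k = 4
head(kmers, 4)   # "AAAA" "AAAC" "AAAG" "AAAT"
```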

Methods

The following methods are defined for class Fastqq:

Basic accessors:

getK: signature(object="Fastqq"): Returns the k value (length of the counted DNA k-mers) as an integer.

nFiles: signature(object="Fastqq"): Returns the number of files from which data has been collected, as an integer.

nNnucs: signature(object="Fastqq"): Returns an integer vector of length nFiles. For each FASTQ file, the absolute number of 'N' nucleotide entries it contains is given.

nReads: signature(object="Fastqq"): Returns the number of reads in each FASTQ file as an integer vector.

fileNames: signature(object="Fastqq"): Returns the names of the FASTQ files from which data has been collected, as a character vector.

maxSeqLen: signature(object="Fastqq"): Returns the maximum sequence length found across all FASTQ files, as an integer.

seqLenCount: signature(object="Fastqq"): Returns a matrix that tabulates the counted read lengths for all FASTQ files.

gcContent: signature(object="Fastqq",i="numeric"): Returns an integer vector with one entry per GC-content percentage (labelled 0 to 100 in the example output), giving the absolute number of reads at each percentage. i is the index of the FASTQ file for which the values are returned. The GC content values for all files together can be obtained using gcContentMatrix.

nucFreq: signature(object="Fastqq",i="integer"): Returns a matrix containing the absolute nucleotide counts for each nucleotide and read position. i is the index of the FASTQ file for which the values are returned.

seqLen: signature(object="Fastqq"): Returns a matrix with two rows and nFiles columns. For each file, the minimum and maximum read lengths are given.

kmerCount: signature(object="Fastqq"): Returns a matrix with 4^k rows and nFiles columns. Each entry gives the absolute count of a k-mer (given as row name) in a file (given as column name).

phred: signature(object="Fastqq",i="integer"): Returns a matrix with 93 rows and maxSeqLen columns. The matrix gives the absolute counts of each phred value for each sequence position. i is the index of the FASTQ file for which the values are returned.

phredQuantiles: signature(object="Fastqq", quantiles="numeric", i="integer"): Returns a data.frame with one row for each given quantile and maxSeqLen columns. Each value gives the quantile (given by row name) of the phred values at the sequence position (given by column name). The quantiles argument must be a numeric vector with values in [0,1]. The i argument must be a single integer giving the index of the FASTQ file from which values are returned (the value must be in {1,...,nFiles}).

probeLabel: signature(object="Fastqq"): Returns a character vector containing the probeLabel entries of the given Fastqq object.
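
To make the relation between the phred count tables and phredQuantiles more concrete, the following base R sketch reads a quantile off a single column of counts using cumulative sums. The helper name count_quantile and the toy counts are hypothetical. The rule used here (the smallest observed value whose cumulative count reaches the requested share) is an assumption, and seqTools may define its quantiles differently.

```r
# Hypothetical helper: quantile of a value distribution given as counts.
# counts[i] is the number of observations with value values[i].
count_quantile <- function(counts, values, q) {
  stopifnot(length(counts) == length(values), q >= 0, q <= 1)
  # Smallest observed value whose cumulative count reaches q * total.
  reached <- cumsum(counts) >= q * sum(counts) & counts > 0
  values[which(reached)[1]]
}

# Toy count column for one read position (not taken from a real file).
phred_values <- 0:4
counts <- c(0, 2, 3, 4, 1)

quartiles <- sapply(c(0.25, 0.5, 0.75),
                    function(q) count_quantile(counts, phred_values, q))
quartiles   # 2 2 3
```

Applying such a helper to every column of a count table would yield one row per quantile and one column per position, which is the shape described for phredQuantiles.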

Author(s)

Wolfgang Kaisers

References

Cock PJA, Fields CJ, Goto N, Heuer ML, Rice PM. The Sanger FASTQ file format for sequences with quality scores, and the Solexa/Illumina FASTQ variants. Nucleic Acids Research 2010, Vol. 38, No. 6, 1767-1771.

Examples

# Locate the example FASTQ files shipped with seqTools
basedir <- system.file("extdata", package="seqTools")
setwd(basedir)
# Collect quality data from two files, counting 4-mers
fq <- fastqq(c("g4_l101_n100.fq.gz", "g5_l101_n100.fq.gz"),
             k=4, probeLabel=c("g4", "g5"))
# Basic accessors
fileNames(fq)
getK(fq)
nNnucs(fq)
nFiles(fq)
nReads(fq)
maxSeqLen(fq)
collectTime(fq)
collectDur(fq)
slc <- seqLenCount(fq)
# Nucleotide counts per read position for file 1
nf <- nucFreq(fq, 1)
nf[1:4, 1:10]
seqLen(fq)
# Read and replace the probe labels
probeLabel(fq)
probeLabel(fq) <- 1:nFiles(fq)
# k-mer counts
kc <- kmerCount(fq)
kc[1:10, ]
plotKmerCount(fq)
# Phred counts and quantiles for file 1
ph <- phred(fq, 1)
ph[25:35, 1:15]
pq <- phredQuantiles(fq, c(0.25, 0.5, 0.75), 1)
plotNucFreq(fq, 1)
# Nucleotide count
plotNucCount(fq, 2:3)
# GC content
gcContent(fq, 1)
# Subsetting with [
fqq <- fq[1]

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> library(seqTools)
Loading required package: zlibbioc
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/seqTools/Fastqq-class.Rd_%03d_medium.png", width=480, height=480)
> ### Name: Fastqq-class
> ### Title: Class '"Fastqq"'
> ### Aliases: Fastqq-class [-methods [,Fastqq-method gcContent
> ###   gcContent-methods gcContent,Fastqq-method getK getK-methods
> ###   getK,Fastqq-method fileNames fileNames-methods
> ###   fileNames,Fastqq-method nFiles nFiles-methods nFiles,Fastqq-method
> ###   nNnucs nNnucs-methods nNnucs,Fastqq-method nReads nReads-methods
> ###   nReads,Fastqq-method maxSeqLen maxSeqLen-methods
> ###   maxSeqLen,Fastqq-method phred phred-methods phred,Fastqq-method
> ###   phredQuantiles phredQuantiles-methods phredQuantiles,Fastqq-method
> ###   seqLenCount seqLenCount-methods seqLenCount,Fastqq-method nucFreq
> ###   nucFreq-methods nucFreq,Fastqq-method seqLen seqLen-methods
> ###   seqLen,Fastqq-method kmerCount kmerCount-methods
> ###   kmerCount,Fastqq-method probeLabel probeLabel-methods
> ###   probeLabel,Fastqq-method probeLabel<- probeLabel<--methods
> ###   probeLabel<-,Fastqq-method
> ### Keywords: classes fastqq kmer
> 
> ### ** Examples
> 
> basedir <- system.file("extdata", package="seqTools")
> setwd(basedir)
> fq <- fastqq(c("g4_l101_n100.fq.gz","g5_l101_n100.fq.gz"), 
+                                 k=4, probeLabel=c("g4","g5"))
[fastqq] File ( 1/2) 'g4_l101_n100.fq.gz'	done.
[fastqq] File ( 2/2) 'g5_l101_n100.fq.gz'	done.
> #
> fileNames(fq)
[1] "g4_l101_n100.fq.gz" "g5_l101_n100.fq.gz"
> getK(fq)
[1] 4
> nNnucs(fq)
[1] 0 2
> nFiles(fq)
[1] 2
> nReads(fq)
[1] 100 100
> maxSeqLen(fq)
[1] 101
> collectTime(fq)
$start
[1] "2016-07-07 06:42:20 JST"

$end
[1] "2016-07-07 06:42:20 JST"

> collectDur(fq)
[1] 0.002157688
> slc<-seqLenCount(fq)
> nf<-nucFreq(fq,1)
> nf[1:4,1:10]
   1  2  3  4  5  6  7  8  9 10
a 14 16 22 17 37 25 17 23 34 32
c 38 29 24 32 26 20 17 19 22 19
g 29 23 31 24 20 31 22 33 19 18
t 19 32 23 27 17 24 44 25 25 31
> seqLen(fq)
           g4  g5
minSeqLen 101 101
maxSeqLen 101 101
> probeLabel(fq)
[1] "g4" "g5"
> probeLabel(fq) <- 1:nFiles(fq)
> #
> kc<-kmerCount(fq)
> kc[1:10, ]
       1   2
AAAA 141 309
AAAC   9   5
AAAG  57 114
AAAT  83 124
AACA  22  17
AACC  24  19
AACG   0   0
AACT  17   9
AAGA  51  72
AAGC  45  18
> plotKmerCount(fq)
> #
> ph<-phred(fq, 1)
> ph[25:35,1:15]
    1  2  3 4 5 6 7 8 9 10 11 12 13 14 15
24  0  0  0 1 1 2 0 1 1  0  1  1  1  1  1
25  7  3  1 1 1 0 1 0 2  1  0  0  0  2  0
26  0  0  0 1 2 0 2 1 1  0  0  0  1  1  1
27  0  0  2 2 1 0 0 0 2  3  2  2  1  2  2
28  3  4  0 3 3 0 0 1 1  1  0  2  0  1  0
29  4  0  0 2 2 4 3 0 2  3  3  0  2  0  2
30 10  6  7 1 2 5 1 2 2  1  2  4  2  1  5
31 31 28 32 2 3 0 3 3 3  3  1  2  1  0  4
32  5  3  4 3 8 4 4 6 3  3  8  0  4  4  4
33 10 12  6 4 7 8 7 8 5  5  4  6  4  2  2
34 16 23 30 3 8 7 4 5 7  4  7  6  7  6  2
> pq <- phredQuantiles(fq,c(0.25, 0.5, 0.75), 1)
> plotNucFreq(fq, 1)
> # Nucleotide count
> plotNucCount(fq, 2:3) 
> # GC content
> gcContent(fq, 1)
  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19 
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 
 20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39 
  0   0   0   0   0   0   0   0   0   0   0   2   1   2   2   2   4   4   1   7 
 40  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59 
  8   8   1   1   4   1   4   1   5   3   3   7   4   2   4   5   0   1   1   1 
 60  61  62  63  64  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79 
  0   1   1   2   3   1   0   2   1   0   0   0   0   0   0   0   0   0   0   0 
 80  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99 
  0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0 
100 
  0 
> #
> fqq<-fq[1]
> dev.off()
null device 
          1 
>
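
The output above allows a small consistency check. File 1 has 100 reads and no 'N' entries (see nReads and nNnucs), so the a, c, g and t counts at each read position should add up to 100. The sketch below re-types the nf[1:4,1:10] values printed above into a base R matrix and sums the columns. The name nf_head is hypothetical, and the check assumes that any further rows of the nucFreq matrix (not shown in the output) are zero at these positions.

```r
# Values copied from the nf[1:4, 1:10] output above (file 1).
nf_head <- rbind(
  a = c(14, 16, 22, 17, 37, 25, 17, 23, 34, 32),
  c = c(38, 29, 24, 32, 26, 20, 17, 19, 22, 19),
  g = c(29, 23, 31, 24, 20, 31, 22, 33, 19, 18),
  t = c(19, 32, 23, 27, 17, 24, 44, 25, 25, 31)
)
colnames(nf_head) <- 1:10

# Each column sums to the number of reads in file 1.
colSums(nf_head)
all(colSums(nf_head) == 100)   # TRUE
```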
