Last data update: 2014.03.03

R: Extract integer Phred score values from FastQ data
intPhredR Documentation

Extract integer Phred score values from FastQ data

Description

Function to extract integer Phred score values from FastQ data.

Usage

intPhred(x, method="Sanger", returnType="list")

Arguments

x

object of class ShortReadQ; which contains read sequences and quality scores; usually read in from a Fastq files.

method

string; one of 'Sanger', 'Solexa' or 'previousSolexa'. See details below.

returnType

string; in which format should the result be returned, either as a 'list' or as a 'matrix'.

Details

There are different standards for encoding read qualities in Fastq files. The 'Sanger' format encodes a Phred quality score from 0 to 93 using ASCII 33 to 126. The current 'Solexa'/llumina format (1.3 and higher) encodes a Phred quality score from 0 to 40 using ASCII 64 to 104. The 'previous Solexa'/Illumina format (1.0) encodes a custom Solexa/Illumina quality score from -5 to 40 using ASCII 59 to 104. This custom Solexa quality score is approximately equal to the Phred scores for high qualities, but differs in the low quality range.

Value

If returnType is equal to ‘list’: A list of integer Phred quality values of the same length as the number of reads in the object x.

If returnType is equal to ‘matrix’: A matrix of integer Phred quality values. The number of rows is the number of reads in the object x. The number of columns is the maximum length (width) over all reads in object x. The last entries for reads that are shorter than this maximum width are 'NA'.

Author(s)

Joern Toedling

References

http://maq.sourceforge.net/fastq.shtml

See Also

ShortReadQ-class, readFastq

Examples

  exDir <- system.file("extdata", package="girafe")
  ra  <- readFastq(dirPath=exDir, pattern=
            "aravinSRNA_23_plus_adapter_excerpt.fastq")
  ra.quals <- intPhred(ra, method="Sanger",
                       returnType="matrix")
  ra.qmed  <- apply(ra.quals, 2, median)
  if (interactive())
     plot(ra.qmed, type="h", ylim=c(0,42), xlab="Base postion",
          ylab="Median Phred Quality Score", lwd=2, col="steelblue")

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(girafe)
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: Rsamtools
Loading required package: GenomeInfoDb
Loading required package: IRanges
Loading required package: GenomicRanges
Loading required package: Biostrings
Loading required package: XVector
Loading required package: intervals

Attaching package: 'intervals'

The following object is masked from 'package:Biostrings':

    type

The following object is masked from 'package:GenomicRanges':

    reduce

The following object is masked from 'package:IRanges':

    reduce

The following object is masked from 'package:S4Vectors':

    expand

Loading required package: ShortRead
Loading required package: BiocParallel
Loading required package: GenomicAlignments
Loading required package: SummarizedExperiment
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: genomeIntervals
Loading required package: grid
No methods found in "IRanges" for requests: sort
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/girafe/intPhred.Rd_%03d_medium.png", width=480, height=480)
> ### Name: intPhred
> ### Title: Extract integer Phred score values from FastQ data
> ### Aliases: intPhred
> ### Keywords: manip
> 
> ### ** Examples
> 
>   exDir <- system.file("extdata", package="girafe")
>   ra  <- readFastq(dirPath=exDir, pattern=
+             "aravinSRNA_23_plus_adapter_excerpt.fastq")
>   ra.quals <- intPhred(ra, method="Sanger",
+                        returnType="matrix")
>   ra.qmed  <- apply(ra.quals, 2, median)
> #  if (interactive())
>      plot(ra.qmed, type="h", ylim=c(0,42), xlab="Base postion",
+           ylab="Median Phred Quality Score", lwd=2, col="steelblue")
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>