cbDistMatrix function: Calculates a pairwise distance matrix from DNA k-mer
counts based on a modified Canberra distance.
Description
Calculates a pairwise distance matrix from DNA k-mer counts based on a modified
Canberra distance. Before the Canberra distances are calculated, the counts are
normalized to correct for systematic effects on the distance: the counts in
each DNA k-mer count vector are scaled up so that the normalized read counts
of all samples are nearly equal.
Arguments
object
Fastqq:
Object from which the DNA k-mer counts are taken.
nReadNorm
numeric:
Number of reads per file to which all contained DNA k-mer counts are
normalized.
Because the normalization is intended to increase counts, the value
must not be smaller than any FASTQ file read count (as reported by nReads).
Therefore the default value is the maximal number of reads recorded in
this object.
This normalization is necessary to compensate for systematic effects
in the Canberra distance.
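The normalization controlled by nReadNorm can be sketched in base R as follows. This is not the seqTools implementation; it assumes that each file's k-mer counts are multiplied by nReadNorm / nReads(file), with nReadNorm defaulting to the largest read count. The helper normalize_kmer_counts and the count matrix layout (one row per k-mer, one column per file) are hypothetical, and the sketch does not round, whereas the package may (the description only says the normalized counts are "nearly equal").

```r
# Hypothetical helper, not part of seqTools.
# Assumption: counts of file j are scaled by nReadNorm / n_reads[j].
# counts : k-mer count matrix, one row per k-mer, one column per file
# n_reads: number of reads in each file (same order as the columns)
normalize_kmer_counts <- function(counts, n_reads, nReadNorm = max(n_reads)) {
  # Normalization may only scale counts up, never down
  if (any(nReadNorm < n_reads))
    stop("nReadNorm must not be smaller than any file's read count")
  # Multiply column j by nReadNorm / n_reads[j]
  sweep(counts, 2, nReadNorm / n_reads, `*`)
}

# Two files with k = 1 (four k-mers); file g5 has half as many reads as g4
counts <- cbind(g4 = c(10, 0, 5, 5), g5 = c(4, 2, 2, 0))
n_reads <- c(100, 50)
normalize_kmer_counts(counts, n_reads)
# g4 is unchanged; g5 is scaled by 100 / 50 to 8, 4, 4, 0
```

Scaling toward the largest file, rather than down to the smallest, matches the requirement above that nReadNorm must not be smaller than any file's read count.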
Details
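The distance between two normalized DNA k-mer count vectors
X = (x_1, ..., x_{4^k}) and Y = (y_1, ..., y_{4^k}) is calculated by
d(X,Y) = \frac{1}{4^k} \sum_{i=1}^{4^k} \mathrm{cbd}(x_i, y_i)
where cbd is given by
\mathrm{cbd}(x,y) = \frac{|x-y|}{x+y}.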
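The formula above can be checked numerically with a small base-R sketch for two already normalized count vectors. The helper names cbd and cb_distance are hypothetical. The source does not say how a k-mer absent from both files (x_i + y_i = 0) is handled; this sketch assumes such terms contribute 0.

```r
# Per-k-mer term of the modified Canberra distance: |x - y| / (x + y).
# Assumption: terms with x + y == 0 (k-mer absent in both files) count as 0.
cbd <- function(x, y) {
  s <- x + y
  ifelse(s > 0, abs(x - y) / s, 0)
}

# d(X, Y) = (1 / 4^k) * sum over all 4^k k-mers of cbd(x_i, y_i)
cb_distance <- function(x, y, k) {
  stopifnot(length(x) == 4^k, length(y) == 4^k)
  sum(cbd(x, y)) / 4^k
}

# k = 1: one count per nucleotide
x <- c(10, 0, 5, 5)
y <- c(8, 4, 4, 0)
cb_distance(x, y, k = 1)
# terms are 2/18, 4/4, 1/9, 5/5, so d = (20/9) / 4 = 5/9 (about 0.556)
```

Because every term lies between 0 and 1, dividing by 4^k keeps d(X,Y) in [0, 1] for any k.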
Value
Square numeric matrix with one row and one column per file; the number of
rows equals the number of files (= nFiles(object)).
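The shape of the returned value can be illustrated by combining the normalization and the distance in one base-R sketch. It is not the package's code: the function cb_dist_matrix and its count-matrix input are hypothetical, scaling by nReadNorm / nReads per file is an assumption, and 0/0 terms are again taken as 0. It produces one row and one column per file, with zeros on the diagonal and a symmetric layout.

```r
# Hypothetical sketch, not the seqTools implementation.
# counts : k-mer count matrix, 4^k rows, one column per file
# n_reads: number of reads per file
cb_dist_matrix <- function(counts, n_reads, k, nReadNorm = max(n_reads)) {
  stopifnot(nrow(counts) == 4^k, all(nReadNorm >= n_reads))
  # Scale each file's counts up to nReadNorm reads (assumed normalization)
  norm <- sweep(counts, 2, nReadNorm / n_reads, `*`)
  n <- ncol(norm)
  dm <- matrix(0, n, n, dimnames = list(colnames(norm), colnames(norm)))
  # Fill the upper triangle and mirror it; the diagonal stays 0
  for (i in seq_len(n - 1)) {
    for (j in (i + 1):n) {
      x <- norm[, i]
      y <- norm[, j]
      s <- x + y
      # Assumption: k-mers absent in both files contribute 0
      terms <- ifelse(s > 0, abs(x - y) / s, 0)
      dm[i, j] <- dm[j, i] <- sum(terms) / 4^k
    }
  }
  dm
}

counts <- cbind(g4 = c(10, 0, 5, 5), g5 = c(4, 2, 2, 0))
dm <- cb_dist_matrix(counts, n_reads = c(100, 50), k = 1)
dm
# 2 x 2 matrix; both off-diagonal entries are 5/9 (about 0.556)
```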
Note
The static size of the returned k-mer array is 4^k.
Author(s)
Wolfgang Kaisers
References
Cock PJA, Fields CJ, Goto N, Heuer ML, Rice PM.
The Sanger FASTQ file format for sequences with quality scores, and the
Solexa/Illumina FASTQ variants.
Nucleic Acids Research 2010; 38(6):1767-1771.
Aliases: cbDistMatrix cbDistMatrix-methods cbDistMatrix,Fastqq-method
Keywords: cbDistMatrix kmer
Examples
> library(seqTools)
Loading required package: zlibbioc
> basedir <- system.file("extdata", package = "seqTools")
> basenames <- c("g4_l101_n100.fq.gz", "g5_l101_n100.fq.gz")
> filenames <- file.path(basedir, basenames)
> fq <- fastqq(filenames, 6, c("g4", "g5"))  # count 6-mers, label the files g4 and g5
[fastqq] File ( 1/2) '/home/ddbj/local/lib64/R/library/seqTools/extdata/g4_l101_n100.fq.gz' done.
[fastqq] File ( 2/2) '/home/ddbj/local/lib64/R/library/seqTools/extdata/g5_l101_n100.fq.gz' done.
> dm <- cbDistMatrix(fq)  # 2 x 2 distance matrix (one row and column per file)