Filter basecalling results to keep only high-quality bases
Usage
## S4 method for signature 'RolexaRun'
FilterResults(run=Rolexa.env,results)
FilterResults(run,...)
Arguments
run
a RolexaRun object defining the run parameters
results
a results object from SeqScore
...
additional arguments, ignored
Details
FilterResults filters the sequences according to the entropy thresholds
set by IThresholds and applies the tag length cutoff MinimumTagLength.
The algorithm works as follows. For each tag n, the base entropies are
searched for a sub-vector k+1:l of length l=MinimumTagLength (in R,
k+1:l denotes positions k+1 to k+l, because ":" binds tighter than "+")
such that
sum(results[n,5+k+1:l])<=IThresholds[l].
If such a sub-vector exists, it is then extended in both directions until
the total entropy exceeds the threshold:
sum(results[n,5+k1:k2])>IThresholds[k2-k1+1].
The tag is then shortened to substr(results[n,5],k1,k2), except that
[ACGT] bases to the left of k1 and to the right of k2 are added back.
If the Barcode parameter has been set, the first Barcode bases of each
tag are always included in a separate column. If PET=TRUE, the whole
procedure is applied independently to each half of the sequence, two
separate sets of tags and scores are returned, and the barcode (if any)
is assumed to lie between the two paired tags.
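The window search and extension described above can be illustrated for a single tag with the following minimal sketch. It is not the Rolexa implementation: the function filter_tag_sketch and its arguments are hypothetical, and several details are assumptions made for illustration (the first acceptable window is used, extension proceeds one base at a time and stops just before the threshold would be exceeded, and only [ACGT] bases contiguous with the window are re-attached). Barcode and PET handling are left out.

```r
# Hypothetical helper, not part of Rolexa: filters a single tag.
# seq        - the called sequence, one character per base
# entropy    - per-base entropies, same length as seq
# thresholds - thresholds[len] is the entropy budget for a tag of length len
#              (plays the role of IThresholds)
# min_len    - minimum tag length (plays the role of MinimumTagLength)
filter_tag_sketch <- function(seq, entropy, thresholds, min_len) {
  n <- length(entropy)
  stopifnot(nchar(seq) == n, length(thresholds) >= n, min_len >= 1)
  if (n < min_len) return(NA_character_)

  # Step 1: find the first offset k whose window k + 1:min_len fits the budget.
  k1 <- NA
  for (k in 0:(n - min_len)) {
    if (sum(entropy[k + 1:min_len]) <= thresholds[min_len]) {
      k1 <- k + 1
      break
    }
  }
  if (is.na(k1)) return(NA_character_)  # no acceptable window: tag rejected
  k2 <- k1 + min_len - 1

  # Step 2 (assumed order): grow left, then right, one base at a time,
  # as long as the total entropy stays within the budget for the new length.
  fits <- function(a, b) sum(entropy[a:b]) <= thresholds[b - a + 1]
  repeat {
    grown <- FALSE
    if (k1 > 1 && fits(k1 - 1, k2)) { k1 <- k1 - 1; grown <- TRUE }
    if (k2 < n && fits(k1, k2 + 1)) { k2 <- k2 + 1; grown <- TRUE }
    if (!grown) break
  }

  # Step 3 (assumed reading): re-attach unambiguous bases flanking the window.
  bases <- strsplit(seq, "")[[1]]
  while (k1 > 1 && bases[k1 - 1] %in% c("A", "C", "G", "T")) k1 <- k1 - 1
  while (k2 < n && bases[k2 + 1] %in% c("A", "C", "G", "T")) k2 <- k2 + 1

  substr(seq, k1, k2)
}

# Toy input: ambiguous bases (N) carry high entropy, the rest carry none.
filter_tag_sketch("NACGTACGNN",
                  entropy    = c(2, 0, 0, 0, 0, 0, 0, 0, 2, 2),
                  thresholds = rep(0.5, 10),
                  min_len    = 4)
```

With this toy input the window starts at positions 2 to 5 and grows to the right up to position 8, so the call returns "ACGTACG". A tag with no window of length min_len within the budget yields NA.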
Value
FilterResults returns an object suitable for SaveResults.
Author(s)
Jacques Rougemont, Arnaud Amzallag, Christian Iseli, Laurent Farinelli, Ioannis Xenarios, Felix Naef
References
Probabilistic base calling of Solexa sequencing data, BMC Bioinformatics 2008, 9:431
See Also
readFastq to read fastq files,
SeqScore and FilterResults to
produce results for SaveResults