R: Compute statistics for a multiple sequence alignment
Alignment Statistics
R Documentation
Compute statistics for a multiple sequence alignment
Description
Functions to compute covariation, percent identity conservation, and percent
canonical basepairs given a multiple sequence alignment and optionally a
secondary structure. Statistics can be computed for a single base,
basepair, helix or entire alignment.
Arguments
A multiple sequence alignment. Can be either a Biostrings XStringSet object
or a named character vector, such as one obtained by converting an
XStringSet with as.character.
pos, pos.5p, pos.3p
Positions of the bases or basepairs for which statistics are to be
calculated.
Details
Conservation values have a range of [0, 1], where 0 is the absence of
primary sequence conservation (all bases different), and 1 is full
primary sequence conservation (all bases identical).
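The endpoints described above (0 when all bases differ, 1 when all are identical) are consistent with a pairwise-identity measure. The following is a minimal sketch under that assumption, not the package's implementation: column_identity is a hypothetical helper, and it treats a gap character like any other character, which may differ from what baseConservation does.

```r
# Hypothetical helper (not part of the package): fraction of sequence pairs
# that carry the same character in alignment column `pos`.
column_identity <- function(msa, pos) {
  bases <- toupper(substr(msa, pos, pos))  # one character per sequence
  n <- length(bases)
  if (n < 2) return(NA_real_)              # undefined for a single sequence
  pairs <- combn(n, 2)                     # all unordered pairs of sequences
  mean(bases[pairs[1, ]] == bases[pairs[2, ]])
}

# Toy alignment as a named character vector
msa <- c(seq1 = "GACU", seq2 = "GCCA", seq3 = "GGCG", seq4 = "GUUC")

column_identity(msa, 1)  # column G, G, G, G: all pairs identical
column_identity(msa, 2)  # column A, C, G, U: no pair identical
column_identity(msa, 3)  # column C, C, C, U: 3 of 6 pairs identical
```

Averaging over sequence pairs, rather than taking the frequency of the most common base, is what lets the value reach 0 when every base in the column is different.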
Canonical values have a range of [0, 1], where 0 is a complete lack of
basepair potential, and 1 indicates that all basepairs are valid.
Covariation values have a range of [-2, 2], where -2 is a complete lack of
basepair potential and sequence conservation, 0 is complete sequence
conservation regardless of basepairing potential, and 2 is a complete lack
of sequence conservation but maintaining full basepair potential.
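One scoring scheme that reproduces the three reference points above (-2, 0 and 2) is sketched here. It is an illustration under stated assumptions, not the package's formula: pair_stats and canonical_pairs are hypothetical names, G-U wobble pairs are assumed to count as valid, and each pair of sequences contributes the Hamming distance between its two basepairs, signed positive only when both basepairs are canonical.

```r
# Hypothetical helpers (not part of the package).
canonical_pairs <- c("AU", "UA", "GC", "CG", "GU", "UG")  # assumed valid set

pair_stats <- function(msa, pos.5p, pos.3p) {
  # Two-character basepair per sequence, upper-cased, with T read as U
  bp <- toupper(paste0(substr(msa, pos.5p, pos.5p),
                       substr(msa, pos.3p, pos.3p)))
  bp <- chartr("T", "U", bp)
  valid <- bp %in% canonical_pairs

  idx <- combn(length(bp), 2)          # all unordered pairs of sequences
  a <- idx[1, ]
  b <- idx[2, ]
  # Hamming distance (0, 1 or 2) between the basepairs of each sequence pair
  dist <- (substr(bp[a], 1, 1) != substr(bp[b], 1, 1)) +
          (substr(bp[a], 2, 2) != substr(bp[b], 2, 2))
  # Reward variation only when both basepairs can still form
  sign <- ifelse(valid[a] & valid[b], 1, -1)

  c(canonical = mean(valid), covariation = mean(sign * dist))
}

# Columns 1 and 5 form the basepair in each toy alignment
covarying <- c(s1 = "GAAAC", s2 = "CAAAG", s3 = "AAAAU", s4 = "UAAAA")
conserved <- c(s1 = "GAAAC", s2 = "GAAAC", s3 = "GAAAC")
unpaired  <- c(s1 = "AAAAA", s2 = "CAAAC", s3 = "GAAAG", s4 = "UAAAU")

pair_stats(covarying, 1, 5)  # canonical 1, covariation 2
pair_stats(conserved, 1, 5)  # canonical 1, covariation 0
pair_stats(unpaired, 1, 5)   # canonical 0, covariation -2
```

The three toy alignments correspond to the three cases named in the text: full variation with pairing retained, full conservation, and full variation with no pairing.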
Helix values are averages of the base/basepair values, and alignment
values are averages over helices or over all columns, depending on
whether the function requires the helix argument.
alignmentPercentGaps returns, for each sequence of the alignment, the
percentage of its positions that are gaps.
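The per-sequence gap calculation can be sketched in base R as follows. This is an illustration only: percent_gaps is a hypothetical helper, the gap characters ("-" and ".") are an assumption, and so is the 0 to 100 scaling, since the text does not say whether the value is a fraction or a percentage.

```r
# Hypothetical helper (not part of the package): share of gap characters
# in each aligned sequence, on a 0 to 100 scale.
percent_gaps <- function(msa, gap_chars = c("-", ".")) {
  vapply(strsplit(msa, ""),                       # split into single characters
         function(chars) 100 * mean(chars %in% gap_chars),
         numeric(1))
}

gapped <- c(seq1 = "GA-CU", seq2 = "G---A", seq3 = "GACUA")
percent_gaps(gapped)  # seq1: 20, seq2: 60, seq3: 0
```

The result is a named numeric vector with one entry per sequence, which makes it straightforward to filter out gap-heavy sequences before computing the other statistics.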
Value
baseConservation, basepairConservation,
basepairCovariation, basepairCanonical,
alignmentConservation, alignmentCovariation, and
alignmentCanonical
return a single decimal value.
helixConservation, helixCovariation, helixCanonical
return a list of values whose length equals the number of rows in helix.
alignmentPercentGaps returns a list of values whose length equals
the number of sequences in the multiple sequence alignment.