Last data update: 2014.03.03

Alignment Statistics

Compute statistics for a multiple sequence alignment


Functions to compute covariation, percent identity conservation, and percent canonical basepairs given a multiple sequence alignment and optionally a secondary structure. Statistics can be computed for a single base, basepair, helix or entire alignment.


    baseConservation(msa, pos)

    basepairConservation(msa, pos.5p, pos.3p)
    basepairCovariation(msa, pos.5p, pos.3p) 
    basepairCanonical(msa, pos.5p, pos.3p)

    helixConservation(helix, msa)
    helixCovariation(helix, msa)
    helixCanonical(helix, msa) 

    alignmentConservation(msa)
    alignmentCovariation(msa, helix)
    alignmentCanonical(msa, helix)

    alignmentPercentGaps(msa)



helix

A helix data.frame.


msa

A multiple sequence alignment. Can be either a Biostrings XStringSet object or a named character vector, such as the one obtained by converting an XStringSet with as.character.
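
As a minimal sketch of the two input forms, the following builds a small named character vector and, only if Biostrings happens to be installed, round-trips it through an RNAStringSet. The sequences and their names are made up for illustration.

```r
# Toy alignment as a named character vector (made-up sequences, equal length).
msa <- c(seq1 = "GGGAAACCC", seq2 = "GGCAAAGCC")

# Optional: the same alignment as a Biostrings object, converted back.
if (requireNamespace("Biostrings", quietly = TRUE)) {
  xset <- Biostrings::RNAStringSet(msa)
  msa.from.xset <- as.character(xset)  # named character vector again
}
```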

pos, pos.5p, pos.3p

Positions of the bases or basepairs for which statistics are to be calculated.


Conservation values have a range of [0, 1], where 0 is the absence of primary sequence conservation (all bases different), and 1 is full primary sequence conservation (all bases identical).
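
One way to obtain a score with these endpoints is pairwise identity within a column. The sketch below is an illustration under that assumption, not the package's implementation; `column_identity` and the toy alignment are hypothetical.

```r
# Hypothetical helper (not part of the package): fraction of sequence pairs
# that share the same character at alignment column `pos`.
column_identity <- function(msa, pos) {
  bases <- substr(msa, pos, pos)
  pairs <- combn(length(bases), 2)          # all unordered pairs of sequences
  mean(bases[pairs[1, ]] == bases[pairs[2, ]])
}

msa <- c(seq1 = "GCAU", seq2 = "GCUU", seq3 = "GGCU")
column_identity(msa, 1)  # 1: all three sequences have G
column_identity(msa, 3)  # 0: A, U and C are all different
```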

Canonical values have a range of [0, 1], where 0 is a complete lack of basepair potential, and 1 indicates that all basepairs are valid.
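
The idea can be sketched as the fraction of sequences whose two bases form a valid pair. The set of valid pairs used here (Watson-Crick plus G-U wobble) is an assumption, and `pair_canonical` is a hypothetical helper rather than the package's code.

```r
# Hypothetical helper: fraction of sequences with a valid basepair between
# columns pos.5p and pos.3p. The list of valid pairs is an assumption.
pair_canonical <- function(msa, pos.5p, pos.3p) {
  valid <- c("AU", "UA", "GC", "CG", "GU", "UG")
  pairs <- paste0(substr(msa, pos.5p, pos.5p), substr(msa, pos.3p, pos.3p))
  mean(toupper(pairs) %in% valid)
}

msa <- c("GAAAC", "GAAAU", "AAAAA")
pair_canonical(msa, 1, 5)  # G-C and G-U count as valid, A-A does not
```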

Covariation values have a range of [-2, 2], where -2 is a complete lack of basepair potential and sequence conservation, 0 is complete sequence conservation regardless of basepairing potential, and 2 is a complete lack of sequence conservation but maintaining full basepair potential.
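
A scoring rule consistent with the three reference points above is to take, for every pair of sequences, the Hamming distance between their basepairs (0, 1 or 2) and give it a positive sign when both basepairs are valid and a negative sign otherwise, then average. This is an assumed rule chosen to reproduce the stated range; the package's exact handling of mixed cases may differ, and `pair_covariation` is a hypothetical name.

```r
# Hypothetical helper illustrating an assumed covariation rule:
# mean over sequence pairs of (+/-) Hamming distance between basepairs.
pair_covariation <- function(msa, pos.5p, pos.3p) {
  valid <- c("AU", "UA", "GC", "CG", "GU", "UG")   # assumed valid pairs
  b5 <- toupper(substr(msa, pos.5p, pos.5p))
  b3 <- toupper(substr(msa, pos.3p, pos.3p))
  ok <- paste0(b5, b3) %in% valid
  idx <- combn(length(msa), 2)                     # all pairs of sequences
  i <- idx[1, ]
  j <- idx[2, ]
  hamming <- (b5[i] != b5[j]) + (b3[i] != b3[j])   # 0, 1 or 2 per pair
  sign <- ifelse(ok[i] & ok[j], 1, -1)             # assumption: both must pair
  mean(sign * hamming)
}

pair_covariation(c("GC", "AU", "UA"), 1, 2)  # 2: no conservation, all pairs valid
pair_covariation(c("GC", "GC", "GC"), 1, 2)  # 0: fully conserved
pair_covariation(c("AA", "CC", "GG"), 1, 2)  # -2: no conservation, no valid pairs
```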

Helix values are averages of the base or basepair values within each helix, and alignment values are averages over helices or over all columns, depending on whether the function takes a helix argument.
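
The averaging step for helices can be sketched as follows. The sketch assumes a helix data.frame with columns `i`, `j` and `length`, where a row covers the basepairs (i, j), (i + 1, j - 1), and so on; `helix_mean` and `pair_canonical` are hypothetical helpers, and the per-basepair statistic is passed in as a function.

```r
# Hypothetical per-basepair statistic (assumed set of valid pairs).
pair_canonical <- function(msa, pos.5p, pos.3p) {
  valid <- c("AU", "UA", "GC", "CG", "GU", "UG")
  mean(paste0(substr(msa, pos.5p, pos.5p),
              substr(msa, pos.3p, pos.3p)) %in% valid)
}

# Hypothetical helper: one value per helix row, the mean of pair_stat over
# the stacked basepairs that the row is assumed to describe.
helix_mean <- function(helix, msa, pair_stat) {
  vapply(seq_len(nrow(helix)), function(r) {
    k <- seq_len(helix$length[r]) - 1
    mean(mapply(function(p5, p3) pair_stat(msa, p5, p3),
                helix$i[r] + k, helix$j[r] - k))
  }, numeric(1))
}

msa <- c("GCAAGC", "GAAAAC")
helix <- data.frame(i = 1, j = 6, length = 2, value = NA)
helix_mean(helix, msa, pair_canonical)  # mean of 1 (cols 1,6) and 0.5 (cols 2,5)
```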

alignmentPercentGaps returns, for each sequence in the alignment, the proportion of positions that are gaps, expressed as a fraction between 0 and 1.
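
A per-sequence gap proportion can be sketched in base R as below. Which characters count as gaps is an assumption here ("-" and "."), and `gap_fraction` is a hypothetical helper, not the package function.

```r
# Hypothetical helper: proportion of gap characters in each aligned sequence.
# The default gap characters are an assumption.
gap_fraction <- function(msa, gap.chars = c("-", ".")) {
  vapply(strsplit(msa, ""), function(x) mean(x %in% gap.chars), numeric(1))
}

msa <- c(a = "AC-U", b = "A--.")
gap_fraction(msa)  # named vector: a = 0.25, b = 0.75
```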


baseConservation, basepairConservation, basepairCovariation, basepairCanonical, alignmentConservation, alignmentCovariation, and alignmentCanonical return a single decimal value.

helixConservation, helixCovariation, and helixCanonical return a vector of values whose length equals the number of rows in helix.

alignmentPercentGaps returns a named vector of values whose length equals the number of sequences in the multiple sequence alignment.


Jeff Proctor, Daniel Lai


    # Load the example data used below
    data(helix)

    baseConservation(fasta, 9)

    basepairConservation(fasta, 9, 18)
    basepairCovariation(fasta, 9, 18) 
    basepairCanonical(fasta, 9, 18)

    helixConservation(helix, fasta)
    helixCovariation(helix, fasta)
    helixCanonical(helix, fasta)

    alignmentConservation(fasta)
    alignmentCovariation(fasta, helix)
    alignmentCanonical(fasta, helix)

    alignmentPercentGaps(fasta)



>     data(helix)
>     baseConservation(fasta, 9)
>     basepairConservation(fasta, 9, 18)
>     basepairCovariation(fasta, 9, 18) 
>     basepairCanonical(fasta, 9, 18)
[1] 1
>     helixConservation(helix, fasta)
 [1] 0.2644841 0.2857143 0.3666667 0.4285714 0.6619048 0.4583333 0.2678571
 [8] 0.2644841 0.3809524 0.7857143 0.4761905 0.5238095 0.2857143 0.3452381
[15] 0.5918367 0.5416667 0.4970238 0.4095238 0.6507937 0.5158730 0.6269841
[22] 0.7380952 0.6944444 0.6598639 0.2952381 0.7321429 0.8690476 0.3690476
[29] 0.5238095 0.5238095 0.4764286 0.4080357 0.2869048 0.5095238 0.8928571
[36] 0.5190476 0.6250000 0.5476190 0.3125000 0.4482684 0.5555556 0.7500000
[43] 0.9642857 0.8666667 0.4136364 0.8857143 0.9047619 0.8392857 0.8650794
[50] 0.8285714 0.4190476 0.8333333 0.8630952 0.8928571 0.8452381 0.5306122
>     helixCovariation(helix, fasta)
 [1]  1.03174603  1.42857143  1.05714286  1.14285714  0.67619048  0.86904762
 [7]  0.94047619  1.03174603 -1.23809524  0.42857143 -1.04761905  0.95238095
[13]  1.42857143 -0.83333333  0.81632653  0.75000000  0.93452381  0.24761905
[19]  0.60317460  0.79894180  0.63492063 -0.04761905  0.51587302  0.50340136
[25]  0.51428571  0.53571429  0.02380952  0.67460317 -0.95238095  0.95238095
[31]  0.67142857  0.96428571 -0.69047619  0.35238095  0.07142857  0.44761905
[37]  0.34523810  0.40952381  0.30357143  0.51948052  0.36507937  0.19047619
[43] -0.07142857 -0.03809524  0.51515152  0.00000000  0.04761905 -0.17857143
[49] -0.07936508 -0.11428571  0.40000000  0.00000000  0.13095238  0.07142857
[55]  0.16666667  0.13605442
>     helixCanonical(helix, fasta)
 [1] 0.9285714 1.0000000 0.9714286 1.0000000 1.0000000 0.9642857 0.9285714
 [8] 0.9285714 0.1428571 1.0000000 0.7142857 1.0000000 1.0000000 0.4285714
[15] 1.0000000 0.9642857 0.9821429 0.7714286 0.9761905 0.9682540 0.9761905
[22] 0.8571429 0.8571429 0.9591837 0.8285714 1.0000000 0.9285714 0.8809524
[29] 0.5000000 1.0000000 0.9000000 0.9642857 0.5000000 0.8285714 0.9642857
[36] 0.8857143 0.8928571 0.8857143 0.7678571 0.8441558 0.8809524 0.9142857
[43] 0.9642857 0.9142857 0.8181818 0.9428571 0.9642857 0.8571429 0.9047619
[50] 0.8571429 0.8000000 0.8928571 0.9642857 0.9642857 0.9642857 0.7959184
>     alignmentConservation(fasta)
[1] 0.523439
>     alignmentCovariation(fasta, helix)
[1] 0.4796748
>     alignmentCanonical(fasta, helix)
[1] 0.902439
>     alignmentPercentGaps(fasta)
AF183905.1/5647-5848 AF218039.1/6028-6228 AB017037.1/6286-6484 
          0.03809524           0.04285714           0.05238095 
AB006531.1/6003-6204 AF014388.1/6078-6278 AF022937.1/6935-7121 
          0.03809524           0.04285714           0.10952381 