Last data update: 2014.03.03

Alignment Statistics    R Documentation

Compute statistics for a multiple sequence alignment

Description

Functions to compute covariation, percent-identity conservation, and the proportion of canonical basepairs for a multiple sequence alignment, optionally given a secondary structure. Statistics can be computed for a single base, a basepair, a helix, or the entire alignment.

Usage

    baseConservation(msa, pos)

    basepairConservation(msa, pos.5p, pos.3p)
    basepairCovariation(msa, pos.5p, pos.3p) 
    basepairCanonical(msa, pos.5p, pos.3p)

    helixConservation(helix, msa)
    helixCovariation(helix, msa)
    helixCanonical(helix, msa) 

    alignmentConservation(msa)
    alignmentCovariation(msa, helix)
    alignmentCanonical(msa, helix)
    
    alignmentPercentGaps(msa)

Arguments

helix

A helix data.frame

msa

A multiple sequence alignment: either a Biostrings XStringSet object or a named character vector of aligned sequences, such as one obtained by converting an XStringSet with as.character.

pos, pos.5p, pos.3p

Alignment column positions for which statistics are calculated: pos for a single base, and pos.5p and pos.3p for the 5' and 3' bases of a basepair.

Details

Conservation values have a range of [0, 1], where 0 is the absence of primary sequence conservation (all bases different), and 1 is full primary sequence conservation (all bases identical).
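A minimal base-R sketch of a percent-identity conservation score, assuming conservation is measured as the fraction of sequence pairs that share the same character in a column. This formulation is an illustrative assumption, not taken from the R4RNA source, and column_conservation is a hypothetical name. It accepts the named-character-vector form of msa.

```r
# Hypothetical helper, not part of R4RNA: the fraction of all pairs of
# sequences that carry the same character at alignment column `pos`.
column_conservation <- function(msa, pos) {
  bases <- substr(as.character(msa), pos, pos)  # one character per sequence
  n <- length(bases)
  if (n < 2) return(1)                          # a lone sequence is trivially conserved
  counts <- table(bases)                        # how often each character occurs
  identical_pairs <- sum(choose(counts, 2))     # pairs sharing a character
  identical_pairs / choose(n, 2)
}

msa <- c(s1 = "GCAU", s2 = "GCAC", s3 = "GGAU", s4 = "GUAA")
column_conservation(msa, 1)  # all G: 1
column_conservation(msa, 4)  # U, C, U, A: 1 of 6 pairs match, 0.1666667
```

A column in which every sequence differs scores 0 and a fully identical column scores 1, matching the endpoints described above.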

Canonical values have a range of [0, 1], where 0 is a complete lack of basepair potential (no sequence forms a valid basepair), and 1 indicates that all basepairs are valid.
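A base-R sketch of the canonical statistic for a single basepair, assuming "canonical" means Watson-Crick (A-U, G-C) or G-U wobble and that gaps never pair. pair_canonical is a hypothetical helper, not an R4RNA function, and reading T as U for DNA-encoded alignments is an extra assumption.

```r
# Hypothetical helper, not part of R4RNA: proportion of sequences whose
# bases at pos.5p and pos.3p form a canonical basepair.
pair_canonical <- function(msa, pos.5p, pos.3p) {
  seqs <- toupper(chartr("Tt", "Uu", as.character(msa)))  # read DNA T as RNA U
  pairs <- paste0(substr(seqs, pos.5p, pos.5p), substr(seqs, pos.3p, pos.3p))
  # Watson-Crick pairs plus the G-U wobble (an assumption of this sketch);
  # anything involving a gap or another symbol is non-canonical
  canonical <- c("AU", "UA", "GC", "CG", "GU", "UG")
  mean(pairs %in% canonical)
}

msa <- c(s1 = "GAAAC", s2 = "GAAAT", s3 = "GAAA-", s4 = "AAAAC")
pair_canonical(msa, 1, 5)  # GC, GU, G-, AC: 2 of 4 canonical, 0.5
```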

Covariation values have a range of [-2, 2], where -2 is a complete lack of basepair potential and sequence conservation, 0 is complete sequence conservation regardless of basepairing potential, and 2 is a complete lack of sequence conservation but maintaining full basepair potential.
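One scoring scheme consistent with the documented [-2, 2] range, sketched in base R: for every pair of sequences, count how many of the two basepair positions differ (0, 1 or 2); add that count when both sequences form a canonical pair and subtract it otherwise, then average over all sequence pairs. Whether R4RNA uses exactly this scheme, including its treatment of gaps and G-U pairs, is an assumption, and pair_covariation is a hypothetical name.

```r
# Hypothetical helper, not part of R4RNA: mean pairwise covariation of the
# basepair (pos.5p, pos.3p), in the range [-2, 2].
pair_covariation <- function(msa, pos.5p, pos.3p) {
  seqs <- toupper(as.character(msa))
  b5 <- substr(seqs, pos.5p, pos.5p)
  b3 <- substr(seqs, pos.3p, pos.3p)
  canonical <- paste0(b5, b3) %in% c("AU", "UA", "GC", "CG", "GU", "UG")
  n <- length(seqs)
  if (n < 2) return(0)                 # nothing to compare
  idx <- combn(n, 2)                   # every pair of sequences, one per column
  a <- idx[1, ]
  b <- idx[2, ]
  # number of basepair positions (0, 1 or 2) that differ between a and b
  d <- (b5[a] != b5[b]) + (b3[a] != b3[b])
  # differences count in favour only when both sequences can pair
  mean(ifelse(canonical[a] & canonical[b], d, -d))
}

pair_covariation(c("GAAAC", "CAAAG"), 1, 5)  # G-C vs C-G: 2
pair_covariation(c("GAAAC", "GAAAC"), 1, 5)  # identical: 0
pair_covariation(c("GAAAA", "CAAAC"), 1, 5)  # G-A vs C-C: -2
```

A compensatory change (G-C to C-G) scores 2, identical basepairs score 0, and two different non-canonical pairs score -2, the three reference points described above.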

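Helix values are averages of the base or basepair values within each helix, giving one value per row of helix. Alignment values are averages over the helices for the functions that take a helix argument (alignmentCovariation and alignmentCanonical), and over all columns for alignmentConservation, which does not.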
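The helix-level averaging can be sketched as follows, assuming each row of helix gives the outermost basepair in columns i and j and the number of stacked basepairs in a length column. Those column names, helix_average, and canonical_fraction are assumptions for illustration, not R4RNA's API; any per-basepair statistic with arguments (msa, pos.5p, pos.3p) can be passed as pair_stat.

```r
# Hypothetical sketch, not R4RNA's code. Assumes helix has columns i, j and
# length, where a row covers the basepairs (i, j), (i + 1, j - 1), ...,
# (i + length - 1, j - length + 1).
helix_average <- function(helix, msa, pair_stat) {
  vapply(seq_len(nrow(helix)), function(k) {
    offsets <- seq_len(helix$length[k]) - 1
    values <- mapply(pair_stat,
                     pos.5p = helix$i[k] + offsets,
                     pos.3p = helix$j[k] - offsets,
                     MoreArgs = list(msa = msa))
    mean(values)                       # average over this row's basepairs
  }, numeric(1))
}

# Example per-basepair statistic: fraction of sequences with a canonical pair
canonical_fraction <- function(msa, pos.5p, pos.3p) {
  pairs <- paste0(substr(msa, pos.5p, pos.5p), substr(msa, pos.3p, pos.3p))
  mean(pairs %in% c("AU", "UA", "GC", "CG", "GU", "UG"))
}

msa <- c(s1 = "GGAAACC", s2 = "GCAAAGU", s3 = "AGAAACA")
helix <- data.frame(i = c(1, 3), j = c(7, 5), length = c(2, 1))
helix_average(helix, msa, canonical_fraction)  # 0.8333333 0.0000000
```

The first row averages basepairs (1, 7) and (2, 6), which are canonical in 2 of 3 and 3 of 3 sequences; the second row covers only (3, 5), which is A-A in every sequence.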

alignmentPercentGaps returns, for each sequence in the alignment, the proportion of its positions that are gaps. Despite the name, the values are fractions in [0, 1] rather than percentages (see the example output).
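A base-R sketch of the per-sequence gap proportion. Treating "-" and "." as gap characters is an assumption, and percent_gaps is a hypothetical name; like alignmentPercentGaps in the example output, it returns fractions in [0, 1] named by sequence.

```r
# Hypothetical helper, not part of R4RNA: per-sequence fraction of
# alignment positions that are gaps. The gap characters are an assumption.
percent_gaps <- function(msa, gap_chars = c("-", ".")) {
  seqs <- as.character(msa)            # as.character() drops the names
  fractions <- vapply(strsplit(seqs, ""), function(chars) {
    mean(chars %in% gap_chars)         # share of positions that are gaps
  }, numeric(1))
  names(fractions) <- names(msa)       # restore sequence names
  fractions
}

msa <- c(seqA = "GC-AU", seqB = "G--AU", seqC = "GCAAU")
percent_gaps(msa)  # seqA 0.2, seqB 0.4, seqC 0.0
```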

Value

baseConservation, basepairConservation, basepairCovariation, basepairCanonical, alignmentConservation, alignmentCovariation, and alignmentCanonical each return a single numeric value.

helixConservation, helixCovariation, and helixCanonical return a numeric vector with one value per row of helix.

alignmentPercentGaps returns a numeric vector, named by sequence, with one value per sequence in the multiple sequence alignment.

Author(s)

Jeff Proctor, Daniel Lai

Examples

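    library(R4RNA)
    data(helix)  # also provides the example alignment 'fasta' used below

    # Single base and single basepair (columns 9 and 18)
    baseConservation(fasta, 9)

    basepairConservation(fasta, 9, 18)
    basepairCovariation(fasta, 9, 18)
    basepairCanonical(fasta, 9, 18)

    # One value per row of helix
    helixConservation(helix, fasta)
    helixCovariation(helix, fasta)
    helixCanonical(helix, fasta)

    # Whole-alignment summaries
    alignmentConservation(fasta)
    alignmentCovariation(fasta, helix)
    alignmentCanonical(fasta, helix)

    # Fraction of gap positions in each sequence
    alignmentPercentGaps(fasta)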

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> library(R4RNA)
Loading required package: Biostrings
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: XVector
> 
> ### ** Examples
> 
>     data(helix)
>     
>     baseConservation(fasta, 9)
G 
1 
> 
>     basepairConservation(fasta, 9, 18)
        G 
0.7619048 
>     basepairCovariation(fasta, 9, 18) 
       GU 
0.4761905 
>     basepairCanonical(fasta, 9, 18)
[1] 1
> 
>     helixConservation(helix, fasta)
 [1] 0.2644841 0.2857143 0.3666667 0.4285714 0.6619048 0.4583333 0.2678571
 [8] 0.2644841 0.3809524 0.7857143 0.4761905 0.5238095 0.2857143 0.3452381
[15] 0.5918367 0.5416667 0.4970238 0.4095238 0.6507937 0.5158730 0.6269841
[22] 0.7380952 0.6944444 0.6598639 0.2952381 0.7321429 0.8690476 0.3690476
[29] 0.5238095 0.5238095 0.4764286 0.4080357 0.2869048 0.5095238 0.8928571
[36] 0.5190476 0.6250000 0.5476190 0.3125000 0.4482684 0.5555556 0.7500000
[43] 0.9642857 0.8666667 0.4136364 0.8857143 0.9047619 0.8392857 0.8650794
[50] 0.8285714 0.4190476 0.8333333 0.8630952 0.8928571 0.8452381 0.5306122
>     helixCovariation(helix, fasta)
 [1]  1.03174603  1.42857143  1.05714286  1.14285714  0.67619048  0.86904762
 [7]  0.94047619  1.03174603 -1.23809524  0.42857143 -1.04761905  0.95238095
[13]  1.42857143 -0.83333333  0.81632653  0.75000000  0.93452381  0.24761905
[19]  0.60317460  0.79894180  0.63492063 -0.04761905  0.51587302  0.50340136
[25]  0.51428571  0.53571429  0.02380952  0.67460317 -0.95238095  0.95238095
[31]  0.67142857  0.96428571 -0.69047619  0.35238095  0.07142857  0.44761905
[37]  0.34523810  0.40952381  0.30357143  0.51948052  0.36507937  0.19047619
[43] -0.07142857 -0.03809524  0.51515152  0.00000000  0.04761905 -0.17857143
[49] -0.07936508 -0.11428571  0.40000000  0.00000000  0.13095238  0.07142857
[55]  0.16666667  0.13605442
>     helixCanonical(helix, fasta)
 [1] 0.9285714 1.0000000 0.9714286 1.0000000 1.0000000 0.9642857 0.9285714
 [8] 0.9285714 0.1428571 1.0000000 0.7142857 1.0000000 1.0000000 0.4285714
[15] 1.0000000 0.9642857 0.9821429 0.7714286 0.9761905 0.9682540 0.9761905
[22] 0.8571429 0.8571429 0.9591837 0.8285714 1.0000000 0.9285714 0.8809524
[29] 0.5000000 1.0000000 0.9000000 0.9642857 0.5000000 0.8285714 0.9642857
[36] 0.8857143 0.8928571 0.8857143 0.7678571 0.8441558 0.8809524 0.9142857
[43] 0.9642857 0.9142857 0.8181818 0.9428571 0.9642857 0.8571429 0.9047619
[50] 0.8571429 0.8000000 0.8928571 0.9642857 0.9642857 0.9642857 0.7959184
> 
>     alignmentConservation(fasta)
[1] 0.523439
>     alignmentCovariation(fasta, helix)
[1] 0.4796748
>     alignmentCanonical(fasta, helix)
[1] 0.902439
> 
>     alignmentPercentGaps(fasta)
AF183905.1/5647-5848 AF218039.1/6028-6228 AB017037.1/6286-6484 
          0.03809524           0.04285714           0.05238095 
AB006531.1/6003-6204 AF014388.1/6078-6278 AF022937.1/6935-7121 
          0.03809524           0.04285714           0.10952381 
AF178440.1/5925-6123 
          0.05238095 