R: The Standard Genetic Code and its known variants
GENETIC_CODE
R Documentation
The Standard Genetic Code and its known variants
Description
Two predefined objects (GENETIC_CODE and RNA_GENETIC_CODE)
that represent The Standard Genetic Code.
Other genetic codes are stored in the predefined table GENETIC_CODE_TABLE,
from which they can be extracted with getGeneticCode.
Usage
## The Standard Genetic Code:
GENETIC_CODE
RNA_GENETIC_CODE
## All the known genetic codes:
GENETIC_CODE_TABLE
getGeneticCode(id_or_name2, full.search=FALSE)
Arguments
id_or_name2
A single string that uniquely identifies the genetic code to extract.
Should be one of the values in the id or name2 columns
of GENETIC_CODE_TABLE.
full.search
By default, only the id and name2 columns of
GENETIC_CODE_TABLE are searched for an exact match
with id_or_name2.
If full.search is TRUE, then the search is extended to
the name column of GENETIC_CODE_TABLE, and
id_or_name2 only needs to be a substring of one of the names
in that column (case is also ignored).
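The two search modes described above can be sketched in base R. This is only an illustration of the documented behaviour, not the Biostrings implementation: toy_table is a stand-in holding the first three rows of GENETIC_CODE_TABLE as printed on this page, find_code_row is a hypothetical helper, and combining the exact and substring hits with union is an assumption about how the search is "extended".

```r
# Toy stand-in for the first three rows of GENETIC_CODE_TABLE (values copied
# from the printed output on this page); not the real Biostrings object.
toy_table <- data.frame(
  name  = c("Standard", "Vertebrate Mitochondrial", "Yeast Mitochondrial"),
  name2 = c("SGC0", "SGC1", "SGC2"),
  id    = c("1", "2", "3"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: returns the row index of the genetic code identified
# by 'id_or_name2', or stops if the match is missing or ambiguous.
find_code_row <- function(id_or_name2, full.search = FALSE, table = toy_table) {
  # Default mode: exact match against the id and name2 columns only.
  hit <- which(table$id == id_or_name2 | table$name2 == id_or_name2)
  if (full.search) {
    # Extended mode: case-insensitive substring match against the name column.
    sub <- which(grepl(tolower(id_or_name2), tolower(table$name), fixed = TRUE))
    hit <- union(hit, sub)
  }
  if (length(hit) != 1L)
    stop("'id_or_name2' must identify exactly one genetic code")
  hit
}

find_code_row("SGC1")                       # exact match on name2
find_code_row("yeast", full.search = TRUE)  # substring of "Yeast Mitochondrial"
```

In this sketch a string such as "mitochondrial" is rejected even with full.search = TRUE, because it matches two names and so does not uniquely identify a code.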
Details
Formally, a genetic code is a mapping between tri-nucleotide sequences
called codons, and amino acids.
The Standard Genetic Code (aka The Canonical Genetic Code, or simply The
Genetic Code) is the particular mapping that encodes the vast majority of
genes in nature.
GENETIC_CODE and RNA_GENETIC_CODE are predefined named
character vectors that represent this mapping.
All the known genetic codes are summarized in GENETIC_CODE_TABLE,
which is a predefined data frame with 1 row per known genetic code.
Use getGeneticCode to extract one genetic code at a time from
this object.
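To make the idea of a codon-to-amino-acid mapping concrete, here is a minimal base R sketch that reads a DNA string three letters at a time and looks each codon up in a named character vector. code_subset contains only a handful of entries copied from GENETIC_CODE as printed on this page, and translate_sketch is a hypothetical helper for illustration; for real work the translate function in Biostrings is the tool to use.

```r
# A few entries copied from GENETIC_CODE as printed on this page; the real
# vector has all 64 codons.
code_subset <- c(ATG = "M", GCC = "A", AAA = "K", TGG = "W", TAA = "*")

# Hypothetical helper (not part of Biostrings): translate a DNA string that
# is read in frame starting at its first base.
translate_sketch <- function(dna, code) {
  n <- nchar(dna)
  stopifnot(n > 0L, n %% 3L == 0L)
  starts <- seq(1L, n, by = 3L)
  codons <- substring(dna, starts, starts + 2L)  # non-overlapping triplets
  stopifnot(all(codons %in% names(code)))        # every codon must be known
  paste(code[codons], collapse = "")             # look up by name, then join
}

translate_sketch("ATGGCCAAATGGTAA", code_subset)
```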
Value
GENETIC_CODE and RNA_GENETIC_CODE are both named character
vectors of length 64 (the number of all possible tri-nucleotide sequences)
where each element is a single letter representing either an amino acid
or "*", which stands for a stop codon (aka termination codon).
The names of the GENETIC_CODE vector are the DNA codons i.e. the
tri-nucleotide sequences (directed 5' to 3') that are assumed to belong
to the "coding DNA strand" (aka "sense DNA strand" or "non-template DNA
strand") of the gene.
The names of the RNA_GENETIC_CODE are the RNA codons i.e. the
tri-nucleotide sequences (directed 5' to 3') that are assumed to belong
to the mRNA of the gene.
Note that the values in the GENETIC_CODE and RNA_GENETIC_CODE
vectors are the same, only their names are different. The names of the
latter are those of the former where all occurrences of T (thymine) have
been replaced by U (uracil).
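The relationship between the two sets of names can be reproduced on a small scale with base R's chartr. The vector dna_subset below is a hand-picked subset of GENETIC_CODE used purely for illustration; it is an assumption of this sketch, not an object provided by the package.

```r
# Four entries copied from GENETIC_CODE as printed on this page.
dna_subset <- c(TTT = "F", ATG = "M", TGA = "*", GGG = "G")

# Derive the RNA version: same values, names with every T replaced by U.
rna_subset <- dna_subset
names(rna_subset) <- chartr("T", "U", names(dna_subset))

names(rna_subset)
all(dna_subset == rna_subset)  # == compares values only, names are ignored
```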
GENETIC_CODE_TABLE is a data frame with 1 row per known genetic code
and the 4 following columns:
name: The long and very descriptive name of the genetic code.
name2: The short name of the genetic code (not all genetic
codes have one).
id: The id of the genetic code.
AAs: The genetic code itself represented in a compact form
(i.e. 64 amino acid letters, 1 letter per codon, with the codons
assumed to be in the same order as in GENETIC_CODE).
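The compact AAs form can be expanded back into a codon-named vector with a few lines of base R. The sketch below is an illustration rather than the package's own code: the two AAs strings are copied from the GENETIC_CODE_TABLE output shown on this page, the codon order (T, C, A, G at each position, third position varying fastest) is read off the printed GENETIC_CODE, and the variable names are hypothetical.

```r
# AAs strings for "Standard" (SGC0) and "Vertebrate Mitochondrial" (SGC1),
# copied from the GENETIC_CODE_TABLE output shown on this page.
aas_sgc0 <- "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
aas_sgc1 <- "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"

# Rebuild the codon order used by GENETIC_CODE. expand.grid varies its first
# argument fastest, so the third codon position is listed first.
bases  <- c("T", "C", "A", "G")
grid   <- expand.grid(third = bases, second = bases, first = bases,
                      stringsAsFactors = FALSE)
codons <- paste0(grid$first, grid$second, grid$third)

# Expand each compact string into a named character vector of length 64.
expand_aas <- function(aas) setNames(strsplit(aas, "", fixed = TRUE)[[1]], codons)
sgc0 <- expand_aas(aas_sgc0)
sgc1 <- expand_aas(aas_sgc1)

sgc0[["ATG"]]               # "M"
names(sgc0)[sgc0 != sgc1]   # codons that the two codes translate differently
```

The last line picks out TGA, ATA, AGA and AGG, the same four codons reported by the comparison at the end of the Examples section.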
getGeneticCode returns a named character vector of length 64
similar to GENETIC_CODE i.e. it contains 1-letter strings
in the Amino Acid alphabet (see ?AA_ALPHABET) and its names are
identical to names(GENETIC_CODE).
References
The "official names" of the various codes ("Standard", "SGC0",
"Vertebrate Mitochondrial", "SGC1", etc.) and their ids (1, 2, etc.)
were taken from the print-form ASN.1 version of the NCBI genetic codes
document (version 3.9 at the time of this writing).
See Also
The translate and trinucleotideFrequency functions.
DNAString, RNAString, and AAString objects.
Examples
## The Standard Genetic Code:
GENETIC_CODE
GENETIC_CODE[["ATG"]] # codon ATG is translated into M (Methionine)
sort(table(GENETIC_CODE)) # the same amino acid can be encoded by 1
# to 6 different codons
RNA_GENETIC_CODE
all(GENETIC_CODE == RNA_GENETIC_CODE) # TRUE
## All the known genetic codes:
GENETIC_CODE_TABLE[1:3 , ]
getGeneticCode("SGC0") # The Standard Genetic Code, again
stopifnot(identical(getGeneticCode("SGC0"), GENETIC_CODE))
getGeneticCode("SGC1") # Vertebrate Mitochondrial
getGeneticCode("ascidian", full.search=TRUE) # Ascidian Mitochondrial
## Differences between a non-standard code and the Standard Code:
idx <- which(getGeneticCode("SGC1") != GENETIC_CODE)
rbind(SGC1=getGeneticCode("SGC1")[idx], Standard=GENETIC_CODE[idx])
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Platform: x86_64-pc-linux-gnu (64-bit)
> library(Biostrings)
> ### Name: GENETIC_CODE
> ### Title: The Standard Genetic Code and its known variants
> ### Aliases: GENETIC_CODE RNA_GENETIC_CODE GENETIC_CODE_TABLE
> ### getGeneticCode
> ### Keywords: utilities data
>
> ### ** Examples
>
> ## The Standard Genetic Code:
>
> GENETIC_CODE
TTT TTC TTA TTG TCT TCC TCA TCG TAT TAC TAA TAG TGT TGC TGA TGG CTT CTC CTA CTG
"F" "F" "L" "L" "S" "S" "S" "S" "Y" "Y" "*" "*" "C" "C" "*" "W" "L" "L" "L" "L"
CCT CCC CCA CCG CAT CAC CAA CAG CGT CGC CGA CGG ATT ATC ATA ATG ACT ACC ACA ACG
"P" "P" "P" "P" "H" "H" "Q" "Q" "R" "R" "R" "R" "I" "I" "I" "M" "T" "T" "T" "T"
AAT AAC AAA AAG AGT AGC AGA AGG GTT GTC GTA GTG GCT GCC GCA GCG GAT GAC GAA GAG
"N" "N" "K" "K" "S" "S" "R" "R" "V" "V" "V" "V" "A" "A" "A" "A" "D" "D" "E" "E"
GGT GGC GGA GGG
"G" "G" "G" "G"
>
> GENETIC_CODE[["ATG"]] # codon ATG is translated into M (Methionine)
[1] "M"
>
> sort(table(GENETIC_CODE)) # the same amino acid can be encoded by 1
GENETIC_CODE
M W C D E F H K N Q Y * I A G P T V L R S
1 1 2 2 2 2 2 2 2 2 2 3 3 4 4 4 4 4 6 6 6
> # to 6 different codons
>
> RNA_GENETIC_CODE
UUU UUC UUA UUG UCU UCC UCA UCG UAU UAC UAA UAG UGU UGC UGA UGG CUU CUC CUA CUG
"F" "F" "L" "L" "S" "S" "S" "S" "Y" "Y" "*" "*" "C" "C" "*" "W" "L" "L" "L" "L"
CCU CCC CCA CCG CAU CAC CAA CAG CGU CGC CGA CGG AUU AUC AUA AUG ACU ACC ACA ACG
"P" "P" "P" "P" "H" "H" "Q" "Q" "R" "R" "R" "R" "I" "I" "I" "M" "T" "T" "T" "T"
AAU AAC AAA AAG AGU AGC AGA AGG GUU GUC GUA GUG GCU GCC GCA GCG GAU GAC GAA GAG
"N" "N" "K" "K" "S" "S" "R" "R" "V" "V" "V" "V" "A" "A" "A" "A" "D" "D" "E" "E"
GGU GGC GGA GGG
"G" "G" "G" "G"
> all(GENETIC_CODE == RNA_GENETIC_CODE) # TRUE
[1] TRUE
>
> ## All the known genetic codes:
>
> GENETIC_CODE_TABLE[1:3 , ]
                      name name2 id
1                 Standard  SGC0  1
2 Vertebrate Mitochondrial  SGC1  2
3      Yeast Mitochondrial  SGC2  3
                                                               AAs
1 FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG
2 FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG
3 FFLLSSSSYY**CCWWTTTTPPPPHHQQRRRRIIMMTTTTNNKKSSRRVVVVAAAADDEEGGGG
>
> getGeneticCode("SGC0") # The Standard Genetic Code, again
TTT TTC TTA TTG TCT TCC TCA TCG TAT TAC TAA TAG TGT TGC TGA TGG CTT CTC CTA CTG
"F" "F" "L" "L" "S" "S" "S" "S" "Y" "Y" "*" "*" "C" "C" "*" "W" "L" "L" "L" "L"
CCT CCC CCA CCG CAT CAC CAA CAG CGT CGC CGA CGG ATT ATC ATA ATG ACT ACC ACA ACG
"P" "P" "P" "P" "H" "H" "Q" "Q" "R" "R" "R" "R" "I" "I" "I" "M" "T" "T" "T" "T"
AAT AAC AAA AAG AGT AGC AGA AGG GTT GTC GTA GTG GCT GCC GCA GCG GAT GAC GAA GAG
"N" "N" "K" "K" "S" "S" "R" "R" "V" "V" "V" "V" "A" "A" "A" "A" "D" "D" "E" "E"
GGT GGC GGA GGG
"G" "G" "G" "G"
> stopifnot(identical(getGeneticCode("SGC0"), GENETIC_CODE))
>
> getGeneticCode("SGC1") # Vertebrate Mitochondrial
TTT TTC TTA TTG TCT TCC TCA TCG TAT TAC TAA TAG TGT TGC TGA TGG CTT CTC CTA CTG
"F" "F" "L" "L" "S" "S" "S" "S" "Y" "Y" "*" "*" "C" "C" "W" "W" "L" "L" "L" "L"
CCT CCC CCA CCG CAT CAC CAA CAG CGT CGC CGA CGG ATT ATC ATA ATG ACT ACC ACA ACG
"P" "P" "P" "P" "H" "H" "Q" "Q" "R" "R" "R" "R" "I" "I" "M" "M" "T" "T" "T" "T"
AAT AAC AAA AAG AGT AGC AGA AGG GTT GTC GTA GTG GCT GCC GCA GCG GAT GAC GAA GAG
"N" "N" "K" "K" "S" "S" "*" "*" "V" "V" "V" "V" "A" "A" "A" "A" "D" "D" "E" "E"
GGT GGC GGA GGG
"G" "G" "G" "G"
>
> getGeneticCode("ascidian", full.search=TRUE) # Ascidian Mitochondrial
TTT TTC TTA TTG TCT TCC TCA TCG TAT TAC TAA TAG TGT TGC TGA TGG CTT CTC CTA CTG
"F" "F" "L" "L" "S" "S" "S" "S" "Y" "Y" "*" "*" "C" "C" "W" "W" "L" "L" "L" "L"
CCT CCC CCA CCG CAT CAC CAA CAG CGT CGC CGA CGG ATT ATC ATA ATG ACT ACC ACA ACG
"P" "P" "P" "P" "H" "H" "Q" "Q" "R" "R" "R" "R" "I" "I" "M" "M" "T" "T" "T" "T"
AAT AAC AAA AAG AGT AGC AGA AGG GTT GTC GTA GTG GCT GCC GCA GCG GAT GAC GAA GAG
"N" "N" "K" "K" "S" "S" "G" "G" "V" "V" "V" "V" "A" "A" "A" "A" "D" "D" "E" "E"
GGT GGC GGA GGG
"G" "G" "G" "G"
>
> ## Differences between a non-standard code and the Standard Code:
> idx <- which(getGeneticCode("SGC1") != GENETIC_CODE)
> rbind(SGC1=getGeneticCode("SGC1")[idx], Standard=GENETIC_CODE[idx])
         TGA ATA AGA AGG
SGC1     "W" "M" "*" "*"
Standard "*" "I" "R" "R"
>