Last data update: 2014.03.03

clonality    R Documentation

Clonality

Description

Creates a data frame giving, for each sample in a list of data frames, the total number of sequences, number of unique productive sequences, number of genomes, entropy, clonality, Gini coefficient, and the frequency (%) of the top productive sequence.

Usage

clonality(file.list)

Arguments

file.list

A list of data frames containing antigen receptor sequencing data imported by the LymphoSeq function readImmunoSeq. "aminoAcid", "count", "frequencyCount", and "estimatedNumberGenomes" are required columns. Note that the function is not intended to be run on a productive sequence list generated by the function productiveSeq.

Details

Clonality is derived from the Shannon entropy of the productive sequence frequencies. The entropy is divided by the logarithm of the total number of unique productive sequences to give a normalized entropy, which is then inverted (1 - normalized entropy) to produce the clonality metric.
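
The calculation described above can be sketched in a few lines of R. This is a minimal illustration, not the LymphoSeq source: clonality_sketch is a hypothetical helper, the input is assumed to be a vector of positive counts for at least two unique productive sequences, and base-2 logarithms are an assumption (the ratio itself does not depend on the base as long as both logarithms use the same one).

```r
# Hypothetical helper, not part of LymphoSeq: entropy and clonality
# from the counts of unique productive sequences in one sample.
clonality_sketch <- function(counts) {
  stopifnot(length(counts) >= 2, all(counts > 0))
  p <- counts / sum(counts)                 # frequencies that sum to 1
  entropy <- -sum(p * log2(p))              # Shannon entropy (base 2 assumed)
  normalized <- entropy / log2(length(p))   # divide by log of the unique count
  c(entropy = entropy, clonality = 1 - normalized)
}

clonality_sketch(c(25, 25, 25, 25))   # equal frequencies: clonality is 0
clonality_sketch(c(97, 1, 1, 1))      # one dominant sequence: clonality near 1
```

With a single unique sequence the denominator log2(1) is 0, so the sketch rejects that case rather than guessing how it should be reported.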

The Gini coefficient is an alternative metric used to calculate repertoire diversity and is derived from the Lorenz curve. The Lorenz curve is drawn such that the x-axis represents the cumulative percentage of unique sequences and the y-axis represents the cumulative percentage of reads. A line passing through the origin with a slope of 1 reflects equal frequencies of all clones. The Gini coefficient is the ratio of the area between the line of equality and the observed Lorenz curve to the total area under the line of equality. Both the Gini coefficient and clonality are reported on a scale from 0 to 1, where 0 indicates all sequences have the same frequency and 1 indicates the repertoire is dominated by a single sequence.
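
The area ratio can be illustrated with a short R sketch that builds the Lorenz curve and integrates it with the trapezoid rule. This is an illustration under stated assumptions, not necessarily how LymphoSeq computes the value: gini_sketch is a hypothetical helper and takes a plain numeric vector of sequence frequencies or counts.

```r
# Hypothetical helper, not part of LymphoSeq: Gini coefficient as the
# area between the line of equality and the Lorenz curve, divided by
# the area under the line of equality (0.5).
gini_sketch <- function(freq) {
  x <- sort(freq)                              # ascending order
  n <- length(x)
  lorenz_x <- seq(0, 1, length.out = n + 1)    # cumulative share of unique sequences
  lorenz_y <- c(0, cumsum(x) / sum(x))         # cumulative share of reads
  # Trapezoid rule for the area under the Lorenz curve
  area_lorenz <- sum(diff(lorenz_x) * (head(lorenz_y, -1) + tail(lorenz_y, -1)) / 2)
  (0.5 - area_lorenz) / 0.5
}

gini_sketch(c(25, 25, 25, 25))   # equal frequencies: 0
gini_sketch(c(0, 0, 0, 1))       # all reads in one of four sequences: 0.75
```

With this discrete construction the value for n sequences cannot exceed 1 - 1/n, so it approaches 1 only as the number of unique sequences grows.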

Value

Returns a data frame giving the total number of sequences, number of unique productive sequences, number of genomes, entropy, clonality, Gini coefficient, and the frequency (%) of the top productive sequence in each sample.

See Also

lorenzCurve

Examples

file.path <- system.file("extdata", "TCRB_sequencing", package = "LymphoSeq")

file.list <- readImmunoSeq(path = file.path)

clonality(file.list = file.list)

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> library(LymphoSeq)
Loading required package: LymphoSeqDB
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/LymphoSeq/clonality.Rd_%03d_medium.png", width=480, height=480)
> ### Name: clonality
> ### Title: Clonality
> ### Aliases: clonality
> 
> ### ** Examples
> 
> file.path <- system.file("extdata", "TCRB_sequencing", package = "LymphoSeq")
> 
> file.list <- readImmunoSeq(path = file.path)
  |======================================================================| 100%
> 
> clonality(file.list = file.list)
                 samples totalSequences uniqueProductiveSequences totalGenomes
1        TCRB_Day949_CD4            999                       845        25767
2        TCRB_Day949_CD8            999                       796        26236
3     TCRB_Day0_Unsorted            999                       837        18215
4     TCRB_Day83_CD8_CMV            201                       122          254
5    TCRB_Day32_Unsorted            920                       767           NA
6    TCRB_Day369_CD8_CMV            414                       281         1794
7    TCRB_Day83_Unsorted            999                       830           NA
8   TCRB_Day1320_CD8_CMV             40                        25           53
9   TCRB_Day369_Unsorted            999                       828           NA
10  TCRB_Day949_Unsorted            999                       831         6547
11 TCRB_Day1320_Unsorted            999                       833       180079
   totalCount  entropy  clonality giniCoefficient topProductiveSequence
1     1795561 5.597259 0.42431656       0.8486265             29.232143
2     2161314 5.535828 0.42554289       0.9018374             19.131268
3     1158510 7.144318 0.26416150       0.7956718              5.543769
4        4553 5.891883 0.14989093       0.6133629              8.917656
5       31078 8.296630 0.13424209       0.6007820              4.865016
6       52480 5.011083 0.38396606       0.8677666             17.718550
7      427427 7.285945 0.24863671       0.7100983             13.821673
8          53 4.486348 0.03391748       0.2313514             10.810811
9      725668 6.037979 0.37710971       0.8043414             17.568430
10    1486480 6.774799 0.30147383       0.7739322             13.510800
11     180079 5.708193 0.41165830       0.8845652             14.422142
> 
> dev.off()
null device 
          1 
>
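
As a consistency check, the clonality column in the output can be recomputed from the entropy and uniqueProductiveSequences columns. The sketch below copies the printed values for samples 1 and 8 and assumes base-2 logarithms; the variable names are illustrative only.

```r
# Values copied from the printed table (samples 1 and 8)
entropy           <- c(5.597259, 4.486348)
unique_productive <- c(845, 25)
reported          <- c(0.42431656, 0.03391748)

# Clonality = 1 - entropy / log2(number of unique productive sequences)
recomputed <- 1 - entropy / log2(unique_productive)

# Largest absolute difference from the reported clonality values
max(abs(recomputed - reported))
```

The recomputed values agree with the reported ones to about the precision of the printed entropy, which suggests the entropy column is expressed in bits.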