Last data update: 2014.03.03

ibs.stats    R Documentation

function to calculate the identity-by-state stats of a group of samples


Given a snp.matrix-class or an X.snp.matrix-class object with $N$ samples, calculates identity-by-state statistics describing the relatedness of every pair of samples in it.





a snp.matrix-class or an X.snp.matrix-class object containing $N$ samples


No-calls are excluded from consideration here.


A data.frame containing $N (N-1)/2$ rows, where the row names are the sample name pairs separated by a comma, and the columns are:


Count: count of identical calls, excluding no-calls


Fraction: fraction of identical calls among the SNPs actually called in both samples
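
The two columns can be illustrated with a minimal base-R sketch on a plain matrix. This is not the package's implementation: the helper name ibs_stats_sketch, the toy data, and the genotype encoding (0 for a no-call, 1/2/3 for the three genotypes) are all assumptions made for illustration.

```r
# Toy genotype matrix: rows are samples, columns are SNPs.
# Assumed encoding for this sketch only: 0 = no-call, 1/2/3 = genotypes.
geno <- matrix(c(1, 2, 3, 0,
                 1, 2, 1, 3,
                 1, 0, 3, 3),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("s1", "s2", "s3"), NULL))

# Hypothetical helper, not part of any package.
ibs_stats_sketch <- function(g) {
  pairs <- combn(nrow(g), 2)                # all N (N-1)/2 sample pairs
  stats <- apply(pairs, 2, function(p) {
    a <- g[p[1], ]
    b <- g[p[2], ]
    called <- a != 0 & b != 0               # SNPs called in both samples
    count <- sum(a[called] == b[called])    # identical calls, no-calls excluded
    c(Count = count, Fraction = count / sum(called))
  })
  out <- as.data.frame(t(stats))
  # Row names are the sample name pairs separated by a comma.
  rownames(out) <- paste(rownames(g)[pairs[1, ]],
                         rownames(g)[pairs[2, ]], sep = ",")
  out
}

toy_result <- ibs_stats_sketch(geno)
print(toy_result)
```

In this sketch a pair with no SNP called in both samples gets a Fraction of NaN; how the real function treats that case is not stated here.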


In some applications, it may be preferable to take a (random) subset of SNPs first, since the calculation time increases as $N (N-1) M /2$. Typically, for N = 800 samples and M = 3000 SNPs, the calculation takes about 1 minute. A full GWA scan could take hours, which is quite unnecessary for simple applications such as checking for duplicate or related samples.
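
As a rough illustration of the scaling and of random SNP subsetting, the following base-R sketch works on a plain matrix. The helper n_comparisons, the simulated data, and the subset size of 100 are assumptions for this example; column indexing of a snp.matrix object with `[` is assumed to behave analogously to the row indexing shown in the example for this function.

```r
# Number of per-SNP genotype comparisons for N samples and M SNPs.
# Hypothetical helper implementing N (N-1) M / 2.
n_comparisons <- function(N, M) N * (N - 1) * M / 2

n_comparisons(800, 3000)   # 958800000 comparisons for the case quoted above

# Simulated stand-in for a genotype matrix: 20 samples by 500 SNPs.
set.seed(1)
geno <- matrix(sample(0:3, 20 * 500, replace = TRUE), nrow = 20)

# Keep a random selection of 100 SNP columns before any pairwise work.
keep <- sort(sample(ncol(geno), 100))
geno_subset <- geno[, keep, drop = FALSE]
dim(geno_subset)
```

Because the cost is linear in M, keeping a fifth of the SNPs cuts the work to roughly a fifth, which is usually enough for spotting duplicates.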


This is mostly written to find mislabelled and/or duplicate samples.
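
A minimal sketch of how the returned data.frame might be screened for likely duplicates follows. The toy values and the 0.95 cutoff are invented for illustration only; a sensible threshold depends on the data and is best chosen after looking at the distribution of Fraction.

```r
# Toy stand-in for a result: row names are "sampleA,sampleB" pairs.
# All values here are made up for the sketch.
toy <- data.frame(Count = c(620, 998, 640),
                  Fraction = c(0.62, 0.998, 0.64),
                  row.names = c("s1,s2", "s1,s3", "s2,s3"))

threshold <- 0.95   # assumed cutoff, not a recommendation

# Pairs whose calls agree almost everywhere are duplicate candidates.
suspects <- toy[toy$Fraction > threshold, , drop = FALSE]

# Split the row names back into the two sample names
# (assumes sample names themselves contain no commas).
pair_names <- do.call(rbind, strsplit(rownames(suspects), ",", fixed = TRUE))
print(suspects)
print(pair_names)
```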

Illumina indexes its SNPs in alphabetical order, so the mitochondrial SNPs come first; for most purposes it is undesirable to use these SNPs for IBS calculations.

TODO: In the worst case, S4 subsetting seems to make two copies of a large object, so one might want to subset before rbind() and similar operations; a future version of this routine may contain a built-in subsetting facility to work around that limitation.


Hin-Tak Leung


data(testdata)
result <- ibs.stats(Autosomes[11:20,])


> data(testdata)
> result <- ibs.stats(Autosomes[11:20,])
Information: samples = 10, snps = 9445
> summary(result)
     Count         Fraction     
 Min.   :3909   Min.   :0.6164  
 1st Qu.:4014   1st Qu.:0.6265  
 Median :4087   Median :0.6305  
 Mean   :4397   Mean   :0.6303  
 3rd Qu.:4176   3rd Qu.:0.6347  
 Max.   :5762   Max.   :0.6418  
