R: function to calculate the identity-by-state stats of a group...
ibs.stats
R Documentation
function to calculate the identity-by-state stats of a group
of samples
Description
Given a snp.matrix-class or an X.snp.matrix-class object
with $N$ samples, calculates identity-by-state statistics describing the
relatedness of every pair of samples in it.
Usage
ibs.stats(x)
Arguments
x
a snp.matrix-class or an X.snp.matrix-class
object containing $N$ samples
Details
No-calls are excluded from consideration: a SNP counts towards a pair's
statistics only when a call was made in both samples.
Value
A data.frame containing $N (N-1)/2$ rows, where the row names are the
sample name pairs separated by a comma, and the columns are:
Count
count of identical calls, excluding no-calls
Fraction
fraction of identical calls relative to the number of
SNPs with a call made in both samples
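The two columns can be illustrated with a minimal base-R sketch. It is not the package's implementation: `ibs_sketch` is a hypothetical helper that works on a plain integer matrix (rows are samples, columns are SNPs), and the coding 0 = no-call, 1/2/3 = genotypes is an assumption made for the illustration.

```r
# Hypothetical helper, not part of the package: pairwise IBS on a plain
# integer matrix (rows = samples, columns = SNPs).
# Assumed coding: 0 = no-call, 1/2/3 = the three genotypes.
ibs_sketch <- function(g) {
  pairs <- combn(nrow(g), 2)           # all N(N-1)/2 sample pairs, one per column
  stats <- apply(pairs, 2, function(p) {
    a <- g[p[1], ]
    b <- g[p[2], ]
    called <- a != 0 & b != 0          # SNPs with a call in both samples
    count <- sum(called & a == b)      # identical calls among those SNPs
    c(Count = count, Fraction = count / sum(called))
  })
  out <- as.data.frame(t(stats))
  # Row names are the two sample names separated by a comma.
  rownames(out) <- paste(rownames(g)[pairs[1, ]],
                         rownames(g)[pairs[2, ]], sep = ",")
  out
}

# Toy data: 3 samples, 5 SNPs, invented for illustration.
g <- rbind(s1 = c(1, 2, 3, 0, 1),
           s2 = c(1, 2, 1, 3, 0),
           s3 = c(1, 2, 3, 3, 1))
res <- ibs_sketch(g)
print(res)
```

With three samples the result has 3 (3-1)/2 = 3 rows. The pair s1,s3 shares all four SNPs that are called in both samples, so its Fraction is 1 even though one SNP is a no-call in s1.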
Warning
In some applications it may be preferable to take a (random) subset
of SNPs first, because the calculation time grows as
$N (N-1) M / 2$. Typically, for $N = 800$ samples and $M = 3000$ SNPs,
the calculation takes about 1 minute. A full GWA scan could take hours,
which is quite unnecessary for simple applications such as checking for
duplicate or related samples.
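The scaling argument can be made concrete with a short base-R calculation. This is a sketch under stated assumptions: `ibs_cost` is a hypothetical helper, the 500,000-SNP scan size is an invented figure, and run time is assumed to scale linearly from the 1-minute benchmark quoted above.

```r
# Number of pair-by-SNP comparisons for n samples and m SNPs: n(n-1)m/2.
ibs_cost <- function(n, m) n * (n - 1) / 2 * m

ref  <- ibs_cost(800, 3000)            # the benchmark case: about 1 minute
full <- ibs_cost(800, 500000)          # assumed size of a full scan
est_minutes <- full / ref              # estimate, assuming linear scaling in m
print(est_minutes)

# Draw a random subset of SNP column indices. For a snp.matrix x these
# could then be used as ibs.stats(x[, keep]); x is not defined in this sketch.
set.seed(1)                            # only to make the sketch reproducible
m_total <- 500000
keep <- sort(sample(m_total, 3000))
```

Under these assumptions the full scan comes out at roughly 167 minutes, close to three hours, whereas the 3000-SNP subset stays at about a minute.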
Note
This function was written mainly to find mislabelled and/or duplicate
samples. Illumina indexes its SNPs in alphabetical order, so the
mitochondrial SNPs come first; for most purposes it is undesirable
to use these SNPs for IBS calculations.
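One way to leave such SNPs out is to filter on the SNP names before calling the function. The sketch below uses base R only. The SNP names are invented, and the "Mito" name prefix is an assumption about the chip annotation that should be checked against the actual column names.

```r
# Hypothetical SNP names; real chip annotation may use a different convention.
snp_names <- c("MitoA10045G", "MitoT9900C", "rs1000000", "rs2000000")

# Assumption: mitochondrial SNPs are recognisable by a "Mito" name prefix.
keep <- !grepl("^Mito", snp_names)
print(snp_names[keep])
```

For a snp.matrix x, the same logical vector built from colnames(x) could be used as x[, keep] before the IBS calculation.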
TODO: In the worst case, S4 subsetting seems to make two copies of a
large object, so one might want to subset before rbind(), etc. A future
version of this routine may contain a built-in subsetting facility to
work around that limitation.
Examples
data(testdata)
result <- ibs.stats(Autosomes[11:20,])
summary(result)
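Since the function is mainly meant for spotting duplicate or mislabelled samples, a natural follow-up is to filter the result on Fraction. The sketch below runs on a mock data.frame with the column layout described under Value. The numbers, the sample names and the 0.95 cut-off are all invented for illustration.

```r
# Mock result with the same shape as the data.frame described under Value;
# the values are invented for illustration only.
mock <- data.frame(
  Count    = c(4100, 9400, 4050),
  Fraction = c(0.63, 0.995, 0.62),
  row.names = c("a,b", "a,c", "b,c")
)

threshold <- 0.95                      # assumed cut-off, tune per data set
dups <- mock[mock$Fraction > threshold, , drop = FALSE]
dups <- dups[order(-dups$Fraction), , drop = FALSE]   # most similar pairs first
print(dups)

# Split each "name1,name2" row name back into the two sample names.
# This assumes the sample names themselves contain no commas.
pair_names <- do.call(rbind, strsplit(rownames(dups), ",", fixed = TRUE))
```

The same two filtering lines should apply unchanged to the output of ibs.stats(), with a threshold chosen by looking at the distribution of Fraction first.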
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
> library(chopsticks)
Loading required package: survival
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/chopsticks/ibs.stats.Rd_%03d_medium.png", width=480, height=480)
> ### Name: ibs.stats
> ### Title: function to calculate the identity-by-state stats of a group of
> ### samples
> ### Aliases: ibs.stats
> ### Keywords: utilities
>
> ### ** Examples
>
> data(testdata)
> result <- ibs.stats(Autosomes[11:20,])
Information: samples = 10, snps = 9445
> summary(result)
     Count         Fraction
 Min.   :3909   Min.   :0.6164
 1st Qu.:4014   1st Qu.:0.6265
 Median :4087   Median :0.6305
 Mean   :4397   Mean   :0.6303
 3rd Qu.:4176   3rd Qu.:0.6347
 Max.   :5762   Max.   :0.6418
> dev.off()
null device
1
>