Last data update: 2014.03.03

R: Individual dissimilarity analysis

snpgdsDiss

R Documentation

Individual dissimilarity analysis

Description

Calculate the individual dissimilarities for each pair of individuals.

Usage

snpgdsDiss(gdsobj, sample.id=NULL, snp.id=NULL, autosome.only=TRUE,
    remove.monosnp=TRUE, maf=NaN, missing.rate=NaN, num.thread=1, verbose=TRUE)

Arguments

`gdsobj`	an object of class `SNPGDSFileClass`, a SNP GDS file
`sample.id`	a vector of sample ids specifying the selected samples; if NULL, all samples are used
`snp.id`	a vector of SNP ids specifying the selected SNPs; if NULL, all SNPs are used
`autosome.only`	if `TRUE`, use autosomal SNPs only; if a numeric or character value, keep only the SNPs on the specified chromosome
`remove.monosnp`	if TRUE, remove monomorphic SNPs
`maf`	use only the SNPs with minor allele frequency ">= maf"; if NaN, no MAF threshold is applied
`missing.rate`	use only the SNPs with missing rate "<= missing.rate"; if NaN, no missing-rate threshold is applied
`num.thread`	the number of (CPU) cores used; if `NA`, detect the number of cores automatically
`verbose`	if TRUE, show information
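
The arguments above can be combined to restrict the analysis to a subset of samples and to filter SNPs before the dissimilarities are computed. The following is a minimal sketch using the example dataset shipped with SNPRelate; the choice of the first 50 samples and the threshold values 0.05 are arbitrary assumptions made for illustration, not recommendations.

```r
library(SNPRelate)

# open the example dataset shipped with the package
genofile <- snpgdsOpen(snpgdsExampleFileName())

# take the first 50 sample ids stored in the file (an arbitrary subset)
all.samples <- read.gdsn(index.gdsn(genofile, "sample.id"))
sub.samples <- all.samples[1:50]

# illustrative thresholds: keep SNPs with MAF >= 0.05 and missing rate <= 0.05
diss.sub <- snpgdsDiss(genofile, sample.id=sub.samples,
    autosome.only=TRUE, remove.monosnp=TRUE,
    maf=0.05, missing.rate=0.05, num.thread=1, verbose=FALSE)

# close the genotype file
snpgdsClose(genofile)
```

Because `sample.id` is given, the SNP filters are evaluated on those 50 samples only, so the set of retained SNPs may differ from a run on all samples.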

Details

The minor allele frequency and missing rate for each SNP passed in snp.id are calculated over all the samples in sample.id.
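
To make the filtering rule concrete, the per-SNP quantities can be sketched in base R on a toy genotype matrix. This is an illustration of the idea only, not the package's internal code; the matrix `geno`, the subset `sel`, and the thresholds are hypothetical.

```r
# toy genotype matrix: rows are samples, columns are SNPs;
# entries count copies of one allele (0, 1 or 2), NA marks a missing call
geno <- matrix(
    c(0, 1, 2, NA,
      1, 1, 2, 0,
      2, 0, 2, NA,
      1, 0, 2, 0),
    nrow=4, byrow=TRUE,
    dimnames=list(paste0("s", 1:4), paste0("snp", 1:4)))

# hypothetical sample subset, standing in for sample.id
sel <- c("s1", "s2", "s3")
g <- geno[sel, , drop=FALSE]

# per-SNP missing rate over the selected samples only
miss <- colMeans(is.na(g))

# per-SNP allele frequency, folded to the minor allele
af <- colMeans(g, na.rm=TRUE) / 2
maf <- pmin(af, 1 - af)

# apply thresholds in the same direction as the maf and missing.rate arguments
keep <- maf >= 0.05 & miss <= 0.5
```

Here `snp3` is monomorphic within the selected samples and `snp4` is missing in two of the three, so only `snp1` and `snp2` are kept.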

The details will be described in the future.

Value

Returns an object of class "snpgdsDissClass" with the following components:

`sample.id`	the sample ids used in the analysis
`snp.id`	the SNP ids used in the analysis
`diss`	a matrix of individual dissimilarity
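
The components listed above can be inspected directly. The sketch below also passes the matrix to base R's `as.dist()` and `hclust()`; that step assumes the matrix is symmetric and has no missing entries, and is shown as one possible use, not as what `snpgdsHCluster` does internally.

```r
library(SNPRelate)

genofile <- snpgdsOpen(snpgdsExampleFileName())
res <- snpgdsDiss(genofile, verbose=FALSE)
snpgdsClose(genofile)

class(res)             # class label of the returned object
length(res$sample.id)  # number of samples used
length(res$snp.id)     # number of SNPs that passed the filters
dim(res$diss)          # one row and one column per sample

# label the matrix with the sample ids, then treat it as a distance object
d <- res$diss
dimnames(d) <- list(res$sample.id, res$sample.id)
hc.base <- hclust(as.dist(d), method="average")
```

Setting the dimnames explicitly keeps the sample ids attached to the leaves of the resulting tree.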

Author(s)

Xiuwen Zheng

References

Zheng X. Statistical Prediction of HLA Alleles and Relatedness Analysis in Genome-Wide Association Studies. 2013. PhD dissertation, Department of Biostatistics, University of Washington.

Weir BS, Zheng X. SNPs and SNVs in Forensic Science. 2015. Forensic Science International: Genetics Supplement Series.

Examples

# open an example dataset (HapMap)
genofile <- snpgdsOpen(snpgdsExampleFileName())

pop.group <- as.factor(read.gdsn(index.gdsn(
    genofile, "sample.annot/pop.group")))
pop.level <- levels(pop.group)

diss <- snpgdsDiss(genofile)
hc <- snpgdsHCluster(diss)

# close the genotype file
snpgdsClose(genofile)


# split
set.seed(100)
rv <- snpgdsCutTree(hc, label.H=TRUE, label.Z=TRUE)

# draw dendrogram
snpgdsDrawTree(rv, main="HapMap Phase II",
    edgePar=list(col=rgb(0.5,0.5,0.5, 0.75), t.col="black"))

Results


> library(SNPRelate)
Loading required package: gdsfmt
SNPRelate -- supported by Streaming SIMD Extensions 2 (SSE2)
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/SNPRelate/snpgdsDiss.Rd_%03d_medium.png", width=480, height=480)
> ### Name: snpgdsDiss
> ### Title: Individual dissimilarity analysis
> ### Aliases: snpgdsDiss
> ### Keywords: GDS GWAS
> 
> ### ** Examples
> 
> # open an example dataset (HapMap)
> genofile <- snpgdsOpen(snpgdsExampleFileName())
> 
> pop.group <- as.factor(read.gdsn(index.gdsn(
+     genofile, "sample.annot/pop.group")))
> pop.level <- levels(pop.group)
> 
> diss <- snpgdsDiss(genofile)
Individual dissimilarity analysis on SNP genotypes:
Excluding 365 SNPs on non-autosomes
Excluding 1 SNP (monomorphic: TRUE, < MAF: NaN, or > missing rate: NaN)
Working space: 279 samples, 8722 SNPs
	using 1 (CPU) core
Dissimilarity:	the sum of all selected genotypes (0, 1 and 2) = 2446510
Dissimilarity:	Wed Jul  6 05:34:33 2016	0%
Dissimilarity:	Wed Jul  6 05:34:34 2016	100%
> hc <- snpgdsHCluster(diss)
> 
> # close the genotype file
> snpgdsClose(genofile)
> 
> 
> # split
> set.seed(100)
> rv <- snpgdsCutTree(hc, label.H=TRUE, label.Z=TRUE)
Determine groups by permutation (Z threshold: 15, outlier threshold: 5):
Create 3 groups.
> 
> # draw dendrogram
> snpgdsDrawTree(rv, main="HapMap Phase II",
+     edgePar=list(col=rgb(0.5,0.5,0.5, 0.75), t.col="black"))
> 
> dev.off()
null device 
          1 
>