Last data update: 2014.03.03

snpgdsPCASampLoading

R Documentation

Project individuals onto existing principal component axes

Description

Calculates the sample eigenvectors from the specified SNP loadings, which projects the selected individuals onto existing principal component axes.

Usage

snpgdsPCASampLoading(loadobj, gdsobj, sample.id=NULL, num.thread=1L,
    verbose=TRUE)

Arguments

`loadobj`	a `snpgdsPCASNPLoadingClass` object, as returned by `snpgdsPCASNPLoading`
`gdsobj`	an object of class `SNPGDSFileClass`, a SNP GDS file
`sample.id`	a vector of sample IDs specifying the selected samples; if NULL, all samples are used
`num.thread`	the number of CPU cores to use
`verbose`	if TRUE, show progress information

Details

The samples given in `sample.id` usually differ from the samples that were used to calculate the SNP loadings.
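The idea of projecting samples onto existing axes can be illustrated with a minimal base-R sketch. It is not SNPRelate's implementation: the simulated genotypes, the helper names (`sim_geno`, `standardize`, `project`) and the allele-frequency scaling are assumptions chosen for illustration. The sketch takes SNP loadings from a singular value decomposition of the reference genotypes and then applies them to samples that were not part of that decomposition.

```r
set.seed(1)
n_ref <- 20; n_new <- 5; n_snp <- 50

# Simulated 0/1/2 genotypes (hypothetical data, not read from a GDS file)
p <- runif(n_snp, 0.3, 0.7)
sim_geno <- function(n) sapply(p, function(pj) rbinom(n, 2, pj))
g_ref <- sim_geno(n_ref)
g_new <- sim_geno(n_new)

# Allele frequencies are estimated from the reference samples only;
# SNPs that are monomorphic in the reference are dropped
p_hat <- colMeans(g_ref) / 2
keep <- p_hat > 0 & p_hat < 1
g_ref <- g_ref[, keep]; g_new <- g_new[, keep]; p_hat <- p_hat[keep]

# Center by 2p and scale by sqrt(p(1-p)) (assumed normalization)
standardize <- function(g, p) {
  sweep(sweep(g, 2, 2 * p), 2, sqrt(p * (1 - p)), "/")
}

# SVD of the reference matrix: X = U D V'
sv <- svd(standardize(g_ref, p_hat))
k <- 3
loadings <- sv$v[, 1:k]   # SNP loadings (right singular vectors)
d <- sv$d[1:k]

# Project any samples onto the reference axes: U = X V D^-1
project <- function(g) standardize(g, p_hat) %*% loadings %*% diag(1 / d)

# Reference samples are recovered up to floating-point error
max(abs(project(g_ref) - sv$u[, 1:k]))

# New samples get coordinates on the same axes
new_scores <- project(g_new)
dim(new_scores)
```

Projecting the reference samples themselves reproduces their original eigenvectors, which is the same consistency check the Examples section performs with `snpgdsPCASampLoading`.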

Value

Returns a `snpgdsPCAClass` object, which is a list with the following components:

`sample.id`	the sample ids used in the analysis
`snp.id`	the SNP ids used in the analysis
`eigenval`	eigenvalues
`eigenvect`	eigenvectors, a “# of samples” x “eigen.cnt” matrix
`TraceXTX`	the trace of the genetic covariance matrix
`Bayesian`	whether Bayesian normalization is used

Author(s)

Xiuwen Zheng

References

Patterson, N., Price, A. L., and Reich, D. (2006). Population structure and eigenanalysis. PLoS Genetics, 2, e190.

Zhu, X., Li, S., Cooper, R. S., and Elston, R. C. (2008). A unified association analysis approach for family and unrelated samples correcting for stratification. Am J Hum Genet, 82(2), 352-365.

Examples

# open an example dataset (HapMap)
genofile <- snpgdsOpen(snpgdsExampleFileName())

sample.id <- read.gdsn(index.gdsn(genofile, "sample.id"))

PCARV <- snpgdsPCA(genofile, eigen.cnt=8)
SnpLoad <- snpgdsPCASNPLoading(PCARV, genofile)

# calculate sample eigenvectors from SNP loadings
SL <- snpgdsPCASampLoading(SnpLoad, genofile, sample.id=sample.id[1:100])

diff <- PCARV$eigenvect[1:100,] - SL$eigenvect
summary(c(diff))
# the differences are approximately zero (floating-point error only)

# close the genotype file
snpgdsClose(genofile)
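
The example above projects samples that were also used to compute the loadings. A variant closer to the case described in Details is sketched below: the PCA is run on one subset of samples and the remaining samples are projected onto those axes. The split into the first 200 samples and the rest is an arbitrary assumption for illustration, and the variable names (`ref.id`, `new.id`, `pca.ref`, `snp.load`, `proj`) are hypothetical.

```r
library(SNPRelate)

# open the example dataset (HapMap)
genofile <- snpgdsOpen(snpgdsExampleFileName())
sample.id <- read.gdsn(index.gdsn(genofile, "sample.id"))

# arbitrary split (assumption): first 200 samples define the axes
ref.id <- sample.id[1:200]
new.id <- setdiff(sample.id, ref.id)

# PCA and SNP loadings from the reference samples only
pca.ref <- snpgdsPCA(genofile, sample.id=ref.id, eigen.cnt=8, verbose=FALSE)
snp.load <- snpgdsPCASNPLoading(pca.ref, genofile, verbose=FALSE)

# project the held-out samples onto the reference axes
proj <- snpgdsPCASampLoading(snp.load, genofile, sample.id=new.id,
    verbose=FALSE)

# one row per projected sample, one column per eigenvector
dim(proj$eigenvect)

# close the genotype file
snpgdsClose(genofile)
```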

Results


> library(SNPRelate)
Loading required package: gdsfmt
SNPRelate -- supported by Streaming SIMD Extensions 2 (SSE2)
> ### Name: snpgdsPCASampLoading
> ### Title: Project individuals onto existing principal component axes
> ### Aliases: snpgdsPCASampLoading
> ### Keywords: PCA GDS GWAS
> 
> ### ** Examples
> 
> # open an example dataset (HapMap)
> genofile <- snpgdsOpen(snpgdsExampleFileName())
> 
> sample.id <- read.gdsn(index.gdsn(genofile, "sample.id"))
> 
> PCARV <- snpgdsPCA(genofile, eigen.cnt=8)
Principal Component Analysis (PCA) on SNP genotypes:
Excluding 365 SNPs on non-autosomes
Excluding 1 SNP (monomorphic: TRUE, < MAF: NaN, or > missing rate: NaN)
Working space: 279 samples, 8722 SNPs
	using 1 (CPU) core
PCA:	the sum of all selected genotypes (0, 1 and 2) = 2446510
Wed Jul  6 05:34:48 2016    (internal increment: 1744)
[==================================================] 100%, completed
Wed Jul  6 05:34:48 2016    Begin (eigenvalues and eigenvectors)
Wed Jul  6 05:34:48 2016    Done.
> SnpLoad <- snpgdsPCASNPLoading(PCARV, genofile)
SNP loading:
Working space: 279 samples, 8722 SNPs
	Using 1 (CPU) core.
	Using the top 8 eigenvectors.
SNP Loading:	the sum of all selected genotypes (0, 1 and 2) = 2446510
SNP Loading:	Wed Jul  6 05:34:48 2016	0%
SNP Loading:	Wed Jul  6 05:34:48 2016	100%
> 
> # calculate sample eigenvectors from SNP loadings
> SL <- snpgdsPCASampLoading(SnpLoad, genofile, sample.id=sample.id[1:100])
Sample loading:
Working space: 100 samples, 8722 SNPs
	Using 1 (CPU) core.
	Using the top 8 eigenvectors.
Sample Loading:	the sum of all selected genotypes (0, 1 and 2) = 878146
Sample Loading:	Wed Jul  6 05:34:48 2016	0%
Sample Loading:	Wed Jul  6 05:34:48 2016	100%
> 
> diff <- PCARV$eigenvect[1:100,] - SL$eigenvect
> summary(c(diff))
      Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
-1.832e-15 -6.939e-17 -9.975e-18  1.506e-17  8.327e-17  3.442e-15 
> # ~ ZERO
> 
> # close the genotype file
> snpgdsClose(genofile)