Last data update: 2014.03.03

R: Project individuals onto existing principal component axes
snpgdsPCASampLoading    R Documentation

Project individuals onto existing principal component axes

Description

Calculates the sample eigenvectors using the specified SNP loadings.

Usage

snpgdsPCASampLoading(loadobj, gdsobj, sample.id=NULL, num.thread=1L,
    verbose=TRUE)

Arguments

loadobj

a snpgdsPCASNPLoadingClass object, as returned by snpgdsPCASNPLoading

gdsobj

an object of class SNPGDSFileClass, a SNP GDS file

sample.id

a vector of sample IDs specifying the selected samples; if NULL, all samples are used

num.thread

the number of CPU cores used

verbose

if TRUE, show information

Details

The samples given in sample.id are usually different from the samples used to calculate the SNP loadings.
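To make the projection concrete, here is a minimal base-R sketch on simulated genotypes. It assumes the genotype standardization of Patterson et al. (2006) and is not taken from the SNPRelate source, so the package's internal scaling may differ. All names (G, Z, V, snp.load, proj) are illustrative only.

```r
set.seed(1)
n <- 20                                  # number of samples
m <- 200                                 # number of SNPs
p <- runif(m, 0.1, 0.9)                  # simulated allele frequencies

# genotype matrix (samples x SNPs) with values 0, 1, 2
G <- matrix(rbinom(n * m, 2, rep(p, each = n)), nrow = n)

# drop monomorphic SNPs, then standardize each SNP (assumed scaling)
phat <- colMeans(G) / 2
keep <- phat > 0 & phat < 1
G <- G[, keep, drop = FALSE]
phat <- phat[keep]
Z <- sweep(G, 2, 2 * phat, "-")
Z <- sweep(Z, 2, sqrt(phat * (1 - phat)), "/")

# sample-by-sample genetic covariance matrix and its eigen-decomposition
K <- tcrossprod(Z) / ncol(Z)
e <- eigen(K, symmetric = TRUE)
k <- 4                                   # number of axes kept
V <- e$vectors[, 1:k]
lam <- e$values[1:k]

# SNP loadings (SNPs x axes), then project samples back onto the axes
snp.load <- crossprod(Z, V)
proj <- sweep(Z %*% snp.load, 2, ncol(Z) * lam, "/")

# projecting the original samples reproduces their eigenvectors
max(abs(proj - V))
```

In this sketch, samples outside the reference set would be standardized with the reference allele frequencies phat before being multiplied by snp.load.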

Value

Returns a snpgdsPCAClass object, which is a list with the following components:

sample.id

the sample ids used in the analysis

snp.id

the SNP ids used in the analysis

eigenval

eigenvalues

eigenvect

eigenvectors, “# of samples” x “eigen.cnt”

TraceXTX

the trace of the genetic covariance matrix

Bayesian

whether Bayesian normalization is used

Author(s)

Xiuwen Zheng

References

Patterson N, Price AL, Reich D (2006) Population structure and eigenanalysis. PLoS Genetics 2:e190.

Zhu X, Li S, Cooper RS, Elston RC (2008) A unified association analysis approach for family and unrelated samples correcting for stratification. Am J Hum Genet 82(2):352-365.

See Also

snpgdsPCA, snpgdsPCACorr, snpgdsPCASNPLoading

Examples

# open an example dataset (HapMap)
genofile <- snpgdsOpen(snpgdsExampleFileName())

sample.id <- read.gdsn(index.gdsn(genofile, "sample.id"))

PCARV <- snpgdsPCA(genofile, eigen.cnt=8)
SnpLoad <- snpgdsPCASNPLoading(PCARV, genofile)

# calculate sample eigenvectors from SNP loadings
SL <- snpgdsPCASampLoading(SnpLoad, genofile, sample.id=sample.id[1:100])

diff <- PCARV$eigenvect[1:100,] - SL$eigenvect
summary(c(diff))
# ~ ZERO

# close the genotype file
snpgdsClose(genofile)
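The example above projects samples that were also used to compute the SNP loadings. A sketch of the more typical case, where the projected samples are held out of the PCA, might look as follows. The 200/79 split and the names ref.id, new.id, pca.ref, snp.load and proj are arbitrary choices for illustration.

```r
library(SNPRelate)

# open the example dataset (HapMap) shipped with SNPRelate
genofile <- snpgdsOpen(snpgdsExampleFileName())
sample.id <- read.gdsn(index.gdsn(genofile, "sample.id"))

# illustrative split: the first 200 samples form the reference set,
# the remaining samples are treated as new individuals
ref.id <- sample.id[1:200]
new.id <- setdiff(sample.id, ref.id)

# PCA and SNP loadings computed from the reference samples only
pca.ref <- snpgdsPCA(genofile, sample.id=ref.id, eigen.cnt=8, verbose=FALSE)
snp.load <- snpgdsPCASNPLoading(pca.ref, genofile, verbose=FALSE)

# project the held-out samples onto the reference axes
proj <- snpgdsPCASampLoading(snp.load, genofile, sample.id=new.id,
    verbose=FALSE)
dim(proj$eigenvect)

# close the genotype file
snpgdsClose(genofile)
```

The rows of proj$eigenvect correspond to the held-out samples and can be plotted alongside pca.ref$eigenvect to compare the two groups on the same axes.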

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(SNPRelate)
Loading required package: gdsfmt
SNPRelate -- supported by Streaming SIMD Extensions 2 (SSE2)
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/SNPRelate/snpgdsPCASampLoading.Rd_%03d_medium.png", width=480, height=480)
> ### Name: snpgdsPCASampLoading
> ### Title: Project individuals onto existing principal component axes
> ### Aliases: snpgdsPCASampLoading
> ### Keywords: PCA GDS GWAS
> 
> ### ** Examples
> 
> # open an example dataset (HapMap)
> genofile <- snpgdsOpen(snpgdsExampleFileName())
> 
> sample.id <- read.gdsn(index.gdsn(genofile, "sample.id"))
> 
> PCARV <- snpgdsPCA(genofile, eigen.cnt=8)
Principal Component Analysis (PCA) on SNP genotypes:
Excluding 365 SNPs on non-autosomes
Excluding 1 SNP (monomorphic: TRUE, < MAF: NaN, or > missing rate: NaN)
Working space: 279 samples, 8722 SNPs
	using 1 (CPU) core
PCA:	the sum of all selected genotypes (0, 1 and 2) = 2446510
Wed Jul  6 05:34:48 2016    (internal increment: 1744)
 [==================================================] 100%, completed
Wed Jul  6 05:34:48 2016    Begin (eigenvalues and eigenvectors)
Wed Jul  6 05:34:48 2016    Done.
> SnpLoad <- snpgdsPCASNPLoading(PCARV, genofile)
SNP loading:
Working space: 279 samples, 8722 SNPs
	Using 1 (CPU) core.
	Using the top 8 eigenvectors.
SNP Loading:	the sum of all selected genotypes (0, 1 and 2) = 2446510
SNP Loading:	Wed Jul  6 05:34:48 2016	0%
SNP Loading:	Wed Jul  6 05:34:48 2016	100%
> 
> # calculate sample eigenvectors from SNP loadings
> SL <- snpgdsPCASampLoading(SnpLoad, genofile, sample.id=sample.id[1:100])
Sample loading:
Working space: 100 samples, 8722 SNPs
	Using 1 (CPU) core.
	Using the top 8 eigenvectors.
Sample Loading:	the sum of all selected genotypes (0, 1 and 2) = 878146
Sample Loading:	Wed Jul  6 05:34:48 2016	0%
Sample Loading:	Wed Jul  6 05:34:48 2016	100%
> 
> diff <- PCARV$eigenvect[1:100,] - SL$eigenvect
> summary(c(diff))
      Min.    1st Qu.     Median       Mean    3rd Qu.       Max. 
-1.832e-15 -6.939e-17 -9.975e-18  1.506e-17  8.327e-17  3.442e-15 
> # ~ ZERO
> 
> # close the genotype file
> snpgdsClose(genofile)
> 
> dev.off()
null device 
          1 
>