Last data update: 2014.03.03

R: ARTP test for raw data
rARTP    R Documentation

ARTP test for raw data

Description

Calculate gene and pathway p-values using the ARTP test and raw genotype data

Usage

rARTP(formula, data, pathway, family, geno.files = NULL, lambda = 1.0, 
      subset = NULL, options = NULL)

Arguments

formula

an object of class formula: a symbolic description of the basic risk model to be fitted. Only the outcome and covariates are included. See glm for more details on formula.

data

a data frame containing the variables specified in formula. If geno.files is NULL, then it must also contain the genotypes.

pathway

a character string giving the name of a file that defines the pathway. The file must be readable by read.table and have columns named SNP, Gene, and Chr. It can also be a data frame with these three columns.

family

a character taking values of 'gaussian' or 'binomial'.

geno.files

a character vector containing paths of plain text files that hold the genotype data. The files can be gz-compressed and must be readable by read.table. Alternatively, it can be a data frame with columns bed, bim, and fam, containing the paths of one or more sets of PLINK files that hold the genotype data. It can be NULL if all genotype data are included in data.

lambda

a numeric value specifying the inflation factor. The default is 1.0.

subset

an optional integer vector specifying a subset of observations in data. The default is NULL, i.e., all observations are used.

options

a list of options to control the test procedure. If NULL, default options will be used. See options.
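
As a rough illustration of the input shapes described above, the sketch below builds a pathway definition and a PLINK-style geno.files data frame in base R. The SNP IDs, gene names, and file paths are hypothetical placeholders and do not ship with the package; only the column names (SNP, Gene, Chr and fam, bim, bed) come from the argument descriptions.

```r
# Hypothetical pathway definition: the SNP IDs and gene names are made up.
pathway_df <- data.frame(
  SNP  = c("rs0000001", "rs0000002", "rs0000003"),
  Gene = c("GENE_A", "GENE_A", "GENE_B"),
  Chr  = c(1, 1, 2),
  stringsAsFactors = FALSE
)

# The same definition stored as a plain text file readable by read.table.
pathway_file <- tempfile(fileext = ".txt")
write.table(pathway_df, pathway_file, row.names = FALSE, quote = FALSE)
pathway_from_file <- read.table(pathway_file, header = TRUE,
                                stringsAsFactors = FALSE)

# Check that the three required columns are present.
stopifnot(all(c("SNP", "Gene", "Chr") %in% colnames(pathway_from_file)))

# Hypothetical PLINK file sets: one row per (bed, bim, fam) triple.
plink_files <- data.frame(
  fam = c("study1.fam", "study2.fam"),
  bim = c("study1.bim", "study2.bim"),
  bed = c("study1.bed", "study2.bed"),
  stringsAsFactors = FALSE
)
```

Either pathway_df or pathway_file could then be supplied as the pathway argument, and plink_files as geno.files, provided the paths point to real files.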

Details

This function computes gene and pathway p-values when raw genotype data are available. The ARTP test, modified from Yu et al. (2009), and the AdaJoint test from Zhang et al. (2014) are released with this package. ARTP stands for Adaptive Rank Truncated Product.
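
To give a feel for the rank truncated product idea, here is a minimal base-R sketch of the generic ARTP procedure as described in Yu et al. (2009). It is not the modified algorithm implemented in this package: the function name artp_sketch, its arguments, and the assumption that resampled p-values are already available as a matrix are all illustrative.

```r
# Generic ARTP sketch (illustrative only, not the ARTP2 implementation).
# p_obs:  observed p-values for L SNPs (numeric vector of length L)
# p_perm: B x L matrix of p-values from B resampling replicates
# K:      truncation points, each between 1 and L
artp_sketch <- function(p_obs, p_perm, K) {
  all_p <- rbind(p_obs, p_perm)   # row 1 holds the observed data
  n <- nrow(all_p)

  # W[j, b]: sum of the logs of the K[j] smallest p-values in replicate b
  W <- vapply(seq_len(n),
              function(b) cumsum(log(sort(all_p[b, ])))[K],
              numeric(length(K)))
  W <- matrix(W, nrow = length(K))

  # s[j, b]: resampling p-value of W[j, b] among all replicates for K[j]
  s <- t(apply(W, 1, function(w) rank(w, ties.method = "max") / n))

  # Minimum over truncation points, then adjust for having taken the minimum
  min_s <- apply(s, 2, min)
  mean(min_s <= min_s[1])
}

set.seed(1)
p_perm <- matrix(runif(99 * 3), nrow = 99)
artp_sketch(c(1e-8, 1e-7, 0.5), p_perm, K = c(1, 2, 3))
```

The same replicates are reused for both the per-truncation-point p-values and the final adjustment, which is what keeps the procedure to a single layer of resampling.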

Value

rARTP returns an object of class ARTP2. It is a list containing the following components:

pathway.pvalue

final pathway p-value accounting for multiple comparisons.

gene.pvalue

a data frame containing gene name, number of SNPs in the gene that were included in the analysis, chromosome name, and the p-value for the gene accounting for multiple comparisons.

pathway

a data frame defining the pathway that was actually tested after the various filters were applied.

model

a list containing detailed information of selected SNPs in each gene.

most.sig.genes

a character vector of genes selected by ARTP2. They are the most promising candidates, although their statistical significance is not guaranteed.

deleted.snps

a data frame containing the SNPs excluded from the analysis and the reasons for their exclusion.

deleted.genes

a data frame containing genes excluded from the analysis because they are subsets of other remaining genes. Set options$rm.gene.subset to FALSE to include all genes even if they are subsets of other genes.

options

a list of options used in the analysis. See options.

accurate

TRUE if options$nperm is large enough to accurately estimate p-values, i.e., if the criterion sqrt(pvalue*(1-pvalue)/nperm)/pvalue < 0.1 is satisfied.

setup

a list containing necessary input for warm.start. It can be written to a file by using the function save, then its path can be the input of warm.start. It also contains a data frame of outcome and covariates that are specified in formula (setup$yx), a data frame of genotypes of SNPs in pathway (setup$raw.geno), and a formula object setup$formula corresponding to setup$yx, if options$keep.geno is TRUE.
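
The criterion behind the accurate component can be checked by hand. The sketch below simply restates the formula given above in base R; the helper names is_accurate and needed_nperm are hypothetical and are not part of the package.

```r
# Relative standard error of a resampling p-value, compared against 0.1
# (the threshold stated for the 'accurate' component).
is_accurate <- function(pvalue, nperm, tol = 0.1) {
  rel_se <- sqrt(pvalue * (1 - pvalue) / nperm) / pvalue
  rel_se < tol
}

# Approximate number of replicates at which the criterion starts to hold,
# obtained by solving sqrt(p * (1 - p) / nperm) / p = tol for nperm.
needed_nperm <- function(pvalue, tol = 0.1) {
  (1 - pvalue) / (pvalue * tol^2)
}

is_accurate(0.03, nperm = 1e5)   # p-value of the size seen in the examples
is_accurate(1e-4, nperm = 1e5)   # a much smaller p-value
needed_nperm(1e-4)
```

Smaller p-values need proportionally more replicates, so a result flagged as not accurate is a hint to rerun with a larger options$nperm.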

References

Yu K, Li Q, Bergen AW, Pfeiffer RM, Rosenberg PS, Caporaso N, Kraft P, Chatterjee N. (2009) Pathway analysis by adaptive combination of P-values. Genet Epidemiol 33(8): 700-709

Zhang H, Shi J, Liang F, Wheeler W, Stolzenberg-Solomon R, Yu K. (2014) A fast multilocus test with adaptive SNP selection for large-scale genetic association studies. European Journal of Human Genetics 22: 696-702

See Also

options, warm.start, sARTP, example.

Examples


library(ARTP2)

## Load the sample data
data(data, package = "ARTP2")
head(data[, 1:7])

## Load a built-in data frame containing the pathway definition
## (pathway can also be the path of a file)
data(pathway, package = "ARTP2")
head(pathway)

## Define the formula of base risk model
formula <- formula(case_control ~ sex + age + bmi + factor(study))

## binary outcome
family <- "binomial"

## Set the options. 
## Accumulate signal from the top 5 SNPs in each gene
## 1e5 replicates of resampling to estimate the p-value
options <- list(inspect.snp.n = 5, nperm = 1e5, 
                maf = .01, HWE.p = 1e-6, 
                gene.R2 = .9, 
                id.str = "unique-pathway-id", 
                out.dir = getwd(), save.setup = FALSE)

## pathway test, can take a while
## data contains outcome, covariates and genotypes
# ret1 <- rARTP(formula, data = data, pathway, family, options = options)

# ret1$pathway.pvalue 
## [1] 0.03218968 # Mac OS
## [1] 0.03455965 # Linux with 32 threads
## [1] 0.02188978 # Linux with 1 thread

## Mac OS
# head(ret1$gene.pvalue)
##     Gene Chr N.SNP      Pvalue
## 1  USP30  12    18 0.001319987
## 2  DCAF7  17     9 0.071644284
## 3   CANX   5    13 0.266337337
## 4  SOX12  20    15 0.349406506
## 5 CDKN2C   1     6 0.358031420
## 6   FEN1  11     4 0.415345847

# table(ret1$deleted.snps$reason)
# head(ret1$deleted.genes)


##################################################
## Another way to use this function
## Load a vector 'geno' containing file names of genotype
data(geno, package = 'ARTP2')

## Set the paths of genotype files
## in this example, each file contains SNPs in a gene
geno.files <- system.file("extdata", package = "ARTP2", geno)

## data contains outcome, covariates
## Genotypes are instead included in files specified in geno.files
## geno.files are plain text files (or .gz file), which can be read by read.table
# ret2 <- rARTP(formula, data = data[, 2:6], pathway, family, geno.files, 
#               options = options)
# ret2$pathway.pvalue == ret1$pathway.pvalue


##################################################
## The third way
## Genotypes are instead stored as binary PLINK files (bed, bim, and fam)
bed <- system.file("extdata", package = "ARTP2", "raw.bed")
bim <- system.file("extdata", package = "ARTP2", "raw.bim")
fam <- system.file("extdata", package = "ARTP2", "raw.fam")
geno.files <- data.frame(fam, bim, bed, stringsAsFactors = FALSE)

## A column SUBID must be included in data; in this example, it is the first column
# ret3 <- rARTP(formula, data = data[, 1:6], pathway, family, geno.files, 
#               options = options)
# ret3$pathway.pvalue == ret1$pathway.pvalue


Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> library(ARTP2)
Loading required package: Formula
Loading required package: data.table
Loading required package: parallel
> ### Name: rARTP
> ### Title: ARTP test for raw data
> ### Aliases: rARTP
> 
> ### ** Examples
> 
> 
> library(ARTP2)
> 
> ## Load the sample data
> data(data, package = "ARTP2")
> head(data[, 1:7])
   SUBID case_control sex    age      bmi study rs11578799
1 SUBID1            1   M 41to50 18.81663     2          2
2 SUBID2            1   F 30to40 16.46630     1          2
3 SUBID3            1   M   gt50 31.44187     1          1
4 SUBID4            1   F 41to50 16.95436     2          2
5 SUBID5            1   F 41to50 37.71600     4          1
6 SUBID6            1   M   gt50 18.22367     5          2
> 
> ## Load a built-in data frame containing the pathway definition
> ## (pathway can also be the path of a file)
> data(pathway, package = "ARTP2")
> head(pathway)
         SNP Gene Chr
1 rs10803146 AKT3   1
2 rs10803152 AKT3   1
3 rs10927025 AKT3   1
4 rs10927028 AKT3   1
5 rs10927029 AKT3   1
6 rs10927035 AKT3   1
> 
> ## Define the formula of base risk model
> formula <- formula(case_control ~ sex + age + bmi + factor(study))
> 
> ## binary outcome
> family <- "binomial"
> 
> ## Set the options. 
> ## Accumulate signal from the top 5 SNPs in each gene
> ## 1e5 replicates of resampling to estimate the p-value
> options <- list(inspect.snp.n = 5, nperm = 1e5, 
+                 maf = .01, HWE.p = 1e-6, 
+                 gene.R2 = .9, 
+                 id.str = "unique-pathway-id", 
+                 out.dir = getwd(), save.setup = FALSE)
> 
> ## pathway test, can take a while
> ## data contains outcome, covariates and genotypes
> # ret1 <- rARTP(formula, data = data, pathway, family, options = options)
> 
> # ret1$pathway.pvalue 
> ## [1] 0.03218968 # Mac OS
> ## [1] 0.03455965 # Linux with 32 threads
> ## [1] 0.02188978 # Linux with 1 thread
> 
> ## Mac OS
> # head(ret1$gene.pvalue)
> ##     Gene Chr N.SNP      Pvalue
> ## 1  USP30  12    18 0.001319987
> ## 2  DCAF7  17     9 0.071644284
> ## 3   CANX   5    13 0.266337337
> ## 4  SOX12  20    15 0.349406506
> ## 5 CDKN2C   1     6 0.358031420
> ## 6   FEN1  11     4 0.415345847
> 
> # table(ret1$deleted.snps$reason)
> # head(ret1$deleted.genes)
> 
> 
> ##################################################
> ## Another way to use this function
> ## Load a vector 'geno' containing file names of genotype
> data(geno, package = 'ARTP2')
> 
> ## Set the paths of genotype files
> ## in this example, each file contains SNPs in a gene
> geno.files <- system.file("extdata", package = "ARTP2", geno)
> 
> ## data contains outcome, covariates
> ## Genotypes are instead included in files specified in geno.files
> ## geno.files are plain text files (or .gz file), which can be read by read.table
> # ret2 <- rARTP(formula, data = data[, 2:6], pathway, family, geno.files, 
> #               options = options)
> # ret2$pathway.pvalue == ret1$pathway.pvalue
> 
> 
> ##################################################
> ## The third way
> ## Genotypes are instead stored as binary PLINK files (bed, bim, and fam)
> bed <- system.file("extdata", package = "ARTP2", "raw.bed")
> bim <- system.file("extdata", package = "ARTP2", "raw.bim")
> fam <- system.file("extdata", package = "ARTP2", "raw.fam")
> geno.files <- data.frame(fam, bim, bed, stringsAsFactors = FALSE)
> 
> ## A column SUBID must be included in data; in this example, it is the first column
> # ret3 <- rARTP(formula, data = data[, 1:6], pathway, family, geno.files, 
> #               options = options)
> # ret3$pathway.pvalue == ret1$pathway.pvalue
> 