Last data update: 2014.03.03

R: options
options R Documentation

options

Description

A list describing the options used by sARTP and rARTP. Unless specified otherwise, it is created by the function options.default.

Format

The format is a list.

out.dir

output directory for temporary and output files. The default is the working directory getwd.

id.str

character string that is appended to temporary file names. The default is "PID".

seed

integer for random number generation. The default is 1.

Options for testing an association:

method

1 = AdaJoint, 2 = AdaJoint2, 3 = ARTP. The default is 3. It can also be 'AdaJoint', 'AdaJoint2', or 'ARTP'. The package converts the string to upper case, so, for example, 'Adajoint' is also accepted. The ARTP method was proposed in Yu et al. (2009) Genet Epi, while the AdaJoint and AdaJoint2 methods were proposed in Zhang et al. (2014) EJHG. Note that AdaJoint2 could be more powerful if (1) two functional SNPs are negatively correlated and have effects in the same direction; or (2) two functional SNPs are positively correlated and have effects in opposite directions.
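
The accepted spellings can be illustrated with a small helper. This is only a sketch of the case-insensitive matching described above; normalize_method is a hypothetical name and not a function exported by the package.

```r
# Hypothetical helper (not part of ARTP2): map a user-supplied method
# to its integer code, ignoring the case of character input.
normalize_method <- function(method) {
  codes <- c(ADAJOINT = 1, ADAJOINT2 = 2, ARTP = 3)
  if (is.numeric(method)) {
    stopifnot(method %in% codes)
    return(as.integer(method))
  }
  key <- toupper(method)                 # 'Adajoint' becomes 'ADAJOINT'
  if (!key %in% names(codes)) stop("unknown method: ", method)
  as.integer(codes[[key]])
}

normalize_method("Adajoint")
normalize_method(3)
```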

nperm

the number of permutations. The default is 1E5.

nthread

the number of threads for multi-threaded processors in Unix/Linux OS. The default is detectCores() to use all available processors.

Options for controlling data cleaning:

snp.miss.rate

any SNP with missing rate greater than snp.miss.rate will be removed from the analysis. The default is 0.05.

maf

any SNP with minor allele frequency less than maf will be removed from the analysis. The default is 0.05.
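
As a rough illustration of the two thresholds above, the sketch below computes per-SNP missing rates and minor allele frequencies on a toy genotype matrix coded 0/1/2. The matrix and variable names are invented for this example; the package's internal implementation may differ.

```r
# Toy genotype matrix: rows are subjects, columns are SNPs coded 0/1/2.
geno <- cbind(
  rs1 = c(0, 1, 2, 1, 0, 1, 2, 0, 1, 1),
  rs2 = c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0),   # monomorphic, MAF = 0
  rs3 = c(NA, 1, 2, 1, 0, 1, 2, 0, 1, 1)   # one missing genotype in ten
)

snp.miss.rate <- 0.05
maf <- 0.05

miss <- colMeans(is.na(geno))              # per-SNP missing rate
freq <- colMeans(geno, na.rm = TRUE) / 2   # frequency of the counted allele
maf.obs <- pmin(freq, 1 - freq)            # minor allele frequency

# Keep SNPs that pass both filters.
keep <- miss <= snp.miss.rate & maf.obs >= maf
colnames(geno)[keep]
```

Here rs2 fails the MAF filter and rs3 fails the missing-rate filter, so only rs1 is retained.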

HWE.p

any SNP with HWE exact p-value less than HWE.p will be removed from the analysis. The test is applied to the genotype data or reference data. The test is ignored if the imputed genotypes are not encoded as 0/1/2. The default is 1E-5.

gene.R2

a number between 0 and 1 to filter out SNPs that are highly correlated within each gene. The cor function will be called to compute the R^2 values between each pair of SNPs; in each pair with R^2 greater than gene.R2, the SNP with the lower MAF is removed. The default is 0.95.

chr.R2

a number between 0 and 1 to filter out SNPs that are highly correlated within each chromosome. The cor function will be called to compute the R^2 values between each pair of SNPs; in each pair with R^2 greater than chr.R2, the SNP with the lower MAF is removed. The default is 0.95.
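
The pairwise rule shared by gene.R2 and chr.R2 can be sketched as a greedy loop. The function prune_by_r2 is hypothetical, and the order in which pairs are visited and the handling of MAF ties are assumptions, since the text above does not specify them.

```r
# Hypothetical sketch: for each pair with R^2 above the threshold,
# drop the SNP with the lower MAF (ties drop the earlier column).
prune_by_r2 <- function(geno, r2.max = 0.95) {
  freq <- colMeans(geno, na.rm = TRUE) / 2
  maf <- pmin(freq, 1 - freq)
  r2 <- cor(geno, use = "pairwise.complete.obs")^2
  snps <- colnames(geno)
  drop <- character(0)
  for (i in seq_along(snps)) {
    for (j in seq_len(i - 1)) {
      a <- snps[i]; b <- snps[j]
      if (a %in% drop || b %in% drop) next   # pair already resolved
      if (r2[a, b] > r2.max) {
        drop <- c(drop, if (maf[a] < maf[b]) a else b)
      }
    }
  }
  setdiff(snps, drop)
}

geno <- cbind(
  rs1 = c(0, 1, 2, 1, 0, 2, 1, 0),
  rs2 = c(0, 1, 2, 1, 0, 2, 1, 1),   # differs from rs1 in the last subject
  rs3 = c(1, 0, 0, 2, 1, 0, 1, 2)
)

prune_by_r2(geno, r2.max = 0.8)    # a lower threshold than the default, for this toy data
prune_by_r2(geno)                  # default threshold of 0.95
```

With the threshold lowered to 0.8, rs1 and rs2 form a flagged pair and rs1, which has the lower MAF, is dropped. At 0.95 no pair in this toy data is flagged.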

gene.miss.rate

threshold to remove genes based on their missing rate. Genes with missing rate greater than gene.miss.rate will be removed from the analysis. The missing rate is calculated as the number of subjects with at least one missing genotype among all SNPs in the gene divided by the total number of subjects. The default is 1.0.
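
The gene-level missing rate defined above can be computed directly from a genotype matrix. The toy data below are invented for illustration.

```r
# Rows are subjects, columns are the SNPs of one gene; NA marks a missing genotype.
gene.geno <- cbind(
  rs1 = c(0, 1, NA, 2),
  rs2 = c(1, 1, 0, NA),
  rs3 = c(2, 0, 1, 1)
)

# A subject counts as incomplete if any SNP in the gene is missing.
n.incomplete <- sum(rowSums(is.na(gene.geno)) > 0)
gene.miss <- n.incomplete / nrow(gene.geno)
gene.miss
```

Two of the four subjects have at least one missing genotype, so the missing rate of this gene is 0.5. With the default gene.miss.rate of 1.0, no gene is removed by this filter.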

rm.gene.subset

TRUE to remove genes which are subsets of other genes. The default is TRUE.

turn.off.filters

a shortcut to turn off all SNP filters. If TRUE, it is equivalent to setting snp.miss.rate = 1, maf = 0, trim.huge.chr = FALSE, gene.R2 = 1, chr.R2 = 1, huge.gene.R2 = 1, huge.chr.R2 = 1, and HWE.p = 0. The default is FALSE.
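
For readers who prefer to set the individual options by hand, the equivalent settings can be written with modifyList. The list opt below is a stand-in that reproduces only a few entries of the options list, so the snippet runs without the package.

```r
# Stand-in for part of the list returned by options.default();
# only some filter-related entries are reproduced here.
opt <- list(snp.miss.rate = 0.05, maf = 0.05, HWE.p = 1e-5,
            gene.R2 = 0.95, chr.R2 = 0.95, trim.huge.chr = TRUE)

# Settings that switch the SNP filters off.
# trim.huge.chr = FALSE is an assumption about how trimming is disabled.
no.filters <- list(snp.miss.rate = 1, maf = 0, trim.huge.chr = FALSE,
                   gene.R2 = 1, chr.R2 = 1,
                   huge.gene.R2 = 1, huge.chr.R2 = 1, HWE.p = 0)

opt <- modifyList(opt, no.filters)
str(opt)
```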

group.gap

an integer to regroup SNPs in a chromosome into independent groups. The unit is base-pair (bp). The position information will be collected from the fourth column of bim files. The default is NULL, i.e., regrouping is not performed.
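
One plausible reading of this option is that SNPs sorted by position are split into a new group whenever the distance to the previous SNP exceeds group.gap. The sketch below implements that reading on invented positions; it is an assumption, not the package's documented algorithm.

```r
pos <- c(1000, 1500, 2100, 250000, 250400, 900000)  # bp positions on one chromosome
group.gap <- 1e5

# Start a new group whenever the gap to the previous SNP exceeds group.gap.
ord <- order(pos)
grp <- integer(length(pos))
grp[ord] <- cumsum(c(TRUE, diff(pos[ord]) > group.gap))
grp
```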

delete

TRUE to delete temporary files containing the test statistics for each gene. The default is TRUE.

print

TRUE to print information to the console. The default is TRUE.

tidy

the data frame deleted.snps in the object returned by sARTP contains information on the SNPs excluded from the analysis and the reasons for their exclusion. Possible reason codes include RM_BY_SNP_NAMES, RM_BY_REGIONS, NO_SUM_STAT, NO_RAW_GENO, NO_REF, SNP_MISS_RATE, SNP_LOW_MAF, SNP_CONST, SNP_HWE, GENE_R2, HUGE_GENE_R2, CHR_R2, HUGE_CHR, HUGE_CHR2, HUGE_CHR3, GENE_MISS_RATE, GENE_SUBSET, CONF_ALLELE_INFO. Set tidy to TRUE to hide the SNPs with codes NO_SUM_STAT and NO_REF. The default is TRUE.

save.setup

TRUE to save the data needed to repeat the analysis more quickly (skipping data loading and filtering), e.g., working options, observed scores and covariance matrix, to a local file. It will be set to TRUE if only.setup is TRUE. The default is TRUE.

path.setup

character string giving the file name in which the setup for warm.start is saved if save.setup is TRUE. The default is NULL, in which case it is set to paste(out.dir, "/setup.", id.str, ".rda", sep = "").
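
With the default values of out.dir and id.str, the expression above resolves as follows. The variables are set by hand here so that the snippet runs without the package.

```r
out.dir <- getwd()   # default output directory
id.str <- "PID"      # default id string

path.setup <- paste(out.dir, "/setup.", id.str, ".rda", sep = "")
basename(path.setup)
```

The file is therefore named setup.PID.rda and is placed in the working directory.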

only.setup

TRUE if only the setup is needed and the testing procedure is not. The R code that creates the setup is single-threaded, but the testing procedure can be multi-threaded. The best practice for using ARTP2 on a multi-threaded cluster is to first create the setup in single-thread mode, and then call warm.start to compute the p-values in multi-thread mode, using the setup saved at path.setup as input. save.setup will be set to TRUE if only.setup is TRUE. The default is FALSE.

keep.geno

TRUE to return the reference genotypes of the SNPs in the pathway. The default is FALSE.

excluded.snps

character vector of SNPs to be excluded in the analysis. NULL if no SNP is excluded. The default is NULL.

selected.snps

character vector of SNPs to be selected in the analysis. NULL if all SNPs are selected but other filters may be applied. The default is NULL.

excluded.regions

data frame with three columns Chr, Start, End, or three columns Chr, Pos, Radius. The unit is base-pair (bp). SNPs within [Start, End] or [Pos - Radius, Pos + Radius] will be excluded. See Examples in sARTP. This option is only available for sARTP. The default is NULL.
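
The two accepted layouts can be built as ordinary data frames. The sketch below also shows which SNPs of a made-up SNP map fall inside the region; in_region and snp.map are hypothetical and serve only to illustrate the interval definitions.

```r
# Two equivalent ways to describe the same 200 kb window on chromosome 1.
regions.a <- data.frame(Chr = 1, Start = 900000, End = 1100000)
regions.b <- data.frame(Chr = 1, Pos = 1000000, Radius = 100000)

# Hypothetical SNP map, invented for this example.
snp.map <- data.frame(SNP = c("rs1", "rs2", "rs3"),
                      Chr = c(1, 1, 2),
                      Pos = c(950000, 1200000, 1000000),
                      stringsAsFactors = FALSE)

# Hypothetical helper: names of SNPs lying inside any listed region.
in_region <- function(map, reg) {
  start <- if ("Start" %in% names(reg)) reg$Start else reg$Pos - reg$Radius
  end   <- if ("End" %in% names(reg)) reg$End else reg$Pos + reg$Radius
  hit <- rep(FALSE, nrow(map))
  for (k in seq_len(nrow(reg))) {
    hit <- hit | (map$Chr == reg$Chr[k] & map$Pos >= start[k] & map$Pos <= end[k])
  }
  map$SNP[hit]
}

in_region(snp.map, regions.a)
in_region(snp.map, regions.b)
```

Both layouts select rs1 only: rs2 lies outside the window and rs3 is on a different chromosome.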

excluded.subs

character vector of subject IDs to be excluded in the analysis. These IDs must match with those in the second column (Individual ID) of the fam files in reference. The default is NULL.

selected.subs

character vector of subject IDs to be selected in the analysis. These IDs must match with those in the second column (Individual ID) of the fam files in reference. The default is NULL.

excluded.genes

character vector of genes to be excluded in the analysis. NULL if no gene is excluded. The default is NULL.

meta

TRUE to return meta-analysis summary data from sARTP. The default is FALSE.

Options for handling huge pathways:

trim.huge.chr

oversized chromosomes could be further trimmed to accelerate the testing procedure. If TRUE the additional options below are in effect. The default is TRUE.

huge.gene.size

a gene with number of SNPs larger than huge.gene.size will be further trimmed with huge.gene.R2 if trim.huge.chr is TRUE. The default is 1000.

huge.chr.size

a chromosome with number of SNPs larger than huge.chr.size will be further trimmed with huge.chr.R2 if trim.huge.chr is TRUE. The default is 2000.

huge.gene.R2

more stringent R^2 threshold to filter out SNPs in a gene. Similar to gene.R2. The default is gene.R2 - 0.05.

huge.chr.R2

more stringent R^2 threshold to filter out SNPs in a chromosome. Similar to chr.R2. The default is chr.R2 - 0.05.

Options for gene-based test:

inspect.snp.n

the number of candidate truncation points to inspect the top SNPs in a gene. The default is 5. (See Details)

inspect.snp.percent

a value x between 0 and 1 such that a truncation point will be defined at every x percent of the top SNPs. The default is 0 so that the truncation points will be 1:inspect.snp.n. (See Details)

Options for pathway-based test:

inspect.gene.n

the number of candidate truncation points to inspect the top genes in the pathway. The default is 10.

inspect.gene.percent

a value x between 0 and 1 such that a truncation point will be defined at every x percent of the top genes. If 0 then the truncation points will be 1:inspect.gene.n. The default is 0.05.

Details

Order of removing SNPs, genes and subjects:
1. Apply the options excluded.snps and selected.snps if non-NULL. Code: RM_BY_SNP_NAMES.
2. Apply the option excluded.regions if non-NULL and if sARTP is used. Code: RM_BY_REGIONS.
3. Remove SNPs without summary statistics in summary.files. Code: NO_SUM_STAT; or remove SNPs without raw genotype data in data or geno.files. Code: NO_RAW_GENO.
4. Remove SNPs not in bim files in reference if sARTP is used. Code: NO_REF.
5. Remove SNPs with conflicting allele information in summary and reference data if sARTP is used. Code: CONF_ALLELE_INFO.
6. Remove SNPs with high missing rate. Code: SNP_MISS_RATE.
7. Remove SNPs with low MAF. Code: SNP_LOW_MAF.
8. Remove constant SNPs. Code: SNP_CONST.
9. Remove SNPs that fail the HWE test. Code: SNP_HWE.
10. Remove highly correlated SNPs within each gene. Code: GENE_R2 or HUGE_GENE_R2.
11. Remove highly correlated SNPs within each chromosome. Code: CHR_R2, HUGE_CHR, HUGE_CHR2 or HUGE_CHR3.
12. Remove genes with high missing rate. Code: GENE_MISS_RATE.
13. Remove genes which are subsets of other genes. Code: GENE_SUBSET.

Example truncation points defined by inspect.snp.n and inspect.snp.percent: Assume the number of SNPs in a gene is 100. Below are examples of the truncation points for different values of inspect.snp.n and inspect.snp.percent. The same rules apply to inspect.gene.n and inspect.gene.percent.

inspect.snp.n   inspect.snp.percent   truncation points
1               0                     1
1               0.05                  5
1               0.25                  25
1               1                     100
2               0                     1, 2
2               0.05                  5, 10
2               0.25                  25, 50
2               1                     100
3               0.2                   20, 40, 60
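
The table above is consistent with the following rule, written here as a hypothetical helper. The use of ceiling, the cap at the number of SNPs and the removal of duplicates are assumptions inferred from the listed examples, not taken from the package source.

```r
# Hypothetical helper reproducing the example truncation points.
truncation_points <- function(n.snp, inspect.n, inspect.percent) {
  if (inspect.percent == 0) {
    pts <- seq_len(inspect.n)
  } else {
    # round() guards against floating-point noise before taking the ceiling
    pts <- ceiling(round(seq_len(inspect.n) * inspect.percent * n.snp, 8))
  }
  unique(pmin(pts, n.snp))   # cap at the gene size and drop repeats
}

truncation_points(100, 2, 0.05)
truncation_points(100, 2, 1)
truncation_points(100, 3, 0.2)
```

For a gene with 100 SNPs, these calls give 5 and 10; 100; and 20, 40 and 60, matching the corresponding rows of the table.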

See Also

options.default

Examples

options <- options.default()
str(options)
names(options)

Results


> library(ARTP2)
Loading required package: Formula
Loading required package: data.table
Loading required package: parallel
> ### Name: options
> ### Title: options
> ### Aliases: options
> 
> ### ** Examples
> 
> options <- options.default()
> str(options)
List of 40
 $ out.dir             : chr "/home/ddbj/DataUpdator-rgm3/target"
 $ id.str              : chr "PID"
 $ method              : num 3
 $ nperm               : num 1e+05
 $ snp.miss.rate       : num 0.05
 $ maf                 : num 0.05
 $ HWE.p               : num 1e-05
 $ chr.R2              : num 0.95
 $ gene.R2             : num 0.95
 $ gene.miss.rate      : num 1
 $ group.gap           : NULL
 $ rm.gene.subset      : logi TRUE
 $ turn.off.filters    : logi FALSE
 $ delete              : logi TRUE
 $ print               : logi TRUE
 $ tidy                : logi TRUE
 $ save.setup          : logi TRUE
 $ path.setup          : NULL
 $ only.setup          : logi FALSE
 $ keep.geno           : logi FALSE
 $ seed                : num 1
 $ nthread             : int 4
 $ excluded.snps       : NULL
 $ selected.snps       : NULL
 $ excluded.regions    : NULL
 $ excluded.subs       : NULL
 $ selected.subs       : NULL
 $ excluded.genes      : NULL
 $ meta                : logi FALSE
 $ only.meta           : logi TRUE
 $ inspect.snp.n       : num 5
 $ inspect.snp.percent : num 0
 $ inspect.gene.n      : num 10
 $ inspect.gene.percent: num 0.05
 $ trim.huge.chr       : logi TRUE
 $ huge.gene.size      : num 1000
 $ huge.chr.size       : num 2000
 $ huge.gene.R2        : num 0.85
 $ huge.chr.R2         : num 0.85
 $ version             :Classes 'package_version', 'numeric_version'  hidden list of 1
  ..$ : int [1:3] 0 9 22
> names(options)
 [1] "out.dir"              "id.str"               "method"              
 [4] "nperm"                "snp.miss.rate"        "maf"                 
 [7] "HWE.p"                "chr.R2"               "gene.R2"             
[10] "gene.miss.rate"       "group.gap"            "rm.gene.subset"      
[13] "turn.off.filters"     "delete"               "print"               
[16] "tidy"                 "save.setup"           "path.setup"          
[19] "only.setup"           "keep.geno"            "seed"                
[22] "nthread"              "excluded.snps"        "selected.snps"       
[25] "excluded.regions"     "excluded.subs"        "selected.subs"       
[28] "excluded.genes"       "meta"                 "only.meta"           
[31] "inspect.snp.n"        "inspect.snp.percent"  "inspect.gene.n"      
[34] "inspect.gene.percent" "trim.huge.chr"        "huge.gene.size"      
[37] "huge.chr.size"        "huge.gene.R2"         "huge.chr.R2"         
[40] "version"             