Last data update: 2014.03.03

R: TRait-Associated SNP EnRichment analyses
traseR    R Documentation

TRait-Associated SNP EnRichment analyses

Description

Perform GWAS trait-associated SNP enrichment analyses in genomic intervals using different approaches

Usage

traseR(snpdb, region, snpdb.bg = NULL, keyword = NULL,
       rankby = c("pvalue", "odds.ratio"),
       test.method = c("binomial", "fisher", "chisq", "nonparametric"),
       alternative = c("greater", "less", "two.sided"),
       ntimes = 100, nbatch = 1,
       trait.threshold = 0, traitclass.threshold = 0, pvalue = 1e-3)

Arguments

snpdb

A GRanges object of GWAS trait-associated SNPs, for example those collected from the dbGaP and NHGRI public databases. The bundled data set is kept up to date with the latest releases and is stored in the data subdirectory. It contains the following columns: Source, Trait, SNP, p.value, Chr, Position, Context, GENE_NAME, GENE_START, GENE_END, GENE_STRAND. Users are free to add more SNP records for practical use. A data frame with the columns SNP, Chr, Position is also accepted.

region

A GRanges object or a data frame of genomic intervals with three columns: chromosome, genomic start position, and genomic end position.

snpdb.bg

A GRanges object containing non-trait-associated SNPs. If specified, these SNPs are used as the background for statistical testing instead of the whole genome.

keyword

Restricts the analysis to a specific trait of interest. If keyword is specified, only the SNPs associated with that trait are used; otherwise, all traits are analyzed.

rankby

Traits can be ranked by either pvalue or odds.ratio, based on the enrichment level of trait-associated SNPs in the genomic intervals.

test.method

Several hypothesis testing options are provided: binomial (binomial test), fisher (Fisher's exact test), chisq (chi-squared test), and nonparametric (nonparametric test). The default is binomial.

alternative

Specifies the alternative hypothesis. If greater, test whether the genomic intervals are enriched in trait-associated SNPs relative to the background. If less, test whether the genomic intervals are depleted in trait-associated SNPs relative to the background. If two.sided, test whether the level of trait-associated SNPs in the genomic intervals differs from that in the background.

ntimes

The number of shuffles in one batch. See nbatch.

nbatch

The number of batches. The total number of shuffles is the product of ntimes and nbatch.

trait.threshold

Only traits with more SNPs than this threshold are tested.

traitclass.threshold

Only trait classes with more SNPs than this threshold are tested.

pvalue

Only SNPs with a p-value below this threshold are used in the analyses.
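
As a rough illustration of what the binomial option and the alternative argument mean, the sketch below uses base R's binom.test rather than traseR itself. It assumes the test compares the share of trait-associated SNPs that fall inside the query regions with the share of the genome those regions cover; that reading is an assumption, not a statement about the package internals. The counts are the "All" row and the genome fraction printed in the Results section, and enrich_binom is a hypothetical helper.

```r
# Counts taken from the example output in the Results section.
hits      <- 2625                 # trait-associated SNPs inside the regions
total     <- 30553                # trait-associated SNPs overall
region_fr <- 0.0421875469310327   # fraction of the genome covered by the regions

# Hypothetical helper (not part of traseR): one binomial test per alternative.
# Assumption: under the null, each SNP lands in the regions with probability
# equal to the fraction of the genome that the regions cover.
enrich_binom <- function(alternative) {
  stats::binom.test(hits, total, p = region_fr,
                    alternative = alternative)$p.value
}

p_values <- sapply(c("greater", "less", "two.sided"), enrich_binom)
print(p_values)

# Observed share of SNPs in the regions versus the share expected by chance.
print(c(observed = hits / total, expected = region_fr))
```

With these counts the observed share is roughly twice the expected share, so "greater" yields a very small p-value while "less" yields a p-value close to 1.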

Details

Returns a list of three data frames. tb.all contains the enrichment results for all trait-associated SNPs in the genomic intervals taken together. tb1 contains the enrichment results for each trait separately. tb2 contains the enrichment results for each trait class separately.
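
A short sketch of how the returned list might be inspected. Because running traseR requires the package and its data, the list below is a hand-built stand-in (mock_result is hypothetical) that copies the element names tb.all and tb1 from the description above and a few rows from the printed example output; tb2 is omitted for brevity. The 0.01 cutoff is arbitrary.

```r
# Hand-built stand-in for a traseR result; values copied from the example output.
mock_result <- list(
  tb.all = data.frame(Trait = "All", p.value = 3.788373e-233,
                      odds.ratio = 2.134717, taSNP.hits = 2625,
                      taSNP.num = 30553, stringsAsFactors = FALSE),
  tb1 = data.frame(
    Trait      = c("Behcet Syndrome", "Diabetes Mellitus, Type 1",
                   "Multiple Sclerosis", "Autoimmune Diseases"),
    p.value    = c(4.400406e-23, 1.704981e-11, 1.644125e-05, 5.201529e-05),
    q.value    = c(2.521433e-20, 4.884769e-09, 1.884167e-03, 4.967461e-03),
    odds.ratio = c(6.306579, 5.045263, 2.905210, 15.892575),
    taSNP.hits = c(59, 33, 26, 6),
    taSNP.num  = c(274, 185, 236, 15),
    stringsAsFactors = FALSE
  )
)

# Keep traits passing an (arbitrary) FDR cutoff, then rank them two ways,
# mirroring the two rankby choices.
tb1 <- mock_result$tb1
significant   <- tb1[tb1$q.value < 0.01, ]
by_pvalue     <- significant[order(significant$p.value), ]
by_odds_ratio <- significant[order(significant$odds.ratio, decreasing = TRUE), ]

print(by_pvalue$Trait)
print(by_odds_ratio$Trait)
```

The two rankings can disagree: a trait with few SNPs, such as Autoimmune Diseases (6 hits out of 15), can have the largest odds ratio without having the smallest p-value.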

Value

The data frame tb1 has the following columns:

Trait

Name of trait

p.value

P-value calculated from hypothesis testing

q.value

P-value adjusted for multiple testing using FDR correction

odds.ratio

Odds ratio calculated from the number of trait-associated SNPs in the genomic intervals, the number of trait-associated SNPs across the whole genome, the size of the genomic intervals (bp), and the genome size (bp)

taSNP.hits

Number of trait-associated SNPs in genomic intervals

taSNP.num

Number of SNPs associated with the specific trait
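
The q.value column can be illustrated with base R's p.adjust. The sketch assumes the FDR correction is the Benjamini-Hochberg procedure applied across all tested items; that is an assumption, though it appears consistent with the trait-class q-values printed in the Results section, where 33 trait classes were tested. Only the 13 printed p-values are available, so n = 33 is passed explicitly.

```r
# p-values of the 13 trait classes printed in the example output (ascending).
p <- c(3.729169e-35, 6.932335e-35, 1.041455e-22, 3.479491e-18, 4.362324e-14,
       3.008551e-11, 6.933337e-09, 4.763509e-08, 3.118359e-05, 5.549981e-05,
       1.649261e-04, 3.372076e-04, 6.507770e-04)

# 33 trait classes were tested in total. Assumption: the 20 p-values that are
# not printed are larger than the ones listed here.
n_tests <- 33
q <- stats::p.adjust(p, method = "BH", n = n_tests)

# Manual Benjamini-Hochberg for comparison: p * n / rank, then enforce
# monotonicity by taking the running minimum from the largest p-value down.
ranks    <- seq_along(p)
q_manual <- rev(cummin(rev(p * n_tests / ranks)))

print(signif(q, 7))
print(all.equal(q, q_manual))
```

The running minimum explains why the two smallest p-values share one q-value in the printed table: the second p-value times 33/2 is smaller than the first times 33.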

Author(s)

Li Chen <li.chen@emory.edu>, Zhaohui S. Qin <zhaohui.qin@emory.edu>

See Also

print.traseR

Examples

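	library(traseR)
	# Load the bundled trait-associated SNPs and the example T-cell regions
	data(taSNP)
	data(Tcell)
	# Run the enrichment analysis with the default settings
	x <- traseR(snpdb=taSNP, region=Tcell)
	print(x)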

Results


> library(traseR)
Loading required package: GenomicRanges
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: BSgenome.Hsapiens.UCSC.hg19
Loading required package: BSgenome
Loading required package: Biostrings
Loading required package: XVector
Loading required package: rtracklayer
> ### Name: traseR
> ### Title: TRait-Associated SNP EnRichment analyses
> ### Aliases: traseR
> 
> ### ** Examples
> 
> 	data(taSNP)
> 	data(Tcell)
> 	x=traseR(snpdb=taSNP,region=Tcell)
There are  128094211 bp in the query region, accounting for  0.0421875469310327  of the genome.
There are  573 traits in the analysis.
There are  33 trait class in the analysis.
100 traits have been tested!
200 traits have been tested!
300 traits have been tested!
400 traits have been tested!
500 traits have been tested!
10 trait class have been tested!
20 trait class have been tested!
30 trait class have been tested!
> 	print(x)
There are 573 traits in the test.
The overall functional SNP enrichment test results are:
  Trait       p.value odds.ratio taSNP.hits taSNP.num
1   All 3.788373e-233   2.134717       2625     30553
The trait-associated SNP enrichment test results are:
                            Trait      p.value      q.value odds.ratio
67                Behcet Syndrome 4.400406e-23 2.521433e-20   6.306579
172     Diabetes Mellitus, Type 1 1.704981e-11 4.884769e-09   5.045263
340 Lupus Erythematosus, Systemic 6.159346e-09 1.176435e-06   3.902195
49          Arthritis, Rheumatoid 1.442123e-07 2.065841e-05   5.126637
379            Multiple Sclerosis 1.644125e-05 1.884167e-03   2.905210
62            Autoimmune Diseases 5.201529e-05 4.967461e-03  15.892575
    taSNP.hits taSNP.num
67          59       274
172         33       185
340         32       223
49          20       112
379         26       236
62           6        15
The trait-class-associated SNP enrichment test results are:
                           Trait_Class      p.value      q.value odds.ratio
17              Immune System Diseases 3.729169e-35 1.143835e-33   3.658860
31 Skin and Connective Tissue Diseases 6.932335e-35 1.143835e-33   3.916319
32             Stomatognathic Diseases 1.041455e-22 1.145601e-21   5.675922
14                        Eye Diseases 3.479491e-18 2.870580e-17   3.313308
11           Digestive System Diseases 4.362324e-14 2.879134e-13   3.040672
7              Cardiovascular Diseases 3.008551e-11 1.654703e-10   1.602762
13           Endocrine System Diseases 6.933337e-09 3.268573e-08   2.068149
24  Nutritional and Metabolic Diseases 4.763509e-08 1.964948e-07   2.068673
21            Musculoskeletal Diseases 3.118359e-05 1.143398e-04   2.716680
23             Nervous System Diseases 5.549981e-05 1.831494e-04   1.495744
16        Hemic and Lymphatic Diseases 1.649261e-04 4.947782e-04   3.596622
22                           Neoplasms 3.372076e-04 9.273210e-04   1.580636
30          Respiratory Tract Diseases 6.507770e-04 1.651972e-03   2.121839
   taSNP.hits taSNP.num
17        155      1122
31        142       970
32         63       318
14         87       689
11         74       633
7         253      3850
13         89      1076
24         79       956
21         27       260
23        122      1988
16         15       115
22         76      1181
30         29       349