Last data update: 2014.03.03

R: data on 1000 genomes SNPs that 'tag' GWAS catalog entries
gwastagger    R Documentation

data on 1000 genomes SNPs that 'tag' GWAS catalog entries

Description

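A GRanges object giving the locations of 1000 Genomes SNPs that 'tag' GWAS catalog entries, that is, SNPs in linkage disequilibrium (R-squared of 0.8 or greater) with a catalog SNP.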

Usage

data(gwastagger)

Format

The format is:
Formal class 'GRanges' [package "GenomicRanges"] with 6 slots
..@ seqnames :Formal class 'Rle' [package "IRanges"] with 4 slots
.. .. ..@ values : Factor w/ 24 levels "chr1","chr2",..: 1 2 3 4 5 6 7 8 9 10 ...
.. .. ..@ lengths : int [1:22] 24042 23740 21522 14258 14972 34101 12330 11400 8680 15429 ...
.. .. ..@ elementMetadata: NULL
.. .. ..@ metadata : list()
..@ ranges :Formal class 'IRanges' [package "IRanges"] with 6 slots
.. .. ..@ start : int [1:297579] 986111 988364 992250 992402 995669 999686 1005579 1007450 1011209 1011446 ...
.. .. ..@ width : int [1:297579] 1 1 1 1 1 1 1 1 1 1 ...
.. .. ..@ NAMES : NULL
.. .. ..@ elementType : chr "integer"
.. .. ..@ elementMetadata: NULL
.. .. ..@ metadata : list()
..@ strand :Formal class 'Rle' [package "IRanges"] with 4 slots
.. .. ..@ values : Factor w/ 3 levels "+","-","*": 3
.. .. ..@ lengths : int 297579
.. .. ..@ elementMetadata: NULL
.. .. ..@ metadata : list()
..@ elementMetadata:Formal class 'DataFrame' [package "IRanges"] with 6 slots
.. .. ..@ rownames : NULL
.. .. ..@ nrows : int 297579
.. .. ..@ listData :List of 3
.. .. .. ..$ tagid : chr [1:297579] "rs28479311" "rs3813193" "chr1:992250" "rs60442576" ...
.. .. .. ..$ R2 : num [1:297579] 0.938 0.994 0.969 1 1 ...
.. .. .. ..$ baseid: chr [1:297579] "rs3934834" "rs3934834" "rs3934834" "rs3934834" ...
.. .. ..@ elementType : chr "ANY"
.. .. ..@ elementMetadata: NULL
.. .. ..@ metadata : list()
..@ seqinfo :Formal class 'Seqinfo' [package "GenomicRanges"] with 4 slots
.. .. ..@ seqnames : chr [1:24] "chr1" "chr2" "chr3" "chr4" ...
.. .. ..@ seqlengths : int [1:24] 249250621 243199373 198022430 191154276 180915260 171115067 159138663 146364022 141213431 135534747 ...
.. .. ..@ is_circular: logi [1:24] FALSE FALSE FALSE FALSE FALSE FALSE ...
.. .. ..@ genome : chr [1:24] "hg19" "hg19" "hg19" "hg19" ...
..@ metadata : list()
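
The seqnames and strand slots above are run-length encoded (class Rle): consecutive records on the same chromosome are stored once as a value plus a run length, which is why seqnames holds 22 runs rather than 297579 labels. As a rough illustration only, base R's rle() shows the same idea on a toy vector; the run lengths below are made up, and the Bioconductor Rle class is a separate implementation.

```r
# Base R analogue of the Rle encoding used for seqnames: per-record
# chromosome labels collapse to (value, run length) pairs.
# The run lengths here are toy values, not those of gwastagger.
chrom <- rep(c("chr1", "chr2", "chr3"), times = c(4, 2, 3))

enc <- rle(chrom)
enc$values   # one entry per run
enc$lengths  # number of consecutive records in each run

# Expanding the runs recovers one label per record.
restored <- inverse.rle(enc)

# Cumulative run lengths give the index of the last record in each run.
last_index <- cumsum(enc$lengths)
```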

Details

This GRanges instance includes locations for about 297,000 (297579) 1000 Genomes SNPs that have been identified as exhibiting linkage disequilibrium (LD) with NHGRI GWAS catalog SNPs as of September 2013. The tagid field gives the name of the tagging SNP, the baseid field gives the SNP identifier for the GWAS catalog entry, and the R2 field gives the value of R-squared relating the distributions of the tagging SNP and the GWAS entry. Only tagging SNPs with R-squared of 0.8 or greater are included. A self-contained R-based procedure should emerge in 2014.
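
To make the roles of tagid, baseid and R2 concrete, here is a minimal sketch that uses a plain data frame as a stand-in for the metadata columns. The five rows are copied from the first five records printed in the Results section of this page; the data frame itself and the variable names are illustrative assumptions, not part of the package.

```r
# Toy stand-in for the metadata columns of gwastagger (the real object
# is a GRanges); values are the first five records shown on this page.
tags <- data.frame(
  tagid  = c("rs28479311", "rs3813193", "chr1:992250",
             "rs60442576", "rs3934834"),
  R2     = c(0.938021, 0.993718, 0.969160, 1.000000, 1.000000),
  baseid = rep("rs3934834", 5),
  stringsAsFactors = FALSE
)

# All tagging SNPs recorded for one GWAS catalog SNP.
hits <- tags[tags$baseid == "rs3934834", ]

# The documented inclusion rule: R-squared of 0.8 or greater.
all_pass <- all(hits$R2 >= 0.8)

# Records in which the catalog SNP appears as its own tag.
self_tag <- hits$tagid[hits$tagid == hits$baseid]

# Tags in perfect LD with the catalog SNP, excluding the SNP itself.
perfect <- setdiff(hits$tagid[hits$R2 == 1], self_tag)
```

On the real object the same kind of lookup should be possible with logical subsetting, for example gwastagger[gwastagger$baseid == "rs3934834"].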

Source

NHGRI GWAS catalog. LD was computed with plink on the 1000 Genomes VCF, driven by a Perl routine written by Michael McGeachie, Harvard Medical School.

Examples

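library(gwascat)
data(gwastagger)
# first five tagging records
gwastagger[1:5]
data(ebicat37)
# fraction of GWAS catalog records whose SNP appears as a baseid
mean(ebicat37$SNPS %in% gwastagger$baseid)
# ideally, all GWAS SNPs would be in our tagging ranges as baseid;
# collect the catalog SNPs that are not
query <- setdiff(ebicat37$SNPS, gwastagger$baseid)
# relatively recent catalog additions: tabulate the dates on which
# the missing SNPs were added, most frequent first
sort(table(ebicat37[which(ebicat37$SNPS %in% query)]$DATE.ADDED.TO.CATALOG), decreasing=TRUE)[1:10]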
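
The coverage check in the example relies on three base R idioms: mean() of a logical vector for a proportion, setdiff() for the unmatched identifiers, and sort(table(...)) for a frequency ranking. The sketch below reproduces them on toy vectors. The rs_hypo identifiers and the dates are hypothetical, invented for illustration. It uses head() rather than [1:10] because indexing past the end of a short or empty table pads the result with NA, which is one possible reading of an all-NA result.

```r
# Toy stand-ins; the rs_hypo* ids and all dates are hypothetical.
catalog_snps <- c("rs3934834", "rs_hypo1", "rs_hypo2", "rs_hypo2")
date_added   <- c("2009-06-01", "2014-01-15", "2014-02-20", "2014-02-20")
base_ids     <- c("rs3934834", "rs3934834")

# Proportion of catalog records whose SNP appears among the base ids.
coverage <- mean(catalog_snps %in% base_ids)

# Catalog SNPs with no tagging record (setdiff returns unique values).
missing_snps <- setdiff(catalog_snps, base_ids)

# Dates on which the missing SNPs were added, most frequent first.
by_date <- sort(table(date_added[catalog_snps %in% missing_snps]),
                decreasing = TRUE)

# head() returns at most 10 entries and never pads with NA.
top <- head(by_date, 10)
```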

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> library(gwascat)
Loading required package: Homo.sapiens
Loading required package: AnnotationDbi
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: IRanges
Loading required package: S4Vectors

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: OrganismDbi
Loading required package: GenomicFeatures
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
Loading required package: GO.db

Loading required package: org.Hs.eg.db

Loading required package: TxDb.Hsapiens.UCSC.hg19.knownGene
gwascat loaded.  Use data(ebicat38) for hg38 coordinates;
 data(ebicat37) for hg19 coordinates.
Warning message:
replacing previous import 'ggplot2::Position' by 'BiocGenerics::Position' when loading 'ggbio' 
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/gwascat/gwastagger.Rd_%03d_medium.png", width=480, height=480)
> ### Name: gwastagger
> ### Title: data on 1000 genomes SNPs that 'tag' GWAS catalog entries
> ### Aliases: gwastagger
> ### Keywords: datasets
> 
> ### ** Examples
> 
> data(gwastagger)
> gwastagger[1:5]
GRanges object with 5 ranges and 3 metadata columns:
      seqnames           ranges strand |       tagid        R2      baseid
         <Rle>        <IRanges>  <Rle> | <character> <numeric> <character>
  [1]     chr1 [986111, 986111]      * |  rs28479311  0.938021   rs3934834
  [2]     chr1 [988364, 988364]      * |   rs3813193  0.993718   rs3934834
  [3]     chr1 [992250, 992250]      * | chr1:992250  0.969160   rs3934834
  [4]     chr1 [992402, 992402]      * |  rs60442576  1.000000   rs3934834
  [5]     chr1 [995669, 995669]      * |   rs3934834  1.000000   rs3934834
  -------
  seqinfo: 24 sequences from 2 genomes (hg19, NA)
> data(ebicat37)
> mean(ebicat37$SNPS %in% gwastagger$baseid)
[1] 0.6136283
> # ideally, all GWAS SNP would be in our tagging ranges as baseid
> query <- setdiff(ebicat37$SNPS, gwastagger$baseid)
> # relatively recent catalog additions
> sort(table(ebicat37[which(ebicat37$SNPS %in% query)]$DATE.ADDED.TO.CATALOG), decreasing=TRUE)[1:10]
 [1] NA NA NA NA NA NA NA NA NA NA
> dev.off()
null device 
          1 
>