Last data update: 2014.03.03

R: SNPlocs objects
SNPlocs-classR Documentation

SNPlocs objects

Description

The SNPlocs class is a container for storing known SNP locations for a given organism. SNPlocs objects are usually made in advance by a volunteer and made available to the Bioconductor community as "SNPlocs data packages". See ?available.SNPs for how to get the list of "SNPlocs data packages" curently available.

This man page's main focus is on how to extract information from a SNPlocs object.

Usage

snpcount(x)

snpsBySeqname(x, seqnames, ...)
## S4 method for signature 'SNPlocs'
snpsBySeqname(x, seqnames, drop.rs.prefix=FALSE)

snpsByOverlaps(x, ranges, maxgap=0L, minoverlap=0L,
               type=c("any", "start", "end", "within", "equal"), ...)
## S4 method for signature 'SNPlocs'
snpsByOverlaps(x, ranges, maxgap=0L, minoverlap=0L,
               type=c("any", "start", "end", "within", "equal"),
               drop.rs.prefix=FALSE, ...)

snpsById(x, ids, ...)
## S4 method for signature 'SNPlocs'
snpsById(x, ids, ifnotfound=c("error", "warning", "drop"))

## Old API
## ------------------------------------

snplocs(x, seqname, ...)
## S4 method for signature 'SNPlocs'
snplocs(x, seqname, as.GRanges=FALSE, caching=TRUE)

snpid2loc(x, snpid, ...)
## S4 method for signature 'SNPlocs'
snpid2loc(x, snpid, caching=TRUE)

snpid2alleles(x, snpid, ...)
## S4 method for signature 'SNPlocs'
snpid2alleles(x, snpid, caching=TRUE)

snpid2grange(x, snpid, ...)
## S4 method for signature 'SNPlocs'
snpid2grange(x, snpid, caching=TRUE)

Arguments

x

A SNPlocs object.

seqnames

The names of the sequences for which to get SNPs. Must be a subset of seqlevels(x). NAs and duplicates are not allowed.

...

Additional arguments, for use in specific methods.

Arguments passed to the snpsByOverlaps method for SNPlocs objects thru ... are passed to internal call to subsetByOverlaps().

drop.rs.prefix

Should the rs prefix be dropped from the returned RefSNP ids? (RefSNP ids are stored in the RefSNP_id metadata column of the returned object.)

ranges

One or more regions of interest specified as a GRanges object. A single region of interest can be specified as a character string of the form "ch14:5201-5300".

maxgap, minoverlap, type

These arguments are passed to subsetByOverlaps() which is used internally by snpsByOverlaps. See ?IRanges::subsetByOverlaps in the IRanges package and ?GenomicRanges::subsetByOverlaps in the GenomicRanges package for more information about the subsetByOverlaps() generic and its method for GenomicRanges objects.

ids, snpid

The RefSNP ids to look up (a.k.a. rs ids). Can be integer or character vector, with or without the "rs" prefix. NAs are not allowed.

ifnotfound

What to do if SNP ids are not found.

seqname

The name of the sequence for which to get the SNP locations and alleles.

If as.GRanges is FALSE, only one sequence can be specified (i.e. seqname must be a single string). If as.GRanges is TRUE, an arbitrary number of sequences can be specified (i.e. seqname can be a character vector of arbitrary length).

as.GRanges

TRUE or FALSE. If TRUE, then the SNP locations and alleles are returned in a GRanges object. Otherwise (the default), they are returned in a data frame.

caching

Should the loaded SNPs be cached in memory for faster further retrieval but at the cost of increased memory usage?

Value

snpcount returns a named integer vector containing the number of SNPs for each sequence in the reference genome.

snpsBySeqname, snpsByOverlaps, and snpsById return a GPos object with 1 element (genomic position) per SNP and the following metadata columns:

  • RefSNP_id: RefSNP ID (aka "rs id"). Character vector with no NAs and no duplicates.

  • alleles_as_ambig: A character vector with no NAs containing the alleles for each SNP represented by an IUPAC nucleotide ambiguity code. See ?IUPAC_CODE_MAP in the Biostrings package for more information.

Note that all the elements (genomic positions) in this GRanges object have their strand set to "+".

If ifnotfound="error", the object returned by snpsById is guaranteed to be parallel to ids, that is, the i-th element in the GPos object corresponds to the i-th element in ids.

Old API

Note that snplocs is superseded by snpsBySeqname, and snpid2loc, snpid2alleles, and snpid2grange are superseded by snpsById.

By default (i.e. when as.GRanges=FALSE), snplocs returns a data frame with 1 row per SNP and the following columns:

  1. RefSNP_id: Same as above but with "rs" prefix always removed.

  2. alleles_as_ambig: Same as above.

  3. loc: The 1-based location of the SNP relative to the first base at the 5' end of the plus strand of the reference sequence.

Otherwise (i.e. when as.GRanges=TRUE), it returns a GRanges object with metadata columns "RefSNP_id" and "alleles_as_ambig".

snpid2loc and snpid2alleles both return a named vector (integer vector for the former, character vector for the latter) where each (name, value) pair corresponds to a supplied SNP id. For both functions the name in (name, value) is the chromosome of the SNP id. The value in (name, value) is the position of the SNP id on the chromosome for snpid2loc, and a single IUPAC code representing the associated alleles for snpid2alleles.

snpid2grange returns a GRanges object similar to the one returned by snplocs (when used with as.GRanges=TRUE) and where each element corresponds to a supplied SNP id.

Author(s)

H. Pag<c3><83><c2><a8>s

See Also

  • available.SNPs

  • GPos and GRanges objects in the GenomicRanges package.

  • injectSNPs

  • IUPAC_CODE_MAP in the Biostrings package.

Examples

library(SNPlocs.Hsapiens.dbSNP141.GRCh38)
snps <- SNPlocs.Hsapiens.dbSNP141.GRCh38
snpcount(snps)

## ---------------------------------------------------------------------
## snpsBySeqname()
## ---------------------------------------------------------------------
## Get all SNPs located on chromosome 22 and MT:
snpsBySeqname(snps, c("ch22", "chMT"))

## ---------------------------------------------------------------------
## snpsByOverlaps()
## ---------------------------------------------------------------------
## Get all SNPs overlapping some regions of interest:
snpsByOverlaps(snps, "ch22:33.63e6-33.64e6")

## With the regions of interest being all the known CDS for hg38
## located on chr22 or chrMT (except for the chromosome naming
## convention, hg38 is the same as GRCh38):
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene
my_cds <- cds(txdb)
seqlevels(my_cds, force=TRUE) <- c("chr22", "chrMT")
seqlevelsStyle(my_cds)  # UCSC
seqlevelsStyle(snps)  # dbSNP
seqlevelsStyle(my_cds) <- seqlevelsStyle(snps)
genome(my_cds) <- genome(snps)
snpsByOverlaps(snps, my_cds)

## ---------------------------------------------------------------------
## snpsById()
## ---------------------------------------------------------------------
## Lookup some RefSNP ids:
my_rsids <- c("rs10458597", "rs12565286", "rs7553394")
## Not run: 
  snpsById(snps, my_rsids)  # error, rs7553394 not found

## End(Not run)
snpsById(snps, my_rsids, ifnotfound="drop")

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(BSgenome)
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
Loading required package: Biostrings
Loading required package: XVector
Loading required package: rtracklayer
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/BSgenome/SNPlocs-class.Rd_%03d_medium.png", width=480, height=480)
> ### Name: SNPlocs-class
> ### Title: SNPlocs objects
> ### Aliases: class:SNPlocs SNPlocs-class SNPlocs provider,SNPlocs-method
> ###   providerVersion,SNPlocs-method releaseDate,SNPlocs-method
> ###   releaseName,SNPlocs-method referenceGenome
> ###   referenceGenome,SNPlocs-method compatibleGenomes
> ###   compatibleGenomes,SNPlocs-method organism,SNPlocs-method
> ###   commonName,SNPlocs-method species,SNPlocs-method
> ###   seqinfo,SNPlocs-method seqnames,SNPlocs-method newSNPlocs
> ###   show,SNPlocs-method snpcount snpcount,SNPlocs-method snpsBySeqname
> ###   snpsBySeqname,SNPlocs-method snpsByOverlaps
> ###   snpsByOverlaps,SNPlocs-method snpsById snpsById,SNPlocs-method
> ###   snplocs snplocs,SNPlocs-method snpid2loc snpid2loc,SNPlocs-method
> ###   snpid2alleles snpid2alleles,SNPlocs-method snpid2grange
> ###   snpid2grange,SNPlocs-method
> ### Keywords: methods classes
> 
> ### ** Examples
> 
> library(SNPlocs.Hsapiens.dbSNP141.GRCh38)
> snps <- SNPlocs.Hsapiens.dbSNP141.GRCh38
> snpcount(snps)
    ch1     ch2     ch3     ch4     ch5     ch6     ch7     ch8     ch9    ch10 
4160510 4455172 3700693 3630068 3337728 3244071 3038017 2887273 2340705 2561394 
   ch11    ch12    ch13    ch14    ch15    ch16    ch17    ch18    ch19    ch20 
2613035 2496276 1801222 1690797 1553541 1750501 1514837 1434166 1234897 1243589 
   ch21    ch22     chX     chY    chMT 
 755647  746745 1938976  118943    2053 
> 
> ## ---------------------------------------------------------------------
> ## snpsBySeqname()
> ## ---------------------------------------------------------------------
> ## Get all SNPs located on chromosome 22 and MT:
> snpsBySeqname(snps, c("ch22", "chMT"))
GPos object with 748798 positions and 2 metadata columns:
           seqnames       pos strand |   RefSNP_id alleles_as_ambig
              <Rle> <integer>  <Rle> | <character>      <character>
       [1]     ch22  10510770      + |  rs71207739                M
       [2]     ch22  10510928      + |  rs71207740                R
       [3]     ch22  10511116      + |   rs4022986                R
       [4]     ch22  10511641      + |  rs71207745                R
       [5]     ch22  10516682      + |   rs4022528                S
       ...      ...       ...    ... .         ...              ...
  [748794]     chMT     16512      + | rs373943637                Y
  [748795]     chMT     16519      + |   rs3937033                Y
  [748796]     chMT     16528      + | rs386829315                R
  [748797]     chMT     16529      + | rs370705831                Y
  [748798]     chMT     16529      + | rs386829316                Y
  -------
  seqinfo: 25 sequences (1 circular) from GRCh38 genome
> 
> ## ---------------------------------------------------------------------
> ## snpsByOverlaps()
> ## ---------------------------------------------------------------------
> ## Get all SNPs overlapping some regions of interest:
> snpsByOverlaps(snps, "ch22:33.63e6-33.64e6")
GPos object with 192 positions and 2 metadata columns:
        seqnames       pos strand |   RefSNP_id alleles_as_ambig
           <Rle> <integer>  <Rle> | <character>      <character>
    [1]     ch22  33630056      + |   rs7287473                R
    [2]     ch22  33630098      + | rs111817452                Y
    [3]     ch22  33630112      + | rs369494860                W
    [4]     ch22  33630115      + | rs373570932                Y
    [5]     ch22  33630156      + |   rs5749641                R
    ...      ...       ...    ... .         ...              ...
  [188]     ch22  33639815      + | rs118054339                R
  [189]     ch22  33639849      + | rs189854722                Y
  [190]     ch22  33639868      + |  rs73400715                Y
  [191]     ch22  33639907      + | rs202183404                R
  [192]     ch22  33639992      + | rs182698410                Y
  -------
  seqinfo: 25 sequences (1 circular) from GRCh38 genome
> 
> ## With the regions of interest being all the known CDS for hg38
> ## located on chr22 or chrMT (except for the chromosome naming
> ## convention, hg38 is the same as GRCh38):
> library(TxDb.Hsapiens.UCSC.hg38.knownGene)
Loading required package: GenomicFeatures
Loading required package: AnnotationDbi
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

> txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene
> my_cds <- cds(txdb)
> seqlevels(my_cds, force=TRUE) <- c("chr22", "chrMT")
> seqlevelsStyle(my_cds)  # UCSC
[1] "UCSC"
> seqlevelsStyle(snps)  # dbSNP
[1] "dbSNP"
> seqlevelsStyle(my_cds) <- seqlevelsStyle(snps)
> genome(my_cds) <- genome(snps)
> snpsByOverlaps(snps, my_cds)
GPos object with 19366 positions and 2 metadata columns:
          seqnames       pos strand |   RefSNP_id alleles_as_ambig
             <Rle> <integer>  <Rle> | <character>      <character>
      [1]     ch22  15528179      + | rs200562384                K
      [2]     ch22  15528255      + |  rs10222248                W
      [3]     ch22  15528264      + | rs377656609                Y
      [4]     ch22  15528298      + | rs201416863                Y
      [5]     ch22  15528300      + | rs374296984                R
      ...      ...       ...    ... .         ...              ...
  [19362]     ch22  50744975      + |   rs1064738                S
  [19363]     ch22  50744993      + | rs184517959                Y
  [19364]     ch22  50745137      + | rs200978349                M
  [19365]     ch22  50745144      + | rs201494682                Y
  [19366]     ch22  50745148      + | rs201653264                R
  -------
  seqinfo: 25 sequences (1 circular) from GRCh38 genome
> 
> ## ---------------------------------------------------------------------
> ## snpsById()
> ## ---------------------------------------------------------------------
> ## Lookup some RefSNP ids:
> my_rsids <- c("rs10458597", "rs12565286", "rs7553394")
> ## Not run: 
> ##D   snpsById(snps, my_rsids)  # error, rs7553394 not found
> ## End(Not run)
> snpsById(snps, my_rsids, ifnotfound="drop")
GPos object with 2 positions and 2 metadata columns:
      seqnames       pos strand |   RefSNP_id alleles_as_ambig
         <Rle> <integer>  <Rle> | <character>      <character>
  [1]      ch1    629241      + |  rs10458597                Y
  [2]      ch1    785910      + |  rs12565286                S
  -------
  seqinfo: 25 sequences (1 circular) from GRCh38 genome
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>