The SNPlocs class is a container for storing known SNP locations for a
given organism. SNPlocs objects are usually made in advance by
a volunteer and made available to the Bioconductor community as
"SNPlocs data packages".
See ?available.SNPs for how to get the list of
"SNPlocs data packages" curently available.
This man page's main focus is on how to extract information from a
SNPlocs object.
The names of the sequences for which to get SNPs. Must be a subset of
seqlevels(x). NAs and duplicates are not allowed.
...
Additional arguments, for use in specific methods.
Arguments passed to the snpsByOverlaps method for SNPlocs
objects thru ... are passed to internal call to
subsetByOverlaps().
drop.rs.prefix
Should the rs prefix be dropped from the returned RefSNP ids?
(RefSNP ids are stored in the RefSNP_id metadata column of the
returned object.)
ranges
One or more regions of interest specified as a
GRanges object. A single region of interest can
be specified as a character string of the form "ch14:5201-5300".
maxgap, minoverlap, type
These arguments are passed to subsetByOverlaps()
which is used internally by snpsByOverlaps.
See ?IRanges::subsetByOverlaps in the IRanges
package and ?GenomicRanges::subsetByOverlaps
in the GenomicRanges package for more information about the
subsetByOverlaps() generic and its method for
GenomicRanges objects.
ids, snpid
The RefSNP ids to look up (a.k.a. rs ids). Can be integer or character
vector, with or without the "rs" prefix. NAs are not allowed.
ifnotfound
What to do if SNP ids are not found.
seqname
The name of the sequence for which to get the SNP locations
and alleles.
If as.GRanges is FALSE, only one sequence can
be specified (i.e. seqname must be a single string).
If as.GRanges is TRUE, an arbitrary number of
sequences can be specified (i.e. seqname can be
a character vector of arbitrary length).
as.GRanges
TRUE or FALSE. If TRUE, then the SNP locations
and alleles are returned in a GRanges object.
Otherwise (the default), they are returned in a data frame.
caching
Should the loaded SNPs be cached in memory for faster further
retrieval but at the cost of increased memory usage?
Value
snpcount returns a named integer vector containing the number
of SNPs for each sequence in the reference genome.
snpsBySeqname, snpsByOverlaps, and snpsById return
a GPos object with 1 element (genomic position)
per SNP and the following metadata columns:
RefSNP_id: RefSNP ID (aka "rs id"). Character vector
with no NAs and no duplicates.
alleles_as_ambig: A character vector with no NAs
containing the alleles for each SNP represented by an IUPAC
nucleotide ambiguity code.
See ?IUPAC_CODE_MAP in the
Biostrings package for more information.
Note that all the elements (genomic positions) in this
GRanges object have their strand set to "+".
If ifnotfound="error", the object returned by snpsById
is guaranteed to be parallel to ids, that is, the i-th
element in the GPos object corresponds to the
i-th element in ids.
Old API
Note that snplocs is superseded by snpsBySeqname, and
snpid2loc, snpid2alleles, and snpid2grange are
superseded by snpsById.
By default (i.e. when as.GRanges=FALSE), snplocs returns a
data frame with 1 row per SNP and the following columns:
RefSNP_id: Same as above but with "rs" prefix
always removed.
alleles_as_ambig: Same as above.
loc: The 1-based location of the SNP relative to the
first base at the 5' end of the plus strand of the reference
sequence.
Otherwise (i.e. when as.GRanges=TRUE), it returns a
GRanges object with metadata columns
"RefSNP_id" and "alleles_as_ambig".
snpid2loc and snpid2alleles both return a named vector
(integer vector for the former, character vector for the latter)
where each (name, value) pair corresponds to a supplied SNP id.
For both functions the name in (name, value) is the chromosome
of the SNP id. The value in (name, value) is the position of the
SNP id on the chromosome for snpid2loc, and a single IUPAC
code representing the associated alleles for snpid2alleles.
snpid2grange returns a GRanges object
similar to the one returned by snplocs (when used with
as.GRanges=TRUE) and where each element corresponds to a
supplied SNP id.
Author(s)
H. Pag<c3><83><c2><a8>s
See Also
available.SNPs
GPos and GRanges
objects in the GenomicRanges package.
injectSNPs
IUPAC_CODE_MAP in the Biostrings
package.
Examples
library(SNPlocs.Hsapiens.dbSNP141.GRCh38)
snps <- SNPlocs.Hsapiens.dbSNP141.GRCh38
snpcount(snps)
## ---------------------------------------------------------------------
## snpsBySeqname()
## ---------------------------------------------------------------------
## Get all SNPs located on chromosome 22 and MT:
snpsBySeqname(snps, c("ch22", "chMT"))
## ---------------------------------------------------------------------
## snpsByOverlaps()
## ---------------------------------------------------------------------
## Get all SNPs overlapping some regions of interest:
snpsByOverlaps(snps, "ch22:33.63e6-33.64e6")
## With the regions of interest being all the known CDS for hg38
## located on chr22 or chrMT (except for the chromosome naming
## convention, hg38 is the same as GRCh38):
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene
my_cds <- cds(txdb)
seqlevels(my_cds, force=TRUE) <- c("chr22", "chrMT")
seqlevelsStyle(my_cds) # UCSC
seqlevelsStyle(snps) # dbSNP
seqlevelsStyle(my_cds) <- seqlevelsStyle(snps)
genome(my_cds) <- genome(snps)
snpsByOverlaps(snps, my_cds)
## ---------------------------------------------------------------------
## snpsById()
## ---------------------------------------------------------------------
## Lookup some RefSNP ids:
my_rsids <- c("rs10458597", "rs12565286", "rs7553394")
## Not run:
snpsById(snps, my_rsids) # error, rs7553394 not found
## End(Not run)
snpsById(snps, my_rsids, ifnotfound="drop")
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(BSgenome)
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
rbind, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit
Loading required package: S4Vectors
Loading required package: stats4
Attaching package: 'S4Vectors'
The following objects are masked from 'package:base':
colMeans, colSums, expand.grid, rowMeans, rowSums
Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
Loading required package: Biostrings
Loading required package: XVector
Loading required package: rtracklayer
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/BSgenome/SNPlocs-class.Rd_%03d_medium.png", width=480, height=480)
> ### Name: SNPlocs-class
> ### Title: SNPlocs objects
> ### Aliases: class:SNPlocs SNPlocs-class SNPlocs provider,SNPlocs-method
> ### providerVersion,SNPlocs-method releaseDate,SNPlocs-method
> ### releaseName,SNPlocs-method referenceGenome
> ### referenceGenome,SNPlocs-method compatibleGenomes
> ### compatibleGenomes,SNPlocs-method organism,SNPlocs-method
> ### commonName,SNPlocs-method species,SNPlocs-method
> ### seqinfo,SNPlocs-method seqnames,SNPlocs-method newSNPlocs
> ### show,SNPlocs-method snpcount snpcount,SNPlocs-method snpsBySeqname
> ### snpsBySeqname,SNPlocs-method snpsByOverlaps
> ### snpsByOverlaps,SNPlocs-method snpsById snpsById,SNPlocs-method
> ### snplocs snplocs,SNPlocs-method snpid2loc snpid2loc,SNPlocs-method
> ### snpid2alleles snpid2alleles,SNPlocs-method snpid2grange
> ### snpid2grange,SNPlocs-method
> ### Keywords: methods classes
>
> ### ** Examples
>
> library(SNPlocs.Hsapiens.dbSNP141.GRCh38)
> snps <- SNPlocs.Hsapiens.dbSNP141.GRCh38
> snpcount(snps)
ch1 ch2 ch3 ch4 ch5 ch6 ch7 ch8 ch9 ch10
4160510 4455172 3700693 3630068 3337728 3244071 3038017 2887273 2340705 2561394
ch11 ch12 ch13 ch14 ch15 ch16 ch17 ch18 ch19 ch20
2613035 2496276 1801222 1690797 1553541 1750501 1514837 1434166 1234897 1243589
ch21 ch22 chX chY chMT
755647 746745 1938976 118943 2053
>
> ## ---------------------------------------------------------------------
> ## snpsBySeqname()
> ## ---------------------------------------------------------------------
> ## Get all SNPs located on chromosome 22 and MT:
> snpsBySeqname(snps, c("ch22", "chMT"))
GPos object with 748798 positions and 2 metadata columns:
seqnames pos strand | RefSNP_id alleles_as_ambig
<Rle> <integer> <Rle> | <character> <character>
[1] ch22 10510770 + | rs71207739 M
[2] ch22 10510928 + | rs71207740 R
[3] ch22 10511116 + | rs4022986 R
[4] ch22 10511641 + | rs71207745 R
[5] ch22 10516682 + | rs4022528 S
... ... ... ... . ... ...
[748794] chMT 16512 + | rs373943637 Y
[748795] chMT 16519 + | rs3937033 Y
[748796] chMT 16528 + | rs386829315 R
[748797] chMT 16529 + | rs370705831 Y
[748798] chMT 16529 + | rs386829316 Y
-------
seqinfo: 25 sequences (1 circular) from GRCh38 genome
>
> ## ---------------------------------------------------------------------
> ## snpsByOverlaps()
> ## ---------------------------------------------------------------------
> ## Get all SNPs overlapping some regions of interest:
> snpsByOverlaps(snps, "ch22:33.63e6-33.64e6")
GPos object with 192 positions and 2 metadata columns:
seqnames pos strand | RefSNP_id alleles_as_ambig
<Rle> <integer> <Rle> | <character> <character>
[1] ch22 33630056 + | rs7287473 R
[2] ch22 33630098 + | rs111817452 Y
[3] ch22 33630112 + | rs369494860 W
[4] ch22 33630115 + | rs373570932 Y
[5] ch22 33630156 + | rs5749641 R
... ... ... ... . ... ...
[188] ch22 33639815 + | rs118054339 R
[189] ch22 33639849 + | rs189854722 Y
[190] ch22 33639868 + | rs73400715 Y
[191] ch22 33639907 + | rs202183404 R
[192] ch22 33639992 + | rs182698410 Y
-------
seqinfo: 25 sequences (1 circular) from GRCh38 genome
>
> ## With the regions of interest being all the known CDS for hg38
> ## located on chr22 or chrMT (except for the chromosome naming
> ## convention, hg38 is the same as GRCh38):
> library(TxDb.Hsapiens.UCSC.hg38.knownGene)
Loading required package: GenomicFeatures
Loading required package: AnnotationDbi
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
> txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene
> my_cds <- cds(txdb)
> seqlevels(my_cds, force=TRUE) <- c("chr22", "chrMT")
> seqlevelsStyle(my_cds) # UCSC
[1] "UCSC"
> seqlevelsStyle(snps) # dbSNP
[1] "dbSNP"
> seqlevelsStyle(my_cds) <- seqlevelsStyle(snps)
> genome(my_cds) <- genome(snps)
> snpsByOverlaps(snps, my_cds)
GPos object with 19366 positions and 2 metadata columns:
seqnames pos strand | RefSNP_id alleles_as_ambig
<Rle> <integer> <Rle> | <character> <character>
[1] ch22 15528179 + | rs200562384 K
[2] ch22 15528255 + | rs10222248 W
[3] ch22 15528264 + | rs377656609 Y
[4] ch22 15528298 + | rs201416863 Y
[5] ch22 15528300 + | rs374296984 R
... ... ... ... . ... ...
[19362] ch22 50744975 + | rs1064738 S
[19363] ch22 50744993 + | rs184517959 Y
[19364] ch22 50745137 + | rs200978349 M
[19365] ch22 50745144 + | rs201494682 Y
[19366] ch22 50745148 + | rs201653264 R
-------
seqinfo: 25 sequences (1 circular) from GRCh38 genome
>
> ## ---------------------------------------------------------------------
> ## snpsById()
> ## ---------------------------------------------------------------------
> ## Lookup some RefSNP ids:
> my_rsids <- c("rs10458597", "rs12565286", "rs7553394")
> ## Not run:
> ##D snpsById(snps, my_rsids) # error, rs7553394 not found
> ## End(Not run)
> snpsById(snps, my_rsids, ifnotfound="drop")
GPos object with 2 positions and 2 metadata columns:
seqnames pos strand | RefSNP_id alleles_as_ambig
<Rle> <integer> <Rle> | <character> <character>
[1] ch1 629241 + | rs10458597 Y
[2] ch1 785910 + | rs12565286 S
-------
seqinfo: 25 sequences (1 circular) from GRCh38 genome
>
>
>
>
>
> dev.off()
null device
1
>