The XtraSNPlocs class is a container for storing extra SNP locations and
alleles for a given organism.
While a SNPlocs object can store only molecular variations of class
snp, an XtraSNPlocs object contains molecular variations of other
classes (in-del, heterozygous, microsatellite,
named-locus, no-variation, mixed,
multinucleotide-polymorphism).
XtraSNPlocs objects are usually made in advance by a volunteer and made
available to the Bioconductor community as XtraSNPlocs data packages.
See ?available.SNPs for how to get the list of
SNPlocs and XtraSNPlocs data packages curently available.
This man page's main focus is on how to extract data from an
XtraSNPlocs object.
The names of the sequences for which to get SNPs. NAs and duplicates
are not allowed. The supplied seqnames must be a subset of
seqlevels(x).
columns
The names of the columns to return. Valid column names are:
seqnames, start, end, width, strand,
RefSNP_id, alleles, snpClass, loctype.
See Details section below for a description of these columns.
drop.rs.prefix
Should the rs prefix be dropped from the returned RefSNP ids?
(RefSNP ids are stored in the RefSNP_id metadata column of the
returned object.)
as.DataFrame
Should the result be returned in a DataFrame instead
of a GRanges object?
ranges
One or more regions of interest specified as a GRanges
object. A single region of interest can be specified as a character string
of the form "ch14:5201-5300".
maxgap, minoverlap, type
These arguments are passed to subsetByOverlaps()
which is used internally by snpsByOverlaps.
See ?IRanges::subsetByOverlaps in the IRanges
package and ?GenomicRanges::subsetByOverlaps in
the GenomicRanges package for more information about the
subsetByOverlaps() generic and its method for
GenomicRanges objects.
ids
The RefSNP ids to look up (a.k.a. rs ids). Can be integer or
character vector, with or without the "rs" prefix. NAs are not
allowed.
ifnotfound
What to do if SNP ids are not found.
...
Additional arguments, for use in specific methods.
Further arguments passed to the snpsByOverlaps method for
XtraSNPlocs objects (thru ...) are passed to
subsetByOverlaps().
do.NULL, prefix
These arguments are ignored.
Value
snpcount returns a named integer vector containing the number
of SNPs for each chromosome in the reference genome.
snpsBySeqname and snpsById both return a
GRanges object with 1 element per SNP,
unless as.DataFrame is set to TRUE in which case they
return a DataFrame with 1 row per SNP.
When a GRanges object is returned, the columns
requested via the columns argument are stored as metada columns
of the object, except for the following columns: seqnames,
start, end, width, and strand.
These "spatial columns" (in the sense that they describe the genomic
locations of the SNPs) can be accessed by calling the corresponding
getter on the GRanges object.
Summary of available columns (my_snps being the returned object):
seqnames: The name of the chromosome where each SNP is
located. Access with seqnames(my_snps) when my_snps
is a GRanges object.
start and end: The starting and ending coordinates of
each SNP with respect to the chromosome indicated in seqnames.
Coordinated are 1-based and with respect to the 5' end of the plus
strand of the chromosome in the reference genome.
Access with start(my_snps), end(my_snps),
or ranges(my_snps) when my_snps is a
GRanges object.
width: The number of nucleotides spanned by each SNP
on the reference genome (e.g. a width of 0 means the SNP
is an insertion). Access with width( my_snps) when
my_snps is a GRanges object.
strand: The strand that the alleles of each SNP was reported
to. Access with strand(my_snps) when my_snps is a
GRanges object.
RefSNP_id: The RefSNP id (a.k.a. rs id) of each SNP.
Access with mcols(my_snps)$RefSNP_id when my_snps is a
GRanges object.
alleles: The alleles of each SNP in the format used by dbSNP.
Access with mcols(my_snps)$alleles when my_snps is a
GRanges object.
snpClass: Class of each SNP. Possible values are
in-del, heterozygous, microsatellite,
named-locus, no-variation, mixed, and
multinucleotide-polymorphism.
Access with mcols(my_snps)$snpClass when my_snps is a
GRanges object.
loctype: See ftp://ftp.ncbi.nih.gov/snp/00readme.txt
for the 6 loctype codes used by dbSNP, and their meanings.
WARNING: The code assigned to each SNP doesn't seem to be reliable.
For example, loctype codes 1 and 3 officially stand for insertion
and deletion, respectively. However, when looking at the SNP ranges
it actually seems to be the other way around.
Access with mcols(my_snps)$loctype when my_snps is a
GRanges object.
colnames(x) returns the names of the available columns.
Author(s)
H. Pag<c3><83><c2><a8>s
See Also
available.SNPs
SNPlocs objects.
Examples
library(XtraSNPlocs.Hsapiens.dbSNP141.GRCh38)
snps <- XtraSNPlocs.Hsapiens.dbSNP141.GRCh38
snpcount(snps)
colnames(snps)
## ---------------------------------------------------------------------
## snpsBySeqname()
## ---------------------------------------------------------------------
## Get the location, RefSNP id, and alleles for all "extra SNPs"
## located on chromosome 22 and MT:
snpsBySeqname(snps, c("ch22", "chMT"), columns=c("RefSNP_id", "alleles"))
## ---------------------------------------------------------------------
## snpsByOverlaps()
## ---------------------------------------------------------------------
## Get the location, RefSNP id, and alleles for all "extra SNPs"
## overlapping some regions of interest:
snpsByOverlaps(snps, "ch22:33.63e6-33.64e6",
columns=c("RefSNP_id", "alleles"))
## With the regions of interest being all the known CDS for hg38
## (except for the chromosome naming convention, hg38 is the same
## as GRCh38):
library(TxDb.Hsapiens.UCSC.hg38.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene
hg38_cds <- cds(txdb)
seqlevelsStyle(hg38_cds) # UCSC
seqlevelsStyle(snps) # dbSNP
seqlevelsStyle(hg38_cds) <- seqlevelsStyle(snps)
genome(hg38_cds) <- genome(snps)
snpsByOverlaps(snps, hg38_cds, columns=c("RefSNP_id", "alleles"))
## ---------------------------------------------------------------------
## snpsById()
## ---------------------------------------------------------------------
## Get the location and alleles for some RefSNP ids:
my_rsids <- c("rs367617508", "rs398104919", "rs3831697", "rs372470289",
"rs141568169", "rs34628976", "rs67551854")
snpsById(snps, my_rsids, c("RefSNP_id", "alleles"))
## See ?XtraSNPlocs.Hsapiens.dbSNP141.GRCh38 for examples of using
## snpsBySeqname() and snpsById().
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(BSgenome)
Loading required package: BiocGenerics
Loading required package: parallel
Attaching package: 'BiocGenerics'
The following objects are masked from 'package:parallel':
clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
clusterExport, clusterMap, parApply, parCapply, parLapply,
parLapplyLB, parRapply, parSapply, parSapplyLB
The following objects are masked from 'package:stats':
IQR, mad, xtabs
The following objects are masked from 'package:base':
Filter, Find, Map, Position, Reduce, anyDuplicated, append,
as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
rbind, rownames, sapply, setdiff, sort, table, tapply, union,
unique, unsplit
Loading required package: S4Vectors
Loading required package: stats4
Attaching package: 'S4Vectors'
The following objects are masked from 'package:base':
colMeans, colSums, expand.grid, rowMeans, rowSums
Loading required package: IRanges
Loading required package: GenomeInfoDb
Loading required package: GenomicRanges
Loading required package: Biostrings
Loading required package: XVector
Loading required package: rtracklayer
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/BSgenome/XtraSNPlocs-class.Rd_%03d_medium.png", width=480, height=480)
> ### Name: XtraSNPlocs-class
> ### Title: XtraSNPlocs objects
> ### Aliases: class:XtraSNPlocs XtraSNPlocs-class XtraSNPlocs
> ### colnames,XtraSNPlocs-method dim,XtraSNPlocs-method
> ### provider,XtraSNPlocs-method providerVersion,XtraSNPlocs-method
> ### releaseDate,XtraSNPlocs-method releaseName,XtraSNPlocs-method
> ### referenceGenome,XtraSNPlocs-method organism,XtraSNPlocs-method
> ### commonName,XtraSNPlocs-method species,XtraSNPlocs-method
> ### seqinfo,XtraSNPlocs-method seqnames,XtraSNPlocs-method newXtraSNPlocs
> ### show,XtraSNPlocs-method snpcount,XtraSNPlocs-method
> ### snpsBySeqname,XtraSNPlocs-method snpsByOverlaps,XtraSNPlocs-method
> ### snpsById,XtraSNPlocs-method
> ### Keywords: methods classes
>
> ### ** Examples
>
> library(XtraSNPlocs.Hsapiens.dbSNP141.GRCh38)
> snps <- XtraSNPlocs.Hsapiens.dbSNP141.GRCh38
> snpcount(snps)
ch1 ch2 ch3 ch4 ch5 ch6 ch7 ch8 ch9 ch10 ch11
546378 584439 464593 456879 411943 471931 379787 332813 277342 327169 315134
ch12 ch13 ch14 ch15 ch16 ch17 ch18 ch19 ch20 ch21 ch22
327687 238349 215014 199033 199383 202858 184432 167132 153060 100948 94680
chX chY chMT
226135 10298 46
> colnames(snps)
[1] "seqnames" "start" "end" "width" "strand" "RefSNP_id"
[7] "alleles" "snpClass" "loctype"
>
> ## ---------------------------------------------------------------------
> ## snpsBySeqname()
> ## ---------------------------------------------------------------------
> ## Get the location, RefSNP id, and alleles for all "extra SNPs"
> ## located on chromosome 22 and MT:
> snpsBySeqname(snps, c("ch22", "chMT"), columns=c("RefSNP_id", "alleles"))
GRanges object with 94726 ranges and 2 metadata columns:
seqnames ranges strand | RefSNP_id alleles
<Rle> <IRanges> <Rle> | <character> <character>
[1] ch22 [10513380, 10513380] - | rs386831164 -/T
[2] ch22 [10519678, 10519677] + | rs71286731 -/TTT
[3] ch22 [10526387, 10526386] + | rs36149894 -/A
[4] ch22 [10547411, 10547413] + | rs4022899 -/TTA
[5] ch22 [10551396, 10551396] - | rs67003614 -/T
... ... ... ... . ... ...
[94722] chMT [16180, 16181] + | rs371240719 -/AA
[94723] chMT [16187, 16189] + | rs386828865 CCT/TCC/TGC
[94724] chMT [16189, 16189] + | rs369574569 CC/T
[94725] chMT [16291, 16292] + | rs386828866 CC/TT
[94726] chMT [16293, 16294] + | rs386828867 AC/GT/TT
-------
seqinfo: 25 sequences (1 circular) from GRCh38 genome
>
> ## ---------------------------------------------------------------------
> ## snpsByOverlaps()
> ## ---------------------------------------------------------------------
> ## Get the location, RefSNP id, and alleles for all "extra SNPs"
> ## overlapping some regions of interest:
> snpsByOverlaps(snps, "ch22:33.63e6-33.64e6",
+ columns=c("RefSNP_id", "alleles"))
GRanges object with 32 ranges and 2 metadata columns:
seqnames ranges strand | RefSNP_id alleles
<Rle> <IRanges> <Rle> | <character> <character>
[1] ch22 [33630832, 33630832] + | rs11366195 -/A
[2] ch22 [33631155, 33631158] + | rs113414007 -/CTGT
[3] ch22 [33631157, 33631160] + | rs61016685 -/GTCT
[4] ch22 [33631157, 33631161] + | rs201902868 -/GTCTA
[5] ch22 [33631162, 33631162] + | rs376379301 -/T
... ... ... ... . ... ...
[28] ch22 [33637609, 33637608] + | rs3831697 -/CC/CT
[29] ch22 [33638237, 33638241] + | rs375674091 -/AAGGT
[30] ch22 [33638460, 33638459] + | rs147043500 -/ATTA
[31] ch22 [33638463, 33638462] + | rs10689394 -/ATTA
[32] ch22 [33638464, 33638463] + | rs67593517 -/ATTA
-------
seqinfo: 25 sequences (1 circular) from GRCh38 genome
>
> ## With the regions of interest being all the known CDS for hg38
> ## (except for the chromosome naming convention, hg38 is the same
> ## as GRCh38):
> library(TxDb.Hsapiens.UCSC.hg38.knownGene)
Loading required package: GenomicFeatures
Loading required package: AnnotationDbi
Loading required package: Biobase
Welcome to Bioconductor
Vignettes contain introductory material; view with
'browseVignettes()'. To cite Bioconductor, see
'citation("Biobase")', and for packages 'citation("pkgname")'.
> txdb <- TxDb.Hsapiens.UCSC.hg38.knownGene
> hg38_cds <- cds(txdb)
> seqlevelsStyle(hg38_cds) # UCSC
[1] "UCSC"
> seqlevelsStyle(snps) # dbSNP
[1] "dbSNP"
> seqlevelsStyle(hg38_cds) <- seqlevelsStyle(snps)
> genome(hg38_cds) <- genome(snps)
> snpsByOverlaps(snps, hg38_cds, columns=c("RefSNP_id", "alleles"))
GRanges object with 21511 ranges and 2 metadata columns:
seqnames ranges strand | RefSNP_id alleles
<Rle> <IRanges> <Rle> | <character> <character>
[1] ch1 [939437, 939436] + | rs200996316 -/T
[2] ch1 [946527, 946526] - | rs34516061 -/G
[3] ch1 [965122, 965121] + | rs140189896 -/GGGGGGGGG
[4] ch1 [966539, 966538] + | rs371900006 -/C
[5] ch1 [966592, 966591] + | rs372692701 -/T
... ... ... ... . ... ...
[21507] chX [155033410, 155033410] + | rs35561365 -/T
[21508] chX [155033422, 155033422] + | rs36000410 -/T
[21509] chX [155033446, 155033446] + | rs35784904 -/T
[21510] chX [155090809, 155090818] + | rs267606404 -/TCCAATCCAT
[21511] chY [ 19587278, 19587279] + | rs372084928 -/CT
-------
seqinfo: 25 sequences (1 circular) from GRCh38 genome
>
> ## ---------------------------------------------------------------------
> ## snpsById()
> ## ---------------------------------------------------------------------
> ## Get the location and alleles for some RefSNP ids:
> my_rsids <- c("rs367617508", "rs398104919", "rs3831697", "rs372470289",
+ "rs141568169", "rs34628976", "rs67551854")
> snpsById(snps, my_rsids, c("RefSNP_id", "alleles"))
GRanges object with 7 ranges and 2 metadata columns:
seqnames ranges strand | RefSNP_id alleles
<Rle> <IRanges> <Rle> | <character> <character>
[1] ch7 [148586457, 148586460] + | rs367617508 -/ACAG
[2] ch2 [160890468, 160890467] + | rs398104919 -/A
[3] ch22 [ 33637609, 33637608] + | rs3831697 -/CC/CT
[4] ch1 [ 93471610, 93471610] + | rs372470289 -/T
[5] ch18 [ 43189986, 43189986] + | rs141568169 -/A
[6] ch15 [ 45691525, 45691524] + | rs34628976 -/C
[7] ch14 [ 21488084, 21488083] + | rs67551854 -/AG
-------
seqinfo: 25 sequences (1 circular) from GRCh38 genome
>
> ## See ?XtraSNPlocs.Hsapiens.dbSNP141.GRCh38 for examples of using
> ## snpsBySeqname() and snpsById().
>
>
>
>
>
> dev.off()
null device
1
>