Last data update: 2014.03.03

GdsGenotypeReader    R Documentation

Class GdsGenotypeReader

Description

The GdsGenotypeReader class is an extension of the GdsReader class for reading genotype data stored in GDS files. GDS files that store genotypes in either snp x scan or scan x snp order are supported.

Extends

GdsReader

Constructor

GdsGenotypeReader(filename, genotypeDim):

filename must be the path to a GDS file or a gds object. The GDS file must contain the following variables:

  • 'snp.id': a unique integer vector of snp ids

  • 'snp.chromosome': integer chromosome codes

  • 'snp.position': integer position values

  • 'sample.id': a unique integer vector of scan ids

  • 'genotype': a matrix of bytes with dimensions ('snp','sample') or ('sample','snp'). The byte values must be the number of A alleles: 2=AA, 1=AB, 0=BB.
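The 'genotype' variable stores each call as a count of A alleles rather than as allele letters. The following minimal base-R sketch shows that encoding for calls written as "AA"/"AB"/"BB". `count_A_alleles` is a hypothetical helper for illustration only. It is not part of GWASTools and does not write a GDS file.

```r
# Hypothetical helper, not part of GWASTools: convert genotype calls written
# as "AA", "AB"/"BA", "BB" into the number of A alleles (2, 1, 0).
# Any other value, including NA, is returned as NA.
count_A_alleles <- function(calls) {
  dosage <- c(AA = 2L, AB = 1L, BA = 1L, BB = 0L)
  unname(dosage[as.character(calls)])
}

# A small SNP x sample matrix of calls, laid out like the 'genotype' variable
calls <- matrix(c("AA", "AB", "BB",   # sample s1
                  NA,   "AB", "AA"),  # sample s2
                nrow = 3,
                dimnames = list(snp = c("1", "2", "3"), sample = c("s1", "s2")))

# Recode while keeping the matrix shape and dimnames
geno <- matrix(count_A_alleles(calls), nrow = nrow(calls),
               dimnames = dimnames(calls))
geno
```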

The optional variable "snp.allele" stores the A and B alleles in a character vector with format "A/B".

Default values for chromosome codes are 1-22=autosome, 23=X, 24=XY, 25=Y, 26=M. The defaults may be changed with the arguments autosomeCode, XchromCode, XYchromCode, YchromCode, and MchromCode.

The constructor automatically detects whether the GDS file is in snp x scan or scan x snp order using the dimensions of snp.id and sample.id. In the case of GDS files with equal SNP and scan dimensions, genotypeDim is a required input to the function and can take values "snp,scan" or "scan,snp".
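The detection rule above can be sketched in base R as follows. This is a hypothetical reconstruction based on the description, not the GWASTools source. `detect_genotype_dim` is an illustrative name, and `geno.dim` is assumed to be the dimension vector of the stored 'genotype' variable.

```r
# Hypothetical sketch of the orientation check described above; not the
# GWASTools implementation. n.snp and n.scan are the lengths of snp.id and
# sample.id; geno.dim is the stored dimension vector of 'genotype'.
detect_genotype_dim <- function(n.snp, n.scan, geno.dim, genotypeDim = NULL) {
  if (length(geno.dim) != 2) stop("'genotype' must be two-dimensional")
  # Equal dimensions are ambiguous, so the caller must say which is which
  if (n.snp == n.scan) {
    if (is.null(genotypeDim))
      stop("equal SNP and scan dimensions: genotypeDim must be given")
    return(match.arg(genotypeDim, c("snp,scan", "scan,snp")))
  }
  if (all(geno.dim == c(n.snp, n.scan))) return("snp,scan")
  if (all(geno.dim == c(n.scan, n.snp))) return("scan,snp")
  stop("dimensions of 'genotype' do not match snp.id and sample.id")
}

detect_genotype_dim(100, 10, c(100, 10))                         # "snp,scan"
detect_genotype_dim(100, 10, c(10, 100))                         # "scan,snp"
detect_genotype_dim(50, 50, c(50, 50), genotypeDim = "scan,snp") # "scan,snp"
```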

The GdsGenotypeReader constructor creates and returns a GdsGenotypeReader instance pointing to this file.

Accessors

In the code snippets below, object is a GdsGenotypeReader object. See GdsReader for additional methods.

nsnp(object): The number of SNPs in the GDS file.

nscan(object): The number of scans in the GDS file.

getSnpID(object, index): A unique integer vector of snp IDs. The optional index is a logical or integer vector specifying elements to extract.

getChromosome(object, index, char=FALSE): A vector of chromosomes. The optional index is a logical or integer vector specifying elements to extract. If char=FALSE (default), returns an integer vector. If char=TRUE, returns a character vector with elements in (1:22,X,XY,Y,M,U). "U" stands for "Unknown" and is the value given to any chromosome code not falling in the other categories.
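The char=TRUE conversion can be pictured as a simple lookup that uses the default codes (1-22 autosomes, 23=X, 24=XY, 25=Y, 26=M) and falls back to "U". This sketch is a guide to the behaviour, not the package code. `chrom_to_char` is a hypothetical name. Labelling autosomes with their own integer code is an assumption that holds for the defaults.

```r
# Hypothetical helper mirroring getChromosome(char=TRUE) as described above.
# Assumes autosomes are labelled with their integer code; every code outside
# the known categories (including NA) becomes "U" (unknown).
chrom_to_char <- function(chrom, autosomeCode = 1:22, XchromCode = 23L,
                          XYchromCode = 24L, YchromCode = 25L,
                          MchromCode = 26L) {
  out <- rep("U", length(chrom))
  auto <- chrom %in% autosomeCode
  out[auto] <- as.character(chrom[auto])
  out[chrom %in% XchromCode]  <- "X"
  out[chrom %in% XYchromCode] <- "XY"
  out[chrom %in% YchromCode]  <- "Y"
  out[chrom %in% MchromCode]  <- "M"
  out
}

chrom_to_char(c(1L, 22L, 23L, 24L, 25L, 26L, 27L, NA))
# "1" "22" "X" "XY" "Y" "M" "U" "U"
```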

getPosition(object, index): An integer vector of base pair positions. The optional index is a logical or integer vector specifying elements to extract.

getAlleleA(object, index): A character vector of A alleles. The optional index is a logical or integer vector specifying elements to extract.

getAlleleB(object, index): A character vector of B alleles. The optional index is a logical or integer vector specifying elements to extract.
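The A and B alleles come from the optional 'snp.allele' variable, which stores them as "A/B" strings. The following base-R sketch splits that format into the two vectors that getAlleleA and getAlleleB return. `split_alleles` is a hypothetical helper and is not the package's implementation.

```r
# Hypothetical helper: split "A/B" strings (the 'snp.allele' format) into
# separate A-allele and B-allele character vectors. A missing part becomes NA.
split_alleles <- function(snp.allele) {
  parts <- strsplit(snp.allele, "/", fixed = TRUE)
  list(A = vapply(parts, `[`, character(1), 1),
       B = vapply(parts, `[`, character(1), 2))
}

alleles <- split_alleles(c("A/G", "C/T"))
alleles$A  # "A" "C"
alleles$B  # "G" "T"
```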

getScanID(object, index): A unique integer vector of scan IDs. The optional index is a logical or integer vector specifying elements to extract.

getGenotype(object, snp=c(1,-1), scan=c(1,-1), drop=TRUE, use.names=FALSE, transpose=FALSE, ...): Extracts genotype values (number of A alleles). snp and scan indicate which elements to return along the snp and scan dimensions. They must be integer vectors of the form (start, count), where start is the index of the first data element to read and count is the number of elements to read. A value of -1 for count indicates that the dimension should be read from start to the end. If drop=TRUE, the result is coerced to the lowest possible dimension. If use.names=TRUE, the names (or dimnames) of the resulting vector or matrix are set to the SNP and scan IDs. Missing values are represented as NA. Genotypes are returned in SNP x scan order if transpose=FALSE; otherwise they are returned in scan x SNP order.

getGenotypeSelection(object, snp=NULL, scan=NULL, snpID=NULL, scanID=NULL, drop=TRUE, use.names=TRUE, order=c("file", "selection"), transpose=FALSE, ...): Extracts genotype values (number of A alleles). snp and scan may be integer or logical vectors indicating which elements to return along the snp and scan dimensions. snpID and scanID allow selection by SNP and scan ID values instead of by index. Unlike getGenotype, the values requested need not be in contiguous blocks. If order="file", genotypes are returned in the order they are stored in the file. If order="selection", the order of SNPs and scans matches the index selection provided in snp and scan, respectively. Other arguments are identical to getGenotype.
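The (start, count) convention and the two order settings ("file" vs "selection") can be reproduced on an in-memory matrix. The following is a minimal base-R sketch that assumes a SNP x scan matrix already held in memory. `start_count_to_index` and `select_genotypes` are hypothetical helpers. They are not GWASTools internals and do not read from a GDS file.

```r
# Hypothetical helpers, not GWASTools internals.

# Translate a (start, count) pair into explicit indices along a dimension of
# length n; count = -1 means "read from start to the end".
start_count_to_index <- function(start, count, n) {
  if (count == -1) count <- n - start + 1
  stopifnot(start >= 1, count >= 0, start + count - 1 <= n)
  as.integer(start - 1 + seq_len(count))
}

# Select arbitrary SNPs/scans from an in-memory SNP x scan matrix.
# order = "file" returns them in stored order; "selection" keeps the
# order in which the indices were given. Logical indices are accepted.
select_genotypes <- function(geno, snp, scan, order = c("file", "selection"),
                             drop = TRUE) {
  order <- match.arg(order)
  if (is.logical(snp)) snp <- which(snp)
  if (is.logical(scan)) scan <- which(scan)
  if (order == "file") {
    snp <- sort(snp)
    scan <- sort(scan)
  }
  geno[snp, scan, drop = drop]
}

geno <- matrix(c(0L, 1L, 2L, NA,   # scan 1
                 2L, 2L, 1L, 0L,   # scan 2
                 1L, 0L, 0L, 2L),  # scan 3
               nrow = 4,
               dimnames = list(paste0("snp", 1:4), paste0("scan", 1:3)))

start_count_to_index(2, -1, nrow(geno))   # 2 3 4
select_genotypes(geno, snp = c(4, 2), scan = c(3, 1), order = "file")
select_genotypes(geno, snp = c(4, 2), scan = c(3, 1), order = "selection")
```

Sorting the indices is one simple way to express "file order". Keeping them unsorted preserves the caller's order.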

getVariable(object, varname, index, drop=TRUE, ...): Extracts the contents of the variable varname. The optional index is a logical or integer vector (if varname is 1D) or list (if varname is 2D or higher) specifying elements to extract. If drop=TRUE, the result is coerced to the lowest possible dimension. Missing values are represented as NA. If the variable is not found, returns NULL.
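For a variable with two or more dimensions, index is a list with one element per dimension. The following base-R sketch shows how such a list maps onto ordinary R subsetting of an in-memory array, including the effect of drop. `subset_by_index` is a hypothetical helper. It only illustrates the idea and is not how getVariable reads from the file.

```r
# Hypothetical helper: apply an index the way the description above suggests.
# A 1-D vector takes a logical or integer index; a matrix or array takes a list
# with one element per dimension (TRUE selects everything along a dimension).
subset_by_index <- function(x, index = NULL, drop = TRUE) {
  if (is.null(index)) return(x)
  if (is.null(dim(x))) return(x[index])
  stopifnot(is.list(index), length(index) == length(dim(x)))
  do.call(`[`, c(list(x), index, list(drop = drop)))
}

m <- matrix(1:6, nrow = 2)
subset_by_index(m, list(1, 2:3))                # 3 5
subset_by_index(m, list(1, 2:3), drop = FALSE)  # 1 x 2 matrix
subset_by_index(1:5, c(TRUE, FALSE, TRUE, FALSE, TRUE))  # 1 3 5
```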

XchromCode(object): Returns the integer code for the X chromosome.

XYchromCode(object): Returns the integer code for the pseudoautosomal region.

YchromCode(object): Returns the integer code for the Y chromosome.

MchromCode(object): Returns the integer code for mitochondrial SNPs.

Author(s)

Stephanie Gogarten

See Also

GdsReader, GenotypeData

Examples

file <- system.file("extdata", "illumina_geno.gds", package="GWASdata")
gds <- GdsGenotypeReader(file)

# dimensions
nsnp(gds)
nscan(gds)

# get snpID and chromosome
snpID <- getSnpID(gds)
chrom <- getChromosome(gds)

# get positions only for chromosome 22
pos22 <- getPosition(gds, index=(chrom == 22))

# get all snps for first scan
geno <- getGenotype(gds, snp=c(1,-1), scan=c(1,1))
length(geno)

# starting at snp 100, get 10 snps for the first 5 scans
getGenotype(gds, snp=c(100,10), scan=c(1,5))

# get snps 1-10, 25-30 for scans 3,5,7
snp.index <- c(1:10, 25:30)
scan.index <- c(3,5,7)
getGenotypeSelection(gds, snp=snp.index, scan=scan.index)

# illustrate drop argument
getGenotypeSelection(gds, snp=5, scan=1:10, drop=TRUE, use.names=FALSE)
getGenotypeSelection(gds, snp=5, scan=1:10, drop=FALSE, use.names=FALSE)

# illustrate order="file" vs order="selection"
snp.index <- c(9,3,5)
scan.index <- c(3,2,1)
getGenotypeSelection(gds, snp=snp.index, scan=scan.index, order="file")
getGenotypeSelection(gds, snp=snp.index, scan=scan.index, order="selection")

close(gds)

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"

> library(GWASTools)

> ### Name: GdsGenotypeReader
> ### Title: Class GdsGenotypeReader
> ### Aliases: GdsGenotypeReader-class GdsGenotypeReader
> ###   getVariable,GdsGenotypeReader-method
> ###   getSnpID,GdsGenotypeReader-method
> ###   getChromosome,GdsGenotypeReader-method
> ###   getPosition,GdsGenotypeReader-method
> ###   getAlleleA,GdsGenotypeReader-method
> ###   getAlleleB,GdsGenotypeReader-method
> ###   getScanID,GdsGenotypeReader-method
> ###   getGenotype,GdsGenotypeReader-method
> ###   getGenotypeSelection,GdsGenotypeReader-method
> ###   nsnp,GdsGenotypeReader-method nscan,GdsGenotypeReader-method
> ###   autosomeCode,GdsGenotypeReader-method
> ###   XchromCode,GdsGenotypeReader-method
> ###   XYchromCode,GdsGenotypeReader-method
> ###   YchromCode,GdsGenotypeReader-method
> ###   MchromCode,GdsGenotypeReader-method
> ### Keywords: methods classes
> 
> ### ** Examples
> 
> file <- system.file("extdata", "illumina_geno.gds", package="GWASdata")
> gds <- GdsGenotypeReader(file)
> 
> # dimensions
> nsnp(gds)
[1] 3300
> nscan(gds)
[1] 77
> 
> # get snpID and chromosome
> snpID <- getSnpID(gds)
> chrom <- getChromosome(gds)
> 
> # get positions only for chromosome 22
> pos22 <- getPosition(gds, index=(chrom == 22))
> 
> # get all snps for first scan
> geno <- getGenotype(gds, snp=c(1,-1), scan=c(1,1))
> length(geno)
[1] 3300
> 
> # starting at snp 100, get 10 snps for the first 5 scans
> getGenotype(gds, snp=c(100,10), scan=c(1,5))
      [,1] [,2] [,3] [,4] [,5]
 [1,]    0    0    1    1    1
 [2,]    0    0    0    0    0
 [3,]    0    0    0    0    0
 [4,]    1    1    1    1    1
 [5,]    0    0    0    0    0
 [6,]    1    1    0    0    0
 [7,]    1    1    1    1    0
 [8,]    0    0    0    0    0
 [9,]    2    2    1    1    0
[10,]    0    0    0    0    0
> 
> # get snps 1-10, 25-30 for scans 3,5,7
> snp.index <- c(1:10, 25:30)
> scan.index <- c(3,5,7)
> getGenotypeSelection(gds, snp=snp.index, scan=scan.index)
       282 284 286
999447  NA  NA  NA
999465   0   0   0
999493   0   1   0
999512   1   0   1
999561   0   0   0
999567   2   1   2
999569   0   0   2
999577   0   0   2
999578   2   1   2
999580   0   1   0
999853   0   0   0
999858   0   0   0
999860   2   2   2
999893   2   2   2
999902   1   1   0
999914   0   0   0
> 
> # illustrate drop argument
> getGenotypeSelection(gds, snp=5, scan=1:10, drop=TRUE, use.names=FALSE)
 [1] 0 0 0 0 0 0 0 0 1 0
> getGenotypeSelection(gds, snp=5, scan=1:10, drop=FALSE, use.names=FALSE)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    0    0    0    0    0    0    0    0    1     0
> 
> # illustrate order="file" vs order="selection"
> snp.index <- c(9,3,5)
> scan.index <- c(3,2,1)
> getGenotypeSelection(gds, snp=snp.index, scan=scan.index, order="file")
       280 281 282
999493   0   0   0
999561   0   0   0
999578   2   2   2
> getGenotypeSelection(gds, snp=snp.index, scan=scan.index, order="selection")
       282 281 280
999561   0   0   0
999578   2   2   2
999493   0   0   0
> 
> close(gds)
> 