Details

This method provides different ways of constructing an object of class GenotypeMatrix. If [...]

Positional information can be passed to the function in three different ways: [...]

In all three cases, the lengths of the arguments [...] If the arguments [...]

For all variants, filters in terms of missing values and MAFs can be applied. Moreover, variants with MAFs greater than 0.5 can be filtered out or inverted. For details, see descriptions of parameters [...]
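The filtering and inversion described above can be illustrated with a minimal base R sketch. It is not podkat's implementation: the toy matrix `Z`, the threshold `naLimit`, and all variable names are hypothetical, and it assumes that a variant's allele frequency is its allele count divided by twice the number of non-missing genotypes.

```r
## Toy genotype matrix: rows are samples, columns are variants;
## entries count alleles (0, 1, or 2), NA marks a missing value.
Z <- matrix(c(0, 1,  2, 2,
              1, 2,  2, 0,
              0, NA, 2, 0,
              0, 2,  1, 0), nrow=4, byrow=TRUE)

## allele frequency per variant, ignoring missing values
## (assumption: allele count / (2 * number of observed genotypes))
freq <- colSums(Z, na.rm=TRUE) / (2 * colSums(!is.na(Z)))

## fraction of missing values per variant
naRatio <- colMeans(is.na(Z))

## invert variants whose frequency exceeds 0.5 by swapping allele roles;
## missing values stay missing
invert <- freq > 0.5
Zinv <- Z
Zinv[, invert] <- 2 - Zinv[, invert]
maf <- ifelse(invert, 1 - freq, freq)

## hypothetical missing-value threshold, chosen only for illustration
naLimit <- 0.2
keep <- naRatio <= naLimit & maf > 0

maf
keep
```

Here the second and third variants are inverted because their frequencies exceed 0.5, and the second variant is then dropped because a quarter of its genotypes are missing. Filtering out instead of inverting would amount to adding `!invert` to the `keep` condition.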
For convenience, [...]

Value

Returns an object of class GenotypeMatrix.

Author(s)

Ulrich Bodenhofer bodenhofer@bioinf.jku.at

References

http://www.bioinf.jku.at/software/podkat

http://www.1000genomes.org/wiki/analysis/variant-call-format/vcf-variant-call-format-version-42

Obenchain, V., Lawrence, M., Carey, V., Gogarten, S., Shannon, P., and Morgan, M. (2014) VariantAnnotation: a Bioconductor package for exploration and annotation of genetic variants. Bioinformatics 30, 2076-2078.

See Also

[...]
Examples

## create a toy example
A <- matrix(rbinom(50, 2, prob=0.2), 5, 10)
sA <- as(A, "dgCMatrix")
pos <- sort(sample(1:10000, ncol(A)))
seqname <- "chr1"

## variant with 'GRanges' object
gr <- GRanges(seqnames=seqname, ranges=IRanges(start=pos, width=1))
gtm <- genotypeMatrix(A, gr)
gtm
as.matrix(gtm)
variantInfo(gtm)
MAF(gtm)

## variant with 'pos' and 'seqnames' object
genotypeMatrix(sA, pos, seqname)

## variant with 'seqname:pos' strings passed through 'pos' argument
spos <- paste(seqname, pos, sep=":")
spos
genotypeMatrix(sA, spos)

## read data from VCF file using 'readVcf()' from the 'VariantAnnotation'
## package
if (require(VariantAnnotation))
{
    vcfFile <- system.file("examples/example1.vcf.gz", package="podkat")
    sp <- ScanVcfParam(info=NA, genome="GT", fixed=c("ALT", "FILTER"))
    vcf <- readVcf(vcfFile, genome="hgA", param=sp)
    rowRanges(vcf)

    ## call constructor for 'VCF' object
    gtm <- genotypeMatrix(vcf)
    gtm
    variantInfo(gtm)

    ## alternatively, extract information from 'VCF' object and use
    ## variant with character matrix and 'GRanges' positions
    ## note that, in 'VCF' objects, rows correspond to variants and
    ## columns correspond to samples, therefore, we have to transpose the
    ## genotype
    gt <- t(geno(vcf)$GT)
    gt[1:5, 1:5]
    gr <- rowRanges(vcf)
    gtm <- genotypeMatrix(gt, gr)
    as.matrix(gtm[1:20, 1:5, recomputeMAF=TRUE])
}

Results

R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(podkat)
Loading required package: Rsamtools
Loading required package: GenomeInfoDb
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomicRanges
Loading required package: Biostrings
Loading required package: XVector
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/podkat/genotypeMatrix-methods.Rd_%03d_medium.png", width=480, height=480)
> ### Name: genotypeMatrix-methods
> ### Title: Constructors for Creating 'GenotypeMatrix' Objects
> ### Aliases: genotypeMatrix-methods method:genotypeMatrix genotypeMatrix
> ###   genotypeMatrix,ANY,GRanges,missing-method
> ###   genotypeMatrix,ANY,numeric,character-method
> ###   genotypeMatrix,ANY,character,missing-method
> ###   genotypeMatrix,ANY,missing,missing-method
> ###   genotypeMatrix,eSet,numeric,character-method
> ###   genotypeMatrix,eSet,character,missing-method
> ###   genotypeMatrix,eSet,character,character-method
> ### Keywords: methods
>
> ### ** Examples
>
> ## create a toy example
> A <- matrix(rbinom(50, 2, prob=0.2), 5, 10)
> sA <- as(A, "dgCMatrix")
> pos <- sort(sample(1:10000, ncol(A)))
> seqname <- "chr1"
>
> ## variant with 'GRanges' object
> gr <- GRanges(seqnames=seqname, ranges=IRanges(start=pos, width=1))
> gtm <- genotypeMatrix(A, gr)
> gtm
Genotype matrix:

	Number of samples:  5
	Number of variants: 10

	Mean MAF:    0.23
	Median MAF:  0.2
	Minimum MAF: 0.1
	Maximum MAF: 0.4

> as.matrix(gtm)
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    0    1    1    1    0    1    0    0    2     0
[2,]    1    2    0    0    0    0    0    1    0     0
[3,]    0    0    0    0    1    0    0    0    0     1
[4,]    0    1    0    1    0    2    0    1    1     1
[5,]    1    0    0    0    0    1    1    0    0     1
> variantInfo(gtm)
VariantInfo object with 10 ranges and 1 metadata column:
       seqnames       ranges strand |       MAF
          <Rle>    <IRanges>  <Rle> | <numeric>
   [1]     chr1 [2407, 2407]      * |       0.2
   [2]     chr1 [2646, 2646]      * |       0.4
   [3]     chr1 [2793, 2793]      * |       0.1
   [4]     chr1 [4023, 4023]      * |       0.2
   [5]     chr1 [4416, 4416]      * |       0.1
   [6]     chr1 [4963, 4963]      * |       0.4
   [7]     chr1 [6701, 6701]      * |       0.1
   [8]     chr1 [6921, 6921]      * |       0.2
   [9]     chr1 [8341, 8341]      * |       0.3
  [10]     chr1 [9669, 9669]      * |       0.3
  -------
  seqinfo: 1 sequence from an unspecified genome; no seqlengths
> MAF(gtm)
 [1] 0.2 0.4 0.1 0.2 0.1 0.4 0.1 0.2 0.3 0.3
>
> ## variant with 'pos' and 'seqnames' object
> genotypeMatrix(sA, pos, seqname)
Genotype matrix:

	Number of samples:  5
	Number of variants: 10

	Mean MAF:    0.23
	Median MAF:  0.2
	Minimum MAF: 0.1
	Maximum MAF: 0.4

>
> ## variant with 'seqname:pos' strings passed through 'pos' argument
> spos <- paste(seqname, pos, sep=":")
> spos
 [1] "chr1:2407" "chr1:2646" "chr1:2793" "chr1:4023" "chr1:4416" "chr1:4963"
 [7] "chr1:6701" "chr1:6921" "chr1:8341" "chr1:9669"
> genotypeMatrix(sA, spos)
Genotype matrix:

	Number of samples:  5
	Number of variants: 10

	Mean MAF:    0.23
	Median MAF:  0.2
	Minimum MAF: 0.1
	Maximum MAF: 0.4

>
> ## read data from VCF file using 'readVcf()' from the 'VariantAnnotation'
> ## package
> if (require(VariantAnnotation))
+ {
+     vcfFile <- system.file("examples/example1.vcf.gz", package="podkat")
+     sp <- ScanVcfParam(info=NA, genome="GT", fixed=c("ALT", "FILTER"))
+     vcf <- readVcf(vcfFile, genome="hgA", param=sp)
+     rowRanges(vcf)
+
+     ## call constructor for 'VCF' object
+     gtm <- genotypeMatrix(vcf)
+     gtm
+     variantInfo(gtm)
+
+     ## alternatively, extract information from 'VCF' object and use
+     ## variant with character matrix and 'GRanges' positions
+     ## note that, in 'VCF' objects, rows correspond to variants and
+     ## columns correspond to samples, therefore, we have to transpose the
+     ## genotype
+     gt <- t(geno(vcf)$GT)
+     gt[1:5, 1:5]
+     gr <- rowRanges(vcf)
+     gtm <- genotypeMatrix(gt, gr)
+     as.matrix(gtm[1:20, 1:5, recomputeMAF=TRUE])
+ }
Loading required package: VariantAnnotation
Loading required package: SummarizedExperiment
Loading required package: Biobase
Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.


Attaching package: 'VariantAnnotation'

The following object is masked from 'package:base':

    tabulate

    snv:6 snv:7
S1      0     0
S2      0     0
S3      0     0
S4      0     0
S5      0     0
S6      0     1
S7      0     0
S8      1     0
S9      0     0
S10     0     0
S11     0     0
S12     0     0
S13     0     0
S14     0     0
S15     1     0
S16     0     0
S17     0     0
S18     0     0
S19     1     0
S20     0     1
>
> dev.off()
null device
          1
>
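The 'seqname:pos' strings used in the examples above carry the same information as separate sequence names and positions. As a rough base R sketch of that correspondence (not necessarily how podkat parses such strings internally; the variable names are illustrative), they can be split at the last colon:

```r
## 'seqname:pos' strings, as printed for 'spos' in the session above
spos <- c("chr1:2407", "chr1:2646", "chr1:2793")

## sanity check: every string must end in ':<digits>'
stopifnot(all(grepl("^.+:[0-9]+$", spos)))

## everything before the last colon is the sequence name,
## everything after it is the position
seqnames <- sub(":[0-9]+$", "", spos)
pos <- as.integer(sub("^.*:", "", spos))

seqnames
pos
```

Splitting at the last colon rather than the first keeps the sketch usable for sequence names that themselves contain a colon.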