Last data update: 2014.03.03

R: Plotting functions
plotR Documentation

Plotting functions

Description

Functions for visualizing association test results by means of a Manhattan plot and for visualizing genotypes

Usage

## S4 method for signature 'AssocTestResultRanges,missing'
plot(x, y, cutoff=0.05,
     which=c("p.value", "p.value.adj", "p.value.resampled",
             "p.value.resampled.adj"), showEmpty=FALSE,
     as.dots=FALSE, pch=19, col=c("darkgrey", "grey"), scol="red",
     lcol="red", xlab=NULL, ylab=NULL, ylim=NULL, lwd=1, cex=1,
     cexXaxs=1, cexYaxs=1, srt=0, adj=c(0.5, 1), ...)
## S4 method for signature 'AssocTestResultRanges,character'
plot(x, y, cutoff=0.05,
     which=c("p.value", "p.value.adj", "p.value.resampled",
             "p.value.resampled.adj"), showEmpty=FALSE,
     as.dots=FALSE, pch=19, col=c("darkgrey", "grey"), scol="red",
     lcol="red", xlab=NULL, ylab=NULL, ylim=NULL, lwd=1, cex=1, 
     cexXaxs=1, cexYaxs=1, srt=0, adj=c(0.5, 1), ...)
## S4 method for signature 'AssocTestResultRanges,GRanges'
plot(x, y, cutoff=0.05,
     which=c("p.value", "p.value.adj", "p.value.resampled",
             "p.value.resampled.adj"), showEmpty=FALSE,
     as.dots=FALSE, pch=19, col="darkgrey", scol="red", lcol="red",
     xlab=NULL, ylab=NULL, ylim=NULL, lwd=1, cex=1,
     cexXaxs=1, cexYaxs=1, ...)
## S4 method for signature 'GenotypeMatrix,missing'
plot(x, y, col="black",
     labRow=NULL, labCol=NULL, cexXaxs=(0.2 + 1 / log10(ncol(x))),
     cexYaxs=(0.2 + 1 / log10(nrow(x))), srt=90, adj=c(1, 0.5))
## S4 method for signature 'GenotypeMatrix,factor'
plot(x, y, col=rainbow(length(levels(y))),
     labRow=NULL, labCol=NULL, cexXaxs=(0.2 + 1 / log10(ncol(x))),
     cexYaxs=(0.2 + 1 / log10(nrow(x))), srt=90, adj=c(1, 0.5))
## S4 method for signature 'GenotypeMatrix,numeric'
plot(x, y, col="black", ccol="red", lwd=2,
     labRow=NULL, labCol=NULL, cexXaxs=(0.2 + 1 / log10(ncol(x))),
     cexYaxs=(0.2 + 1 / log10(nrow(x))), srt=90, adj=c(1, 0.5))
## S4 method for signature 'GRanges,character'
plot(x, y, alongGenome=FALSE, 
     type=c("r", "s", "S", "l", "p", "b", "c", "h", "n"),
     xlab=NULL, ylab=NULL, col="red", lwd=2,
     cexXaxs=(0.2 + 1 / log10(length(x))), cexYaxs=1,
     frame.plot=TRUE, srt=90, adj=c(1, 0.5), ...)

Arguments

x

an object of class AssocTestResultRanges, GenotypeMatrix, or GRanges

y

a character string, GRanges object, or factor

cutoff

significance threshold

which

a character string specifying which p-values to plot; if “p.value” (default), raw p-values are plotted. Other options are “p.value.adj” (adjusted p-values), “p.value.resampled” (resampled p-values), and “p.value.resampled.adj” (adjusted resampled p-values). If the requested column is not present in the input object x, the function stops with an error message.

showEmpty

if FALSE (default), p-values of regions that did not contain any variants are omitted from the plot.

as.dots

if TRUE, p-values are plotted as dots/characters in the center of the genomic region. If FALSE (default), p-values are plotted as lines stretching from the starts to the ends of the corresponding genomic regions.

pch

plotting character used to plot a single p-value, ignored if as.dots=FALSE; see points for details.

col

plotting color(s); see details below

scol

color for plotting significant p-values (i.e. the ones passing the significance threshold)

lcol

color for plotting the significance threshold line

xlab

x axis label; if NULL (default) or NA, plot makes an automatic choice

ylab

y axis label; if NULL (default) or NA, plot makes an automatic choice

ylim

y axis limits; if NULL (default) or NA, plot makes an automatic choice; if user-specified, ylim must be a two-element numeric vector with the first element being 0 and the second element being a positive value.

lwd

line thickness; in Manhattan plots, this parameter corresponds to the thickness of the significance threshold line. When plotting genotype matrices along with continuous traits, this is the thickness of the line that corresponds to the trait.

cex

scaling factor for plotting p-values; see points for details.

labRow,labCol

row and column labels; set to NA to switch labels off; if NULL, rows are labeled by sample names (rownames(x)) and columns are labeled by variant names (names(variantInfo(x))).

cexXaxs,cexYaxs

scaling factors for axes labels

ccol

color of the line that plots the continuous trait along with a genotype matrix

srt

rotation angle of text labels on horizontal axis (see text for details); ignored if standard numerical ticks and labels are used.

adj

adjustment of text labels on horizontal axis (see text for details); ignored if standard numerical ticks and labels are used.

alongGenome

plot along the genome or per region (default); see details below.

type

type of plot; see plot.default for details. Additionally, the type “r” is available (default) which plots horizontal lines along the regions of x.

frame.plot

whether or not to frame the plotting area (see plot; default: TRUE)

...

all other arguments are passed to plot.

Details

If plot is called for an AssocTestResultRanges object without specifying the second argument y, a so-called Manhattan plot is produced. The x axis corresponds to the genome on which the AssocTestResultRanges x is based and the y axis shows absolute values of log-transformed p-values. The which argument determines which p-value is plotted, i.e. raw p-values, adjusted p-values, resampled p-values, or adjusted resampled p-values. The cutoff argument allows for setting a significance threshold above which p-values are plotted in a different color (see above).

The optional y argument can be used for two purposes: (1) if it is a character vector containing chromosome names (sequence names), it can be used for specifying a subset of one or more chromosomes to be plotted. (2) if y is a GRanges object of length 1 (if longer, plot stops with an error), only the genomic region corresponding to y is plotted.

The col argument serves for specifying the color for plotting insignificant p-values (i.e. the ones above the significance threshold); if the number of colors is smaller than the number of chromosomes, the vector is recycled. If col is a single color, all insignificant p-values are plotted in the same color. If col has two elements (like the default value), the insignificant p-values of different chromosomes are plotted with alternating colors. It is also possible to produce density plots of p-values by using semi-transparent colors (see, e.g., rgb or hsv for information on how to use the alpha channel).

If plot is called for a GenotypeMatrix object x and no y argument, the matrix is visualized in a heatmap-like fashion, where two major alleles are displayed in white, two minor alleles are displayed in the color passed as col argument, and the heterozygotous case (one minor, one major) is displayed in the color passed as col argument, but with 50% transparency. The arguments cexYaxs and cexXaxs can be used to change the scaling of the axis labels.

If plot is called for a GenotypeMatrix object x and a factor y, then the factor y is interpreted as a binary trait. In this case, the rows of the genotype matrix x are reordered such that rows/samples with the same label are plotted next to each other. Each such group can be plotted in a different color. For this purpose, a vector of colors can be passed as col argument.

If plot is called for a GenotypeMatrix object x and a numeric vector y, then the vector y is interpreted as a continuous trait. In this case, the rows of the genotype matrix x are reordered according to the trait vector y and the genotype matrix is plotted as described above. The trait y is superimposed in the plot in color ccol and with line width lwd. If the null model has been trained with covariates, it also makes sense to plot the genotype against the null model residuals, since these are exactly the values that the genotypes were tested against.

If plot is called for a GRanges object x and a character string y, then plot checks whether x has a metadata column named y. If it exists, this column is plotted against the regions in x. If alongGenome is FALSE (which is the default), the regions in x are arranged along the horizontal plot axis with equal widths and in the same order as contained in x. If the regions in x are named, then the names are used as axis labels and the argument cexXaxs can be used to scale the font size of the names. If alongGenome is TRUE, the metadata column is plotted against genomic positions. The knots of the curves are then positioned at the positions given in the GRanges object x. For types “s”, “S”, “l”, “p”, “b”, “c”, and “h”, knots are placed in the middle of the genomic regions contained in x if they are longer than one base. For type “r”, regions are plotted as lines exactly stretching between the start and end coordinates of each region in x.

Value

returns an invisible numeric vector of length 2 containing the y axis limits

Author(s)

Ulrich Bodenhofer bodenhofer@bioinf.jku.at

References

http://www.bioinf.jku.at/software/podkat

See Also

AssocTestResultRanges, GRanges

Examples

## load genome description
data(hgA)

## partition genome into overlapping windows
windows <- partitionRegions(hgA)

## load genotype data from VCF file
vcfFile <- system.file("examples/example1.vcf.gz", package="podkat")
Z <- readGenotypeMatrix(vcfFile)

## plot some fraction of the genotype matrix
plot(Z[, 1:25])

## read phenotype data from CSV file (continuous trait + covariates)
phenoFile <- system.file("examples/example1log.csv", package="podkat")
pheno <-read.table(phenoFile, header=TRUE, sep=",")

## train null model with all covariates in data frame 'pheno'
nm.log <- nullModel(y ~ ., pheno)

## perform association test
res <- assocTest(Z, nm.log, windows)
res.adj <- p.adjust(res)

## plot results
plot(res)
plot(res, cutoff=1.e-5, as.dots=TRUE)
plot(res.adj, which="p.value.adj")
plot(res.adj, reduce(windows[3:5]), which="p.value.adj")

## filter regions
res.adj.f <- filterResult(res.adj, filterBy="p.value.adj")

## plot genotype grouped by target
sel <- which(overlapsAny(variantInfo(Z), reduce(res.adj.f)))
plot(Z[, sel], factor(pheno$y))
plot(Z[, sel], residuals(nm.log), srt=45)

## compute contributions
contrib <- weights(res.adj.f, Z, nm.log)
contrib[[1]]

## plot contributions
plot(contrib[[1]], "weight.raw")
plot(contrib[[1]], "weight.contribution", type="b", alongGenome=TRUE)

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(podkat)
Loading required package: Rsamtools
Loading required package: GenomeInfoDb
Loading required package: stats4
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Loading required package: S4Vectors

Attaching package: 'S4Vectors'

The following objects are masked from 'package:base':

    colMeans, colSums, expand.grid, rowMeans, rowSums

Loading required package: IRanges
Loading required package: GenomicRanges
Loading required package: Biostrings
Loading required package: XVector
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/podkat/plot-methods.Rd_%03d_medium.png", width=480, height=480)
> ### Name: plot
> ### Title: Plotting functions
> ### Aliases: plot plot-methods plot,AssocTestResultRanges,missing-method
> ###   plot,AssocTestResultRanges,character-method
> ###   plot,AssocTestResultRanges,GRanges-method
> ###   plot,GenotypeMatrix,missing-method plot,GenotypeMatrix,factor-method
> ###   plot,GenotypeMatrix,numeric-method plot,GRanges,character-method
> ### Keywords: methods
> 
> ### ** Examples
> 
> ## load genome description
> data(hgA)
> 
> ## partition genome into overlapping windows
> windows <- partitionRegions(hgA)
> 
> ## load genotype data from VCF file
> vcfFile <- system.file("examples/example1.vcf.gz", package="podkat")
> Z <- readGenotypeMatrix(vcfFile)
> 
> ## plot some fraction of the genotype matrix
> plot(Z[, 1:25])
> 
> ## read phenotype data from CSV file (continuous trait + covariates)
> phenoFile <- system.file("examples/example1log.csv", package="podkat")
> pheno <-read.table(phenoFile, header=TRUE, sep=",")
> 
> ## train null model with all covariates in data frame 'pheno'
> nm.log <- nullModel(y ~ ., pheno)
small sample correction applied
> 
> ## perform association test
> res <- assocTest(Z, nm.log, windows)
> res.adj <- p.adjust(res)
> 
> ## plot results
> plot(res)
> plot(res, cutoff=1.e-5, as.dots=TRUE)
> plot(res.adj, which="p.value.adj")
> plot(res.adj, reduce(windows[3:5]), which="p.value.adj")
> 
> ## filter regions
> res.adj.f <- filterResult(res.adj, filterBy="p.value.adj")
> 
> ## plot genotype grouped by target
> sel <- which(overlapsAny(variantInfo(Z), reduce(res.adj.f)))
> plot(Z[, sel], factor(pheno$y))
> plot(Z[, sel], residuals(nm.log), srt=45)
> 
> ## compute contributions
> contrib <- weights(res.adj.f, Z, nm.log)
> contrib[[1]]
GRanges object with 31 ranges and 2 metadata columns:
          seqnames         ranges strand |        weight.raw
             <Rle>      <IRanges>  <Rle> |         <numeric>
  snv:160     chr1   [7713, 7713]      * | -6.26517762314755
  snv:164     chr1   [7834, 7834]      * | -7.47145138943176
  snv:166     chr1   [7932, 7932]      * |  -8.1742677399741
  snv:167     chr1   [7976, 7976]      * | -8.49700268877644
  snv:177     chr1   [8342, 8342]      * | -8.40629643967383
      ...      ...            ...    ... .               ...
  snv:249     chr1 [11888, 11888]      * |   66.962036514389
  snv:256     chr1 [12191, 12191]      * |  31.5349635964424
  snv:258     chr1 [12273, 12273]      * |  21.8922462931601
  snv:261     chr1 [12307, 12307]      * |  18.0223719598762
  snv:264     chr1 [12369, 12369]      * |  10.9659038227535
           weight.contribution
                     <numeric>
  snv:160 0.000511278661859377
  snv:164 0.000727111212932719
  snv:166 0.000870339328400399
  snv:167 0.000940421182884121
  snv:177 0.000920450193677578
      ...                  ...
  snv:249   0.0584047539126716
  snv:256   0.0129531549164573
  snv:258  0.00624268674043257
  snv:261    0.004230724925375
  snv:264  0.00156631734327101
  -------
  seqinfo: 1 sequence from hgA genome
> 
> ## plot contributions
> plot(contrib[[1]], "weight.raw")
> plot(contrib[[1]], "weight.contribution", type="b", alongGenome=TRUE)
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>