R Graphical Manual

Browse All

Last data update: 2014.03.03

R: Construct a waterfall plot

waterfall

R Documentation

Construct a waterfall plot

Description

Given a data frame construct a water fall plot showing the mutation burden and mutation type on a gene and sample level.

Usage

waterfall(x, mainRecurCutoff = 0, mainGrid = TRUE, mainXlabel = FALSE,
  main_geneLabSize = 8, mainLabelCol = NULL, mainLabelSize = 4,
  mainLabelAngle = 0, mainDropMut = FALSE, mainPalette = NULL,
  mainLayer = NULL, mutBurden = NULL, plotMutBurden = TRUE,
  coverageSpace = 44100000, mutBurdenLayer = NULL, clinData = NULL,
  clinLegCol = 1, clinVarOrder = NULL, clinVarCol = NULL,
  clinLayer = NULL, sampRecurLayer = NULL, plotGenes = NULL,
  geneOrder = NULL, plotSamples = NULL, sampOrder = NULL,
  maxGenes = NULL, rmvSilent = FALSE, fileType = "MAF",
  variant_class_order = NULL, out = "plot")

Arguments

`x`	Object of class data frame representing annotated mutations. The data frame supplied must have one of the following sets of column names ("Tumor_Sample_Barcode", "Hugo_Symbol", "Variant_Classification") for fileType="MAF", ("sample","gene_name","trv_type") for fileType="MGI" or ("sample", "gene", "variant_class") for fileType="Custom". This columns should represent samples in a cohort, gene with mutation, and the mutation type respectively.
`mainRecurCutoff`	Numeric value between 0 and 1 specifying a mutation recurrence cutoff. Genes which do not have mutations in the proportion os samples defined are removed.
`mainGrid`	Boolean specifying if a grid should be overlayed on the main plot. Not recommended if the number of genes or samples to be plotted is large.
`mainXlabel`	Boolean specifying whether to label the x-axis with sample names. Not recommended if the number of samples to be plotted is large.
`main_geneLabSize`	Intenger specifying the size of gene names displayed on the y-axis.
`mainLabelCol`	Character string specifying a column name from the argument supplied to parameter 'x' from which to derive cell labels from (see details and vignette).
`mainLabelSize`	Integer specifying the size of text labels for cells in the main plot. Valid only if argument is supplied to the parameter 'mainLabelCol'.
`mainLabelAngle`	Integer specifying the degree of rotation for text labels. Valid only if argument is supplied to the parameter 'mainLabelCol'.
`mainDropMut`	Boolean specifying whether to drop unused "mutation type" levels from the legend.
`mainPalette`	Character vector specifying colours for mutation types plotted in the main plot, must specify a colour for each mutation type plotted.
`mainLayer`	Valid ggplot2 layer to be added to the main plot.
`mutBurden`	Object of class data frame containing columns "sample", "mut_burden" with sample levels matching those supplied in x.
`plotMutBurden`	Boolean specify if the mutation burden sub-plot should be displayed.
`coverageSpace`	Integer specifying the size in bp of the genome covered by sequence data from which mutations could be called (see details and vignette).
`mutBurdenLayer`	Valid ggplot2 layer to be added to the top sub-plot.
`clinData`	Object of class data frame with rows representing clinical data. The data frame should be in "long format" and columns must be names as "sample", "variable", and "value" (optional see details and vignette).
`clinLegCol`	Integer specifying the number of columns in the legend for the clinical data, only valid if argument is supplied to parameter clinData.
`clinVarOrder`	Character vector specifying the order in which to plot variables in the variable column of the argument given to the parameter clinData. The argument supplied to this parameter should have the same unique length and values as in the variable column of the argument supplied to parameter clinData (see vignette).
`clinVarCol`	Named character vector specifying the mapping of colours to variables in the variable column of the data frame supplied to clinData (ex. "variable"="colour").
`clinLayer`	Valid ggplot2 layer to be added to the clinical sub-plot.
`sampRecurLayer`	Valid ggplot2 layer to be added to the left sub-plot.
`plotGenes`	Character vector specifying genes to plot. If not null genes not specified within this character vector are removed.
`geneOrder`	Character vector specifying the order in which to plot genes.
`plotSamples`	Character vector specifying samples to plot. If not null all other samples not specified within this parameter are removed.
`sampOrder`	Character vector specifying the order of the samples to plot.
`maxGenes`	Integer specifying the maximum number of genes to be plotted. Genes kept will be choosen based on the reccurence of mutations in samples.
`rmvSilent`	Boolean specifying if silent mutations should be removed from the plot.
`fileType`	Character string specifying the file format of the data frame specified to parameter 'x', one of "MGI", "MAF", "Custom" (see details and vignette).
`variant_class_order`	Character vector specifying the hierarchical order of mutation types to plot, required if file_type == "Custom" (see details and vignette).
`out`	Character vector specifying the the object to output, one of "data", "grob", or "plot", defaults to "plot" (see returns).

Details

waterfall is a function designed to visualize the mutations seen in a cohort. The function takes a data frame with appropriate column names (see fileType parameter) and plots the mutations within. In cases where multiple mutations occur in the same cell the most deleterious mutation is given priority (see vignette for default priority). If the fileType parameter is set to "Custom" the user most supply this priority via the 'variant_class_order' parameter with the highest priorities occuring first. Additionally this parameter will override the default orders of MGI and MAF file types.

Various data subsets are allowed via the waterfall function (see above), all of these subsets will occur independently of the mutation burden calculation. To clarify the removal of genes and mutations will only occur after the mutation burden is calculated. The mutation burden calculation is only meant to provide a rough estimate and assumes that the coverage breadth within the cohort is aproximately equal. For more accurate calculations it is recommended to supply this information via the mutBurden parameter which. Note that the mutation burden calculation relies on the 'coverageSpace' parameter (see vignette).

It is possible to display additional information within the plot via cell labels. The 'mainLabelCol' parameter will look for an additional column in the data frame and plot text within cells based on those values (see vignette).

Value

One of the following, a list of dataframes containing data to be plotted, a grob object, or a plot.

Examples

# Plot the data
waterfall(brcaMAF, plotGenes=c("PIK3CA", "TP53", "USH2A", "MLL3", "BRCA1"))

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(GenVisR)
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/GenVisR/waterfall.Rd_%03d_medium.png", width=480, height=480)
> ### Name: waterfall
> ### Title: Construct a waterfall plot
> ### Aliases: waterfall
> 
> ### ** Examples
> 
> # Plot the data
> waterfall(brcaMAF, plotGenes=c("PIK3CA", "TP53", "USH2A", "MLL3", "BRCA1"))
Checking if input is properly formatted...
Calculating frequency of mutations...
Removing genes not in: PIK3CA, TP53, USH2A, MLL3, BRCA1
setting mutation hierarchy...
NULL
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>