R Graphical Manual

Last data update: 2014.03.03

plotPCA

R Documentation

Plot PCA for an SCESet object

Description

Produce a principal components analysis (PCA) plot of two or more principal components for an SCESet object.

Usage

plotPCASCESet(object, ntop = 500, ncomponents = 2, exprs_values = "exprs",
  colour_by = NULL, shape_by = NULL, size_by = NULL, feature_set = NULL,
  return_SCESet = FALSE, scale_features = TRUE, draw_plot = TRUE,
  pca_data_input = "exprs", selected_variables = NULL,
  detect_outliers = FALSE, theme_size = 10, legend = "auto")

## S4 method for signature 'SCESet'
plotPCA(object, ntop = 500, ncomponents = 2,
  exprs_values = "exprs", colour_by = NULL, shape_by = NULL,
  size_by = NULL, feature_set = NULL, return_SCESet = FALSE,
  scale_features = TRUE, draw_plot = TRUE, pca_data_input = "exprs",
  selected_variables = NULL, detect_outliers = FALSE, theme_size = 10,
  legend = "auto")

Arguments

`object`	an `SCESet` object
`ntop`	numeric scalar indicating the number of most variable features to use for the PCA. Default is `500`, but `ntop` is ignored if the `feature_set` argument is non-NULL.
`ncomponents`	numeric scalar indicating the number of principal components to plot, starting from the first principal component. Default is 2. If `ncomponents` is 2, a scatterplot of PC2 vs PC1 is produced. If `ncomponents` is greater than 2, a pairs plot of the first `ncomponents` components is produced.
`exprs_values`	character string indicating which values should be used as the expression values for this plot. Valid values are `"tpm"` (transcripts per million), `"norm_tpm"` (normalised TPM values), `"fpkm"` (FPKM values), `"norm_fpkm"` (normalised FPKM values), `"counts"` (counts for each feature), `"norm_counts"` (normalised counts), `"cpm"` (counts-per-million), `"norm_cpm"` (normalised counts-per-million), `"exprs"` (whatever is in the `'exprs'` slot of the `SCESet` object; default), `"norm_exprs"` (normalised expression values), `"stand_exprs"` (standardised expression values), or any other named element of the `assayData` slot of the `SCESet` object that can be accessed with the `get_exprs` function.
`colour_by`	character string defining the column of `pData(object)` to be used as a factor by which to colour the points in the plot.
`shape_by`	character string defining the column of `pData(object)` to be used as a factor by which to define the shape of the points in the plot.
`size_by`	character string defining the column of `pData(object)` to be used as a factor by which to define the size of points in the plot.
`feature_set`	character, numeric or logical vector indicating a set of features to use for the PCA. If character, entries must all be in `featureNames(object)`. If numeric, values are taken to be indices for features. If logical, the vector is used to index features and should have length equal to `nrow(object)`.
`return_SCESet`	logical, should the function return an `SCESet` object with principal component values for cells in the `reducedDimension` slot? Default is `FALSE`, in which case a `ggplot` object is returned.
`scale_features`	logical, should the expression values be standardised so that each feature has unit variance? Default is `TRUE`.
`draw_plot`	logical, should the plot be drawn on the current graphics device? Only used if `return_SCESet` is `TRUE`; otherwise the plot is always produced.
`pca_data_input`	character argument defining which data should be used as input for the PCA. Possible options are `"exprs"` (default), which uses expression data to produce a PCA at the cell level; `"pdata"` which uses numeric variables from `pData(object)` to do PCA at the cell level; and `"fdata"` which uses numeric variables from `fData(object)` to do PCA at the feature level.
`selected_variables`	character vector indicating which variables in `pData(object)` to use for the phenotype-data based PCA. Ignored if the argument `pca_data_input` is anything other than `"pdata"`.
`detect_outliers`	logical, should outliers be detected in the PC plot? Only used when the `pca_data_input` argument is `"pdata"`. Default is `FALSE`.
`theme_size`	numeric scalar giving default font size for plotting theme (default is 10).
`legend`	character string specifying how the legend(s) should be shown. Default is `"auto"`, which hides legends that have only one level and shows the others. Alternatives are `"all"` (show all legends) and `"none"` (hide all legends).
`...`	further arguments passed to `plotPCASCESet`
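To make the interplay between `ntop` and `feature_set` concrete, here is a minimal base-R sketch of how the input features could be chosen. It assumes a plain features-by-cells numeric matrix rather than an `SCESet`, and the helper name `select_pca_features` is hypothetical; it illustrates the documented rules and is not scater's internal code.

```r
# Hypothetical helper, NOT part of scater: picks the rows (features) of a
# features-by-cells matrix to feed into a PCA, following the documented
# ntop / feature_set rules.
select_pca_features <- function(mat, ntop = 500, feature_set = NULL) {
  if (!is.null(feature_set)) {
    # A non-NULL feature_set overrides ntop.
    if (is.character(feature_set)) {
      stopifnot(all(feature_set %in% rownames(mat)))
    } else if (is.logical(feature_set)) {
      stopifnot(length(feature_set) == nrow(mat))
    }
    # Names, numeric indices and logical vectors can all index rows directly.
    return(mat[feature_set, , drop = FALSE])
  }
  # Otherwise keep the ntop features with the largest variance across cells.
  feature_vars <- apply(mat, 1, var)
  n_keep <- min(ntop, nrow(mat))
  mat[order(feature_vars, decreasing = TRUE)[seq_len(n_keep)], , drop = FALSE]
}

# Usage on a small simulated matrix (20 features x 5 cells).
set.seed(1)
mat <- matrix(rnorm(100), nrow = 20,
              dimnames = list(paste0("gene", 1:20), paste0("cell", 1:5)))
dim(select_pca_features(mat, ntop = 10))                    # 10 features kept
dim(select_pca_features(mat, ntop = 10, feature_set = 1:3)) # feature_set wins
```

Indexing with `drop = FALSE` keeps the result a matrix even when a single feature is selected, which matters if the output is passed straight on to `scale` or `prcomp`.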

Details

The function `prcomp` is used internally to do the PCA. The function first checks whether the object already has standardised expression values (by looking at `stand_exprs(object)`). If it does, those values are used for the PCA. If not, standardised expression values are computed with `scale` (scaling each feature to unit variance only if `scale_features` is `TRUE`), added to the object, and the PCA is done on these new standardised values.
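The standardise-then-`prcomp` step can be sketched in base R as below. This is an illustration under stated assumptions, not scater's implementation: it takes a plain features-by-cells matrix, ignores any existing `stand_exprs` values, and the function name `pca_on_cells` is hypothetical. Constant features are dropped first because scaling them to unit variance would divide by zero, which is presumably also why the Examples drop zero-variance genes before plotting.

```r
# Hypothetical helper, NOT scater's internal code: centre (and optionally
# scale) each feature, then run prcomp with cells as observations.
pca_on_cells <- function(mat, scale_features = TRUE, ncomponents = 2) {
  # Drop constant features: scaling them to unit variance would divide by zero.
  feature_sd <- apply(mat, 1, sd)
  mat <- mat[feature_sd > 0, , drop = FALSE]
  # scale() standardises columns, so transpose to cells-by-features first.
  std <- scale(t(mat), center = TRUE, scale = scale_features)
  # The data are already standardised, so prcomp should not rescale them.
  pca <- prcomp(std, center = FALSE, scale. = FALSE)
  # Rows are cells; columns are PC1, PC2, ...
  pca$x[, seq_len(ncomponents), drop = FALSE]
}

set.seed(42)
counts <- matrix(rpois(30 * 8, lambda = 5), nrow = 30)  # 30 features x 8 cells
counts[1, ] <- 3                                       # a constant feature
pcs <- pca_on_cells(log2(counts + 1), ncomponents = 2)
head(pcs)
```

Standardising once and then calling `prcomp` with `center = FALSE, scale. = FALSE` mirrors the description above, where the standardised values are computed (and stored) separately from the PCA itself.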

If the arguments `detect_outliers` and `return_SCESet` are both `TRUE` (outlier detection also requires `pca_data_input = "pdata"`), the element `$outlier` is added to the `pData` (phenotype data) slot of the returned `SCESet` object. This element indicates whether or not each cell has been designated as an outlier based on the PCA. These values can be used to filter out low-quality cells and are accessed with, for example, `example_sceset$outlier`.
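Filtering on the outlier flag is ordinary column subsetting. The sketch below uses a plain `data.frame` as a stand-in for `pData(object)` and assumes `outlier` is a logical vector (the values are made up); the commented line shows the analogous subsetting on a returned `SCESet`, named `sce` here purely for illustration, and is not run.

```r
# Stand-in for pData(object): one row per cell plus a logical `outlier`
# column of the kind described above (values here are made up).
cell_info <- data.frame(
  cell    = sprintf("Cell_%03d", 1:6),
  outlier = c(FALSE, TRUE, FALSE, FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)
sum(cell_info$outlier)                          # number of flagged cells
filtered <- cell_info[!cell_info$outlier, , drop = FALSE]

# Analogous filtering on an SCESet returned with detect_outliers = TRUE and
# return_SCESet = TRUE (not run; `sce` is an illustrative name):
# sce <- sce[, !sce$outlier]
```

The `[, !sce$outlier]` form follows the column-subsetting idiom used in the Examples (`example_sceset[, example_sceset$Treatment == "treat1"]`).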

Value

Either a ggplot plot object (the default) or, if return_SCESet is TRUE, an SCESet object.

Examples

## Set up an example SCESet
data("sc_example_counts")
data("sc_example_cell_info")
pd <- new("AnnotatedDataFrame", data = sc_example_cell_info)
example_sceset <- newSCESet(countData = sc_example_counts, phenoData = pd)
drop_genes <- apply(exprs(example_sceset), 1, function(x) {var(x) == 0})
example_sceset <- example_sceset[!drop_genes, ]

## Examples plotting PC1 and PC2
plotPCA(example_sceset)
plotPCA(example_sceset, colour_by = "Cell_Cycle")
plotPCA(example_sceset, colour_by = "Cell_Cycle", shape_by = "Treatment")
plotPCA(example_sceset, colour_by = "Cell_Cycle", shape_by = "Treatment",
size_by = "Mutation_Status")
plotPCA(example_sceset, shape_by = "Treatment", size_by = "Mutation_Status")
plotPCA(example_sceset, feature_set = 1:100, colour_by = "Treatment",
shape_by = "Mutation_Status")

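## Experiment with the legend: Treatment has only one level in this subset,
## so its legend would be hidden under the default legend = "auto"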
example_subset <- example_sceset[, example_sceset$Treatment == "treat1"]
plotPCA(example_subset, colour_by = "Cell_Cycle", shape_by = "Treatment", legend = "all")

## Return an SCESet (PCs stored in the reducedDimension slot) instead of a plot
plotPCA(example_sceset, shape_by = "Treatment", return_SCESet = TRUE)

## Examples plotting more than 2 PCs
plotPCA(example_sceset, ncomponents = 8)
plotPCA(example_sceset, ncomponents = 4, colour_by = "Treatment",
shape_by = "Mutation_Status")
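For readers without scater at hand, the layout produced when `ncomponents` is greater than 2 resembles a scatterplot matrix of the leading components. The sketch below approximates it with base R's `prcomp` and `pairs` on simulated data; scater itself draws the plot with ggplot2, so colours, shapes and labels will differ.

```r
# Simulated data: 50 features x 20 cells, two treatment groups.
set.seed(123)
expr  <- matrix(rnorm(50 * 20), nrow = 50)
group <- factor(rep(c("treat1", "treat2"), each = 10))
# Shift the first 10 features in treat2 cells so the groups differ.
expr[1:10, group == "treat2"] <- expr[1:10, group == "treat2"] + 3

# Cells are observations, so transpose before prcomp.
pca <- prcomp(t(expr), center = TRUE, scale. = TRUE)
ncomponents <- 4
pcs <- pca$x[, seq_len(ncomponents)]

# Scatterplot matrix of the first four PCs, coloured by group.
pairs(pcs, col = as.integer(group), pch = 19,
      main = "First four principal components")
```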

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> library(scater)
Loading required package: Biobase
Loading required package: BiocGenerics
Loading required package: parallel

Loading required package: ggplot2

Attaching package: 'scater'

The following object is masked from 'package:stats':

    filter

> ## Set up an example SCESet
> data("sc_example_counts")
> data("sc_example_cell_info")
> pd <- new("AnnotatedDataFrame", data = sc_example_cell_info)
> example_sceset <- newSCESet(countData = sc_example_counts, phenoData = pd)
> drop_genes <- apply(exprs(example_sceset), 1, function(x) {var(x) == 0})
> example_sceset <- example_sceset[!drop_genes, ]
> 
> ## Examples plotting PC1 and PC2
> plotPCA(example_sceset)
> plotPCA(example_sceset, colour_by = "Cell_Cycle")
> plotPCA(example_sceset, colour_by = "Cell_Cycle", shape_by = "Treatment")
> plotPCA(example_sceset, colour_by = "Cell_Cycle", shape_by = "Treatment",
+ size_by = "Mutation_Status")
Warning message:
Using size for a discrete variable is not advised. 
> plotPCA(example_sceset, shape_by = "Treatment", size_by = "Mutation_Status")
Warning message:
Using size for a discrete variable is not advised. 
> plotPCA(example_sceset, feature_set = 1:100, colour_by = "Treatment",
+ shape_by = "Mutation_Status")
> 
> ## experiment with legend
> example_subset <- example_sceset[, example_sceset$Treatment == "treat1"]
> plotPCA(example_subset, colour_by = "Cell_Cycle", shape_by = "Treatment", legend = "all")
> 
> plotPCA(example_sceset, shape_by = "Treatment", return_SCESet = TRUE)
SCESet (storageMode: environment)
assayData: 1973 features, 40 samples 
  element names: counts, cpm, exprs, is_exprs 
protocolData: none
phenoData
  sampleNames: Cell_001 Cell_002 ... Cell_040 (40 total)
  varLabels: Cell Mutation_Status Cell_Cycle Treatment
  varMetadata: labelDescription
featureData: none
experimentData: use 'experimentData(object)'
Annotation:  
> 
> ## Examples plotting more than 2 PCs
> plotPCA(example_sceset, ncomponents = 8)
> plotPCA(example_sceset, ncomponents = 4, colour_by = "Treatment",
+ shape_by = "Mutation_Status")
> 