Last data update: 2014.03.03

distribution

R Documentation

Get Frequencies of Feature Selection and Sample Errors

Description

The function has two modes. When aggregating feature selection results, it counts how many times each feature was selected across all cross-validations. When aggregating classification results, it calculates the error rate of each sample, which helps to identify outlier samples that are difficult to classify.
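
The two modes can be illustrated with a minimal base R sketch. It does not use ClassifyR and is not the package's implementation; the `selected` list and the `predictions` table are invented inputs that stand in for the contents of a `ClassifyResult` object.

```r
# Hypothetical cross-validation output: features chosen in each of 4 iterations.
selected <- list(
  c("ABCC3", "ABCF1", "ABT1"),
  c("ABCC3", "ACOT11"),
  c("ABCC3", "ABCF1"),
  c("ABT1", "ADAM17")
)

# Feature mode: count how many iterations selected each feature.
featureTable <- table(unlist(selected))
featureCounts <- setNames(as.vector(featureTable), names(featureTable))

# Hypothetical predictions: each sample was predicted twice.
predictions <- data.frame(
  sample    = c("S1", "S2", "S3", "S1", "S2", "S3"),
  predicted = c("Good", "Poor", "Good", "Good", "Good", "Poor"),
  actual    = c("Good", "Poor", "Poor", "Good", "Poor", "Poor"),
  stringsAsFactors = FALSE
)

# Sample mode: fraction of each sample's predictions that were wrong.
sampleErrors <- tapply(predictions$predicted != predictions$actual,
                       predictions$sample, mean)

print(featureCounts)
print(sampleErrors)
```

Samples whose error rate is well above the rest are the candidates for the outliers mentioned above.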

Usage

## S4 method for signature 'ClassifyResult'
distribution(result, dataType = c("features", "samples"),
             plotType = c("density", "histogram"),
             summaryType = c("percentage", "count"),
             plot = TRUE, xMax = NULL,
             xLabel = "Percentage of Cross-validations",
             yLabel = "Density", title = "Distribution of Feature Selections",
             fontSizes = c(24, 16, 12), ...)

Arguments

`result`	An object of class `ClassifyResult`.
`dataType`	Whether to calculate the number of times each feature was selected (`"features"`) or the sample-wise error rate (`"samples"`).
`plotType`	Whether to draw a probability density curve (`"density"`) or a histogram (`"histogram"`).
`summaryType`	Whether to summarise the feature selections as a percentage (`"percentage"`) or a count (`"count"`).
`plot`	Logical. Whether to draw a plot of the frequency of selection or the error rate.
`xMax`	Maximum data value to show in the plot.
`xLabel`	The label for the x-axis of the plot.
`yLabel`	The label for the y-axis of the plot.
`title`	An overall title for the plot.
`fontSizes`	A vector of length 3. The first number is the size of the title. The second number is the size of the axis titles. The third number is the size of the axis values.
`...`	Further parameters, such as `colour` and `fill`, passed to `geom_histogram` or `stat_density`, depending on the value of `plotType`.
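
Several arguments list their permitted values as a character vector. A common R idiom for resolving such arguments is `match.arg`, which uses the first value when the argument is omitted. The sketch below assumes that idiom and assumes that a percentage is the count divided by the number of cross-validations; neither is confirmed by this page. `summariseSelections` and `totalRuns` are hypothetical names, not part of ClassifyR.

```r
# Hypothetical helper, not part of ClassifyR.
summariseSelections <- function(counts, totalRuns,
                                summaryType = c("percentage", "count")) {
  # match.arg picks the first choice when summaryType is not supplied
  # and raises an error for any value outside the listed choices.
  summaryType <- match.arg(summaryType)
  if (summaryType == "percentage") counts / totalRuns * 100 else counts
}

# Invented selection counts from 4 cross-validation iterations.
counts <- c(ABCC3 = 3, ABCF1 = 2, ADAM17 = 1)

asPercentage <- summariseSelections(counts, totalRuns = 4)
asCount <- summariseSelections(counts, totalRuns = 4, summaryType = "count")

print(asPercentage)
print(asCount)
```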

Value

If `dataType` is "features", a vector with one element for each feature that was chosen at least once, containing either the number of cross-validations in which the feature was chosen or the percentage of cross-validations in which it was chosen, depending on `summaryType`. If `dataType` is "samples", a vector as long as the number of samples, containing the cross-validation error rate of each sample. If `plot` is TRUE, a plot is also drawn on the current graphics device.
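
Because the return value is a plain named vector, it can be filtered and sorted with ordinary base R. The sketch below uses an invented vector shaped like the "samples" result; the sample names, the error rates and the 0.5 cut-off are all assumptions chosen for illustration.

```r
# Invented error rates, shaped like the vector returned for "samples".
sampleDistribution <- c(sampleA = 0, sampleB = 0,
                        sampleC = 1 / 3, sampleD = 1)

# Assumed cut-off above which a sample is treated as hard to classify.
threshold <- 0.5
hardSamples <- names(sampleDistribution)[sampleDistribution > threshold]

# Rank samples from most to least frequently misclassified.
ranked <- sort(sampleDistribution, decreasing = TRUE)

print(hardSamples)
print(ranked)
```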

Author(s)

Dario Strbenac

Examples

  # Run only if the example data set and sparsediscrim are installed.
  if(require(curatedOvarianData) && require(sparsediscrim))
  {
    data(TCGA_eset)
    # Poor outcome: deceased within one year.
    badOutcome <- which(pData(TCGA_eset)[, "vital_status"] == "deceased" &
                        pData(TCGA_eset)[, "days_to_death"] <= 365)
    # Good outcome: living, with days_to_death of at least five years.
    goodOutcome <- which(pData(TCGA_eset)[, "vital_status"] == "living" &
                         pData(TCGA_eset)[, "days_to_death"] >= 365 * 5)
    # Keep only the selected samples and record their class labels.
    TCGA_eset <- TCGA_eset[, c(badOutcome, goodOutcome)]
    classes <- factor(rep(c("Poor", "Good"), c(length(badOutcome), length(goodOutcome))))
    pData(TCGA_eset)[, "class"] <- classes
    result <- runTests(TCGA_eset, "Ovarian Cancer", "Differential Expression",
                       resamples = 2, fold = 2)
    # Per-sample error rates, drawn as a density curve.
    sampleDistribution <- distribution(result, "samples", xLabel = "Sample Error Rate",
                                       title = "Distribution of Error Rates")
    # Per-feature selection counts, drawn as a histogram;
    # binwidth is passed on through '...'.
    featureDistribution <- distribution(result, "features", summaryType = "count",
                                        plotType = "histogram",
                                        xLabel = "Number of Cross-validations",
                                        yLabel = "Count", binwidth = 1)
    print(head(sampleDistribution))
    print(head(featureDistribution))
  }

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

> library(ClassifyR)
Loading required package: Biobase
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colnames, do.call, duplicated, eval, evalq,
    get, grep, grepl, intersect, is.unsorted, lapply, lengths, mapply,
    match, mget, order, paste, pmax, pmax.int, pmin, pmin.int, rank,
    rbind, rownames, sapply, setdiff, sort, table, tapply, union,
    unique, unsplit

Welcome to Bioconductor

    Vignettes contain introductory material; view with
    'browseVignettes()'. To cite Bioconductor, see
    'citation("Biobase")', and for packages 'citation("pkgname")'.

Loading required package: BiocParallel
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/ClassifyR/distribution.Rd_%03d_medium.png", width=480, height=480)
> ### Name: distribution
> ### Title: Get Frequencies of Feature Selection and Sample Errors
> ### Aliases: distribution distribution,ClassifyResult-method
> 
> ### ** Examples
> 
>   if(require(curatedOvarianData) && require(sparsediscrim))
+   {
+     data(TCGA_eset)
+     badOutcome <- which(pData(TCGA_eset)[, "vital_status"] == "deceased" & pData(TCGA_eset)[, "days_to_death"] <= 365)
+     goodOutcome <- which(pData(TCGA_eset)[, "vital_status"] == "living" & pData(TCGA_eset)[, "days_to_death"] >= 365 * 5)
+     TCGA_eset <- TCGA_eset[, c(badOutcome, goodOutcome)]
+     classes <- factor(rep(c("Poor", "Good"), c(length(badOutcome), length(goodOutcome))))
+     pData(TCGA_eset)[, "class"] <- classes
+     result <- runTests(TCGA_eset, "Ovarian Cancer", "Differential Expression", resamples = 2, fold = 2)
+     sampleDistribution <- distribution(result, "samples", xLabel = "Sample Error Rate",
+                                        title = "Distribution of Error Rates")
+     featureDistribution <- distribution(result, "features", summaryType = "count", plotType = "histogram",
+                                         xLabel = "Number of Cross-validations", yLabel = "Count",
+                                         binwidth = 1)
+     print(head(sampleDistribution))
+     print(head(featureDistribution))
+   }
Loading required package: curatedOvarianData
Loading required package: affy
Loading required package: sparsediscrim
TCGA.04.1337 TCGA.23.1032 TCGA.23.1107 TCGA.04.1343 TCGA.24.0970 TCGA.04.1335 
   0.0000000    0.0000000    0.0000000    0.0000000    0.0000000    0.3333333 
 ABCC3  ABCF1   ABT1 ACOT11 ACOT13 ADAM17 
     1      1      1      1      1      1 
Warning message:
Removed 13 rows containing non-finite values (stat_density). 
> 
> dev.off()
null device 
          1 
>