Last data update: 2014.03.03

calculateGenesRanking    R Documentation

Calculate GenesRanking

Description

Calculates the genes ranking and/or plots the posterior probability of the genes ordered by class ranking.

Usage

calculateGenesRanking(eset=NULL, sampleLabels=NULL, 
numGenesPlot=1000, plotTitle="Significant genes", plotLp=TRUE, 
lpThreshold = 0.95, numSignificantGenesType="ranked", 
returnRanking="full", nullHiphothesisFilter=0.95,  nGenesExprDiff=1000, 
geneLabels=NULL, precalcGenesRanking=NULL, IQRfilterPercentage= 0, 
verbose=TRUE)

Arguments

eset

ExpressionSet or Matrix. Gene expression of the train samples (positive & non-logarithmic normalized values).
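If the available expression values are stored on a log2 scale, they could be transformed back before being passed as eset. The following is a minimal base-R sketch under that assumption; the matrix logExprs and all other object names are hypothetical and not part of the package:

```r
# Hypothetical log2-scale expression matrix: 3 genes x 4 samples
logExprs <- matrix(c( 7.1,  8.3,  6.9,  7.7,
                     10.2,  9.8, 10.5, 10.1,
                      4.0,  4.4,  3.9,  4.2),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(paste0("gene", 1:3),
                                   paste0("sample", 1:4)))

# Undo the log2 transformation (assumes the data really are log2 values)
rawExprs <- 2^logExprs

# Check that every value is strictly positive, as the argument requires
allPositive <- all(rawExprs > 0)
```

Whether a given dataset is log-transformed, and with which base, has to be checked against its own preprocessing notes.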

sampleLabels

Character. PhenoData variable (column name) containing the train samples class labels.
Matrix or Factor. Class labels of the train samples.

numGenesPlot

Integer. Number of genes to plot.

plotTitle

Character. Plot title.

plotLp

Logical. If FALSE no plot is drawn.

lpThreshold

Numeric between 0 and 1. Required posterior probability value to consider a gene 'significant'.

numSignificantGenesType

Character. Type of count for number of genes over lpThreshold.

  • "global". Counts all genes of a class with posterior probability over lpThreshold, even if in the final ranking they were assigned to another class.

  • "ranked". Counts only genes assigned to each class.
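The difference between the two counting types can be illustrated with a toy example in base R. This is only a sketch: the posterior probabilities are invented, and it assumes that each gene is assigned to the class in which it has the highest posterior probability, which may not match the package's internal assignment rule.

```r
# Hypothetical posterior probabilities: 5 genes (rows) x 2 classes (columns)
postProb <- matrix(c(0.99, 0.97,
                     0.98, 0.10,
                     0.20, 0.96,
                     0.50, 0.40,
                     0.96, 0.99),
                   ncol = 2, byrow = TRUE,
                   dimnames = list(paste0("gene", 1:5),
                                   c("classA", "classB")))
lpThreshold <- 0.95

# Assumption for this sketch: a gene is assigned to the class
# in which it has the highest posterior probability
assignedClass <- colnames(postProb)[max.col(postProb, ties.method = "first")]

# "global": every gene over the threshold counts for the class,
# even if it was finally assigned to another class
globalCount <- colSums(postProb > lpThreshold)

# "ranked": only genes assigned to the class are counted
rankedCount <- sapply(colnames(postProb), function(cl)
    sum(postProb[, cl] > lpThreshold & assignedClass == cl))
```

In this toy data gene1 and gene5 are over the threshold in both classes, so they count twice under "global" but only once, for their assigned class, under "ranked".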

returnRanking

Character. Type of ranking to return:

  • "full". Ranking of all available genes.

  • "lp"/"significant"/"lpThreshold"/TRUE. Ranking of the significant genes (genes with posterior probability over lpThreshold).

  • FALSE/NULL. No ranking is returned.

nullHiphothesisFilter

Numeric between 0 and 1. Genes whose null hypothesis has a posterior probability over this threshold will be removed from the ranking.
Null hypothesis: the gene does not represent any class.
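As a rough illustration of this filter, the base-R sketch below drops genes whose null-hypothesis posterior probability exceeds the threshold. The probabilities and gene names are hypothetical, and the code is not the package's implementation:

```r
# Hypothetical posterior probabilities of the null hypothesis for 4 genes
nullPostProb <- c(gene1 = 0.01, gene2 = 0.97, gene3 = 0.50, gene4 = 0.99)
nullHiphothesisFilter <- 0.95

# Keep only genes whose null-hypothesis probability is not over the threshold
keptGenes <- names(nullPostProb)[nullPostProb <= nullHiphothesisFilter]
```

Here gene2 and gene4 would be removed, since they most likely do not represent any class.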

nGenesExprDiff

Numeric. Number of top genes for which the differential expression is calculated.

geneLabels

Vector or Matrix. Gene name, ID or label which should be shown in the returned results and plots.

IQRfilterPercentage

Integer. InterQuartile Range (IQR) filter applied to the initial data. Not recommended for more than two classes.

precalcGenesRanking

Allows reusing a genesRanking returned by geNetClassifier or by a previous execution for the same data and parameters.

verbose

Logical. If TRUE, messages indicating the execution progress will be printed on screen.

Details

Significant genes: Genes with posterior probability over 'lpThreshold'.
A larger number of significant genes for a class may mean:

  • Very different class

  • More systemic disease

Plot lines represent the posterior probability of the genes, sorted by rank from left to right.

To find genes that differentiate the classes from each other, the function ranks the genes based on their posterior probability for each class.
The posterior probability represents how well a gene differentiates samples of a class from samples of the other classes. Therefore, genes with a high posterior probability are good at differentiating a class from all the others.
This posterior probability is calculated by emfit (pkg:EBarrays), an expectation-maximization (EM) algorithm for gene expression mixture models.
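The ranking step described above can be sketched in base R as ordering the genes of each class by decreasing posterior probability. The sketch starts from an invented probability matrix; it does not call emfit and is not the package's actual code:

```r
# Hypothetical posterior probabilities: 4 genes (rows) x 2 classes (columns)
postProb <- matrix(c(0.99, 0.02,
                     0.60, 0.97,
                     0.97, 0.30,
                     0.10, 0.99),
                   ncol = 2, byrow = TRUE,
                   dimnames = list(paste0("gene", 1:4),
                                   c("classA", "classB")))

# For each class, order the gene names by decreasing posterior probability
rankingByClass <- lapply(colnames(postProb), function(cl)
    names(sort(postProb[, cl], decreasing = TRUE)))
names(rankingByClass) <- colnames(postProb)
```

The first genes of each vector are the ones that best separate that class from the others in this toy data.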

Value

  • GenesRanking Optional. Requested genes ranking.

  • Plot Optional. Plot of the posterior probability of the top genes.

See Also

plot.GenesRanking is a shortcut for plotting a previously calculated genes ranking,
e.g. plot(genesRanking)

Examples


# Load an expressionSet:
library(leukemiasEset)
data(leukemiasEset)

# Select the train samples: 
trainSamples<- c(1:10, 13:22, 25:34, 37:46, 49:58) 
# summary(leukemiasEset$LeukemiaType[trainSamples])

## Not run: 
######
# Calculate/plot the significant genes (+ info) of a dataset 
# without training classifier/calculating network
######
# Return only the significant genes ranking
signGenesRanking <- calculateGenesRanking(leukemiasEset[,trainSamples], 
    sampleLabels="LeukemiaType", returnRanking="significant")
numGenes(signGenesRanking)

# Return the full genes ranking:
fullRanking <- calculateGenesRanking(leukemiasEset[,trainSamples], 
    sampleLabels="LeukemiaType", returnRanking="full")
numGenes(fullRanking)
numSignificantGenes(fullRanking)
# The significant genes can then be extracted from it:
signGenesRanking2  <- getTopRanking(fullRanking, 
    numGenesClass=numSignificantGenes(fullRanking))
numGenes(signGenesRanking2)

# Changing the posterior probability required to consider genes significant:
signGenesRanking90 <- calculateGenesRanking(leukemiasEset[,trainSamples], 
    sampleLabels="LeukemiaType", lpThreshold=0.9, returnRanking="significant")
numGenes(signGenesRanking90)

## End(Not run)
######
# Plotting previously calculated rankings:
######
# Load or calculate a ranking (or a classifier with geNetClassifier)
data(leukemiasClassifier) # Sample trained classifier, @genesRanking

# Default plot:
# - equivalent to plot(leukemiasClassifier@genesRanking)
# - in this case, the previously calculated 'fullRanking' 
#   is equivalent to 'leukemiasClassifier@genesRanking'
calculateGenesRanking(precalcGenesRanking=leukemiasClassifier@genesRanking)

# Changing arguments:
calculateGenesRanking(precalcGenesRanking=leukemiasClassifier@genesRanking, 
    numGenesPlot=5000, plotTitle="Leukemias", lpThreshold=0.9)


Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Platform: x86_64-pc-linux-gnu (64-bit)

> library(geNetClassifier)
Loading required package: Biobase
Loading required package: BiocGenerics
Loading required package: parallel

Loading required package: EBarrays
Loading required package: lattice
Loading required package: minet
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/geNetClassifier/calculateGenesRanking.Rd_%03d_medium.png", width=480, height=480)
> ### Name: calculateGenesRanking
> ### Title: Calculate GenesRanking
> ### Aliases: calculateGenesRanking
> ### Keywords: classif
> 
> ### ** Examples
> 
> 
> # Load an expressionSet:
> library(leukemiasEset)
> data(leukemiasEset)
> 
> # Select the train samples: 
> trainSamples<- c(1:10, 13:22, 25:34, 37:46, 49:58) 
> # summary(leukemiasEset$LeukemiaType[trainSamples])
> 
> ######
> # Plotting previously calculated rankings:
> ######
> # Load or calculate a ranking (or a classifier with geNetClassifier)
> data(leukemiasClassifier) # Sample trained classifier, @genesRanking
> 
> # Default plot:
> # - equivalent to plot(leukemiasClassifier@genesRanking)
> # - in this case, the previously calculated 'fullRanking' 
> #   is equivalent to 'leukemiasClassifier@genesRanking'
> calculateGenesRanking(precalcGenesRanking=leukemiasClassifier@genesRanking)
> 
> # Changing arguments:
> calculateGenesRanking(precalcGenesRanking=leukemiasClassifier@genesRanking, 
+     numGenesPlot=5000, plotTitle="Leukemias", lpThreshold=0.9)
> 
> dev.off()
null device 
          1 
>