Last data update: 2014.03.03

dexus    R Documentation

Detection of Differential Expression in an Unsupervised Setting


Performs the DEXUS algorithm for detection of differentially expressed genes in RNA-seq data for a) unknown conditions, b) multiple known conditions, and c) two known conditions.


  dexus(X, nclasses = 2, alphaInit, G = 1, cyc = 20,
    labels = NULL, normalization = "RLE", kmeansIter = 10,
    ignoreIfAllCountsSmaller = 1, theta = 2.5, minMu = 0.5,
    rmax = 13, initialization = "kmeans",
    multiclassPhiPoolingFunction = NULL, quiet = FALSE,
    resultObject = "S4")



X: Either a vector of counts or a raw data matrix, where columns are interpreted as samples and rows as genomic regions. An instance of "countDataSet" is also accepted.


nclasses: The number of conditions, i.e., mixture components. (Default = 2).


alphaInit: The initial estimates of the condition sizes, i.e., mixture weights. Not used in the supervised case. (Default = c(0.5, 0.5)).


G: The weight of the prior distribution of the mixture weights. Not used in the supervised case. (Default = 1).


cyc: Positive integer that sets the number of cycles of the EM algorithm. (Default = 20).


labels: Labels for the classes. Can be a factor, character, or integer vector and is coerced into a factor by as.factor. If this vector is given, supervised detection is used. If it is set to NULL, unsupervised detection is performed. (Default = NULL).


normalization: Method used for normalizing the reads. "RLE" is the method used by (Anders and Huber, 2010), "upperquartile" is the Upper-Quartile method by (Bullard et al., 2010), and "none" deactivates normalization. (Default = "RLE").
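
As an illustration of the "RLE" idea, the following is a minimal base R sketch of median-of-ratios size factors in the style of Anders and Huber (2010). It is not taken from the dexus source; the function name rle_size_factors and the toy matrix are made up for this example.

```r
# Toy count matrix: rows are genomic regions, columns are samples (made-up values).
counts <- matrix(c( 10,  20,  40,
                     5,  10,  20,
                   100, 200, 400,
                     8,  16,  32),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:4), paste0("s", 1:3)))

# Median-of-ratios size factors: for each sample, the median ratio of its
# counts to the per-gene geometric mean across all samples.
rle_size_factors <- function(counts) {
  log_geo_means <- rowMeans(log(counts))
  usable <- is.finite(log_geo_means)   # skip genes with a zero count in any sample
  apply(counts, 2, function(col) {
    exp(median(log(col[usable]) - log_geo_means[usable]))
  })
}

sf <- rle_size_factors(counts)
normalized <- sweep(counts, 2, sf, "/")   # divide each column by its size factor
```

In this toy data the samples differ only by sequencing depth, so after dividing by the size factors all three columns coincide.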


kmeansIter: Number of times the K-Means algorithm is run. (Default = 10).


ignoreIfAllCountsSmaller: Ignores transcripts for which all read counts are smaller than this value. These transcripts are considered "not expressed". (Default = 1).
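
A filter of this kind can be sketched in base R as follows. This is only an illustration of the rule as worded above (strictly smaller than the threshold), not the package's own code; the variable names and counts are invented.

```r
# Made-up counts: rows are transcripts, columns are samples.
counts <- matrix(c(0, 0, 0,
                   0, 1, 0,
                   5, 0, 7,
                   0, 0, 0),
                 nrow = 4, byrow = TRUE)
threshold <- 1   # plays the role of ignoreIfAllCountsSmaller

# A transcript is dropped only if every one of its counts is below the threshold.
all_small <- apply(counts < threshold, 1, all)
filtered  <- counts[!all_small, , drop = FALSE]

# Share of transcripts removed, in percent.
pct_removed <- 100 * mean(all_small)
```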


theta: The weight of the prior on the size parameter, i.e., the inverse dispersion parameter. Theta is adjusted to each transcript by dividing by the mean read count of the transcript. The higher theta, the lower r and the higher the overdispersion will be. (Default = 2.5).


minMu: Minimal mean for all negative binomial distributions. (Default = 0.5).


rmax: Maximal value for the size parameter. The inverse of this parameter is the lower bound on the dispersion. In analogy to (Anders and Huber, 2010), 13 is used as the default. (Default = 13).
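
To make the relation between the size parameter and the dispersion concrete, here is a small base R sketch. It relies only on the standard mean/size parameterization of the negative binomial (variance = mu + mu^2 / r); the helper nb_variance and the chosen mean are illustrative.

```r
rmax <- 13
min_dispersion <- 1 / rmax             # lower bound on the dispersion, about 0.077

# Variance of a negative binomial in the mean/size parameterization.
nb_variance <- function(mu, r) mu + mu^2 / r

mu <- 100
var_at_rmax <- nb_variance(mu, rmax)   # least overdispersed case allowed by rmax
var_at_r2   <- nb_variance(mu, 2)      # smaller r gives a larger variance

# Numerical check of the formula against dnbinom.
x <- 0:5000
p <- dnbinom(x, size = rmax, mu = mu)
var_numeric <- sum(x^2 * p) - sum(x * p)^2
```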


initialization: Method used to find the initial clusters. DEXUS can either use the quantiles of the read counts of each gene or run k-means on the counts. (Default = "kmeans").
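
The two initialization strategies can be pictured with the base R sketch below for a single gene. How dexus implements them internally is not described here, so treating the number of k-means runs as the nstart argument of stats::kmeans, and the equally spaced quantile cut, are assumptions made for illustration.

```r
set.seed(1)
x <- c(3, 5, 4, 6, 5, 48, 52, 50, 47, 55)   # made-up read counts for one gene
nclasses <- 2

# k-means start: several random restarts, keeping the best partition.
km <- kmeans(x, centers = nclasses, nstart = 10)
init_kmeans <- km$cluster

# Quantile start: cut the counts at equally spaced quantiles.
breaks <- quantile(x, probs = seq(0, 1, length.out = nclasses + 1))
init_quant <- cut(x, breaks = breaks, include.lowest = TRUE, labels = FALSE)

# Starting values for the mixture weights and means derived from the clusters.
alpha_start <- as.numeric(table(init_kmeans)) / length(x)
mu_start    <- as.numeric(tapply(x, init_kmeans, mean))
```

Either assignment yields starting weights and means that an EM algorithm could then refine.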


multiclassPhiPoolingFunction: In "multiClass" mode the dispersion is either estimated across all classes at once (NULL), or separately for each condition, i.e., class. The per-class size parameters or dispersions are then combined into one estimate by the mean ("mean"), minimum ("min"), or maximum ("max"). In our investigations, estimation across all classes at once performed best. (Default = NULL).


quiet: Logical that indicates whether dexus should report the steps of the algorithm. Suppresses messages from the program if set to TRUE. (Default = FALSE).


resultObject: Type of the result object; can either be a list ("list") or an instance of "DEXUSResult" ("S4"). (Default = "S4").


The read count x is explained by a finite mixture of negative binomials:

p(x) = \sum_{i=1}^{n} \alpha_i \, \mathrm{NB}(x; \mu_i, r_i),

where α_i is the weight of the i-th mixture component and NB is the negative binomial distribution with mean parameter μ_i and size parameter r_i. The parameters are selected by an EM algorithm in a Bayesian framework.

Each component in the mixture model corresponds to one condition.
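
The mixture density above can be evaluated directly with base R's dnbinom, which supports the mean/size parameterization. The sketch below is illustrative only: nb_mixture_density is a hypothetical helper, not part of the package, and the parameter values are invented.

```r
# Density of a finite mixture of negative binomials:
#   p(x) = sum_i alpha_i * NB(x; mu_i, r_i)
nb_mixture_density <- function(x, alpha, mu, r) {
  stopifnot(length(alpha) == length(mu), length(mu) == length(r),
            abs(sum(alpha) - 1) < 1e-8)
  dens <- 0
  for (i in seq_along(alpha)) {
    dens <- dens + alpha[i] * dnbinom(x, size = r[i], mu = mu[i])
  }
  dens
}

# Two made-up conditions: 70% of samples around 10 reads, 30% around 80 reads.
alpha <- c(0.7, 0.3)
mu    <- c(10, 80)
r     <- c(5, 5)

x <- 0:2000
p <- nb_mixture_density(x, alpha, mu, r)
total    <- sum(p)       # close to 1 over this range
mix_mean <- sum(x * p)   # close to the alpha-weighted mean of mu
```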

  • If the groups, conditions, replicate status, or labels are unknown, DEXUS tries to estimate these conditions. For each transcript, DEXUS tries to explain the read counts by one negative binomial distribution. If this is possible, the transcript is explained by one condition and is therefore not differentially expressed. If more than one negative binomial distribution is needed to explain the read counts of a transcript, this indicates that the transcript is differentially expressed. Evidence for differential expression is strong if a large number of samples participate in each condition and the mean expression values are well separated. Both of these criteria are measured by the informative/non-informative (I/NI) call.

  • If there are more than two groups given by the vector labels, DEXUS uses a generalized linear model to explain the data in analogy to (McCarthy, 2012).

  • If there are two groups given by the vector labels, DEXUS uses the exact test for count data to test between the sample groups, as implemented by (Anders and Huber, 2010) in the package "DESeq".
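
The three cases above depend only on the labels argument. The following base R sketch shows that decision; choose_mode and the returned mode names are hypothetical and are not the package's internal identifiers.

```r
# Decide which of the three documented cases applies, based on the labels vector.
choose_mode <- function(labels = NULL) {
  if (is.null(labels)) {
    return("unsupervised")                # conditions are estimated by the mixture model
  }
  k <- nlevels(as.factor(labels))         # labels are coerced to a factor
  if (k == 2) {
    "two-class"                           # exact test between the two sample groups
  } else if (k > 2) {
    "multi-class"                         # generalized linear model
  } else {
    stop("labels must define at least two groups")
  }
}

mode_none  <- choose_mode()
mode_two   <- choose_mode(c("A", "A", "B", "B"))
mode_three <- choose_mode(c(1, 1, 2, 2, 3, 3))
```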


"list" or "DEXUSResult". A list containing the results and the parameters of the algorithm or an instance of "DEXUSResult".


Guenter Klambauer and Thomas Unterthiner


data(dexus)
result <- dexus(countsMontgomery[1:10, ])


> data(dexus)
> result <- dexus(countsMontgomery[1:10, ])
Filtered out 10 % of the genes due to low counts
Unsupervised mode.