R: Detection of Differential Expression in an Unsupervised...
dexus
R Documentation
Detection of Differential Expression in an Unsupervised Setting
Description
Performs the DEXUS algorithm for detection of
differentially expressed genes in RNA-seq data for a)
unknown conditions, b) multiple known conditions, and c)
two known conditions.
Arguments
Either a vector of counts or a raw data matrix,
where columns are interpreted as samples and rows as
genomic regions. An instance of "countDataSet" is also
accepted.
nclasses
The number of conditions, i.e., mixture
components. (Default = 2).
alphaInit
The initial estimates of the condition
sizes, i.e., mixture weights. Not used in the supervised
case. (Default = c(0.5, 0.5)).
G
The weight of the prior distribution of the
mixture weights. Not used in the supervised case.
(Default = 1).
cyc
Positive integer that sets the number of
cycles of the EM algorithm. (Default = 20).
labels
Labels for the classes. Can be a factor,
character, or integer vector and is coerced into a
factor by as.factor. If this vector is given,
supervised detection is used. If it is set to
NULL, unsupervised detection is performed.
(Default = NULL).
normalization
Method used for normalizing the
reads. "RLE" is the method used by (Anders and Huber,
2010), "upperquartile" is the Upper-Quartile method by
(Bullard et al., 2010), and "none" deactivates
normalization. (Default = "RLE").
kmeansIter
Number of times the k-means algorithm
is run. (Default = 10).
ignoreIfAllCountsSmaller
Ignores transcripts for
which all read counts are smaller than this value. These
transcripts are considered "not expressed". (Default =
1).
theta
The weight of the prior on the size
parameter, i.e., the inverse dispersion parameter. Theta is
adjusted to each transcript by dividing by the mean read
count of the transcript. The higher theta is, the
lower r and the higher the overdispersion will be.
(Default = 2.5).
minMu
Minimal mean for all negative binomial
distributions. (Default = 0.5).
rmax
Maximal value for the size parameter. The
inverse of this parameter is the lower bound on the
dispersion. The default value follows (Anders and Huber,
2010). (Default = 13).
initialization
Method used to find the initial
clusters. DEXUS can either use the quantiles of the
read counts of each gene or run k-means on the counts.
(Default = "kmeans").
multiclassPhiPoolingFunction
In "multiClass" mode
the dispersion is estimated either across all classes at
once (NULL) or separately for each condition, i.e.,
class. In the latter case, the per-class size parameters
or dispersions are combined into one estimate by the mean
("mean"), minimum ("min"), or maximum ("max"). In our
investigations, estimation across all classes at once
performed best. (Default = NULL).
quiet
Logical that indicates whether dexus should
report the steps of the algorithm. Suppresses messages
from the program if set to TRUE. (Default = FALSE).
resultObject
Type of the result object; can either
be a list ("list") or an instance of "DEXUSResult"
("S4"). (Default="S4").
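Several of the arguments above (ignoreIfAllCountsSmaller, normalization, theta) describe preprocessing steps that can be illustrated in base R. The following is a minimal sketch, not the package's internal code: the helper names (filter_low_counts, rle_size_factors, upper_quartile_factors) and the toy matrix are assumptions made for illustration. The RLE helper follows the median-of-ratios idea of (Anders and Huber, 2010); rescaling the upper-quartile factors to mean 1 and computing the per-transcript theta on normalized counts are likewise assumptions.

```r
# Toy count matrix: rows are genomic regions, columns are samples.
X <- matrix(c(  0,  0,   0,   0,
               10, 12,  30,  28,
                5,  4,   6,   7,
              100, 90, 210, 190),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("gene", 1:4), paste0("s", 1:4)))

# ignoreIfAllCountsSmaller: drop rows whose counts are all below the threshold.
filter_low_counts <- function(X, threshold = 1) {
  keep <- rowSums(X >= threshold) > 0
  X[keep, , drop = FALSE]
}

# "RLE": per sample, the median over genes of the ratio between the
# sample's count and the gene's geometric mean across samples.
rle_size_factors <- function(X) {
  log_geo_mean <- rowMeans(log(X))        # -Inf for rows containing a zero
  usable <- is.finite(log_geo_mean)
  apply(X, 2, function(s) exp(median(log(s[usable]) - log_geo_mean[usable])))
}

# "upperquartile": 75th percentile of each sample's counts
# (assumption: rescaled so that the factors average to 1).
upper_quartile_factors <- function(X) {
  uq <- apply(X, 2, quantile, probs = 0.75)
  uq / mean(uq)
}

Xf <- filter_low_counts(X, threshold = 1)
sf <- rle_size_factors(Xf)
Xn <- sweep(Xf, 2, sf, "/")               # normalized counts

# theta is adjusted per transcript by dividing by its mean read count
# (assumption: the mean is taken over the normalized counts).
theta <- 2.5
theta_per_gene <- theta / rowMeans(Xn)
```

Here gene1 is removed because all of its counts are below 1, and the deeper-sequenced samples s3 and s4 receive larger size factors than s1 and s2.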
Details
The read count $x$ is explained by a finite mixture
of negative binomials:
$$p(x) = \sum_{i=1}^{n} \alpha_i \, \mathrm{NB}(x; \mu_i, r_i),$$
where $n$ is the number of mixture components,
$\alpha_i$ is the weight of mixture component $i$, and
$\mathrm{NB}$ is the negative binomial distribution
with mean parameter $\mu_i$ and size parameter
$r_i$. The parameters are selected by an EM algorithm
in a Bayesian framework.
Each component in the mixture model corresponds to one
condition.
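The mixture density above can be evaluated directly with dnbinom from base R, which accepts the mean and size parameterization used here. This is a small illustrative sketch; the function names nb_mixture_density and nb_variance and the parameter values are assumptions, not part of the package.

```r
# Density of a finite negative binomial mixture:
# the sum over components of alpha[i] * NB(x; mu[i], size[i]).
nb_mixture_density <- function(x, alpha, mu, size) {
  stopifnot(length(alpha) == length(mu), length(mu) == length(size),
            abs(sum(alpha) - 1) < 1e-8)
  dens <- vapply(seq_along(alpha), function(i) {
    alpha[i] * dnbinom(x, mu = mu[i], size = size[i])
  }, numeric(length(x)))
  # vapply returns a matrix with length(x) rows, or a vector if length(x) == 1.
  if (is.matrix(dens)) rowSums(dens) else sum(dens)
}

# Two conditions: 60% of samples around 10 reads, 40% around 200 reads.
alpha <- c(0.6, 0.4)
mu    <- c(10, 200)
size  <- c(13, 13)

p <- nb_mixture_density(0:5000, alpha, mu, size)
total_mass <- sum(p)   # close to 1: the range covers almost all of the mass

# With mean mu and size r the NB variance is mu + mu^2 / r, so 1 / r acts
# as the dispersion; rmax = 13 therefore bounds the dispersion below by 1 / 13.
nb_variance <- function(mu, size) mu + mu^2 / size
min_dispersion <- 1 / 13
```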
If the groups, conditions, replicate
status, or labels are unknown, DEXUS tries to estimate
these conditions. For each transcript, DEXUS tries to
explain the read counts by one negative binomial
distribution. If this is possible, the transcript is
explained by one condition and is therefore not
differentially expressed. If more than one negative
binomial distribution is needed to explain the read
counts of a transcript, this indicates that the transcript
is differentially expressed. Evidence for differential
expression is strong if a large number of samples
participate in each condition and the mean expression
values are well separated. Both of these criteria are
measured by the informative/non-informative (I/NI) call.
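To make the idea of explaining one transcript's counts by one or several negative binomials more concrete, the following is a deliberately simplified sketch of an EM fit for a single transcript. It is not the DEXUS implementation: it uses plain maximum likelihood with a fixed, shared size parameter and omits the Bayesian priors (G, theta) as well as the I/NI call. The function name em_nb_mixture and its arguments are assumptions; the k-means start with several restarts and the lower bound of 0.5 on the means only echo the initialization, kmeansIter, and minMu arguments.

```r
# Simplified EM for an NB mixture with a fixed, shared size parameter.
# Assumption: plain maximum likelihood, without the priors DEXUS places
# on the mixture weights and the size parameters.
em_nb_mixture <- function(x, n_comp = 2, size = 13, cyc = 20,
                          kmeans_iter = 10, min_mu = 0.5) {
  # Initial clusters from k-means, restarted kmeans_iter times.
  km <- kmeans(x, centers = n_comp, nstart = kmeans_iter)
  ord <- order(km$centers[, 1])
  mu <- pmax(unname(km$centers[ord, 1]), min_mu)
  alpha <- (km$size / length(x))[ord]

  for (iter in seq_len(cyc)) {
    # E-step: posterior probability of each component for each sample.
    lik <- sapply(seq_len(n_comp), function(i) {
      alpha[i] * dnbinom(x, mu = mu[i], size = size)
    })
    resp <- lik / rowSums(lik)

    # M-step: update weights and means. For a fixed size parameter the
    # weighted sample mean is the maximum likelihood update of the NB mean.
    alpha <- colMeans(resp)
    mu <- pmax(colSums(resp * x) / colSums(resp), min_mu)
  }
  list(alpha = alpha, mu = mu, responsibilities = resp)
}

# Simulated transcript: 60 samples in a low and 40 in a high condition.
set.seed(1)
x <- c(rnbinom(60, mu = 10, size = 13), rnbinom(40, mu = 200, size = 13))
fit <- em_nb_mixture(x)
```

With well-separated conditions such as these, the fitted means and weights should land near the simulated values, and both components carry a substantial share of the samples, which is the situation the text describes as strong evidence for differential expression.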
If there are more than two groups given by the
vector labels, DEXUS uses a generalized linear
model to explain the data in analogy to (McCarthy et al., 2012).
If there are two groups given by the vector
labels, DEXUS uses the exact test for count data to
test between the sample groups, as implemented by (Anders
and Huber, 2010) in the package "DESeq".
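The three cases described in this section depend only on the labels vector. The sketch below restates that rule as code; detection_mode is a hypothetical helper and the returned mode names are illustrative, not identifiers from the package.

```r
# Hypothetical helper: report which detection mode a labels vector implies.
detection_mode <- function(labels = NULL) {
  if (is.null(labels)) {
    return("unsupervised")
  }
  # labels are coerced to a factor, as described for the labels argument.
  n_groups <- nlevels(droplevels(as.factor(labels)))
  if (n_groups == 2) {
    "two-class"       # exact test between the two sample groups
  } else if (n_groups > 2) {
    "multi-class"     # generalized linear model
  } else {
    stop("labels must define at least two groups")
  }
}

detection_mode(NULL)                     # "unsupervised"
detection_mode(c("a", "a", "b", "b"))    # "two-class"
detection_mode(c(1, 1, 2, 2, 3, 3))      # "multi-class"
```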
Value
"list" or "DEXUSResult". A list containing the results
and the parameters of the algorithm, or an instance of
"DEXUSResult", depending on resultObject.
References
Anders, S. and Huber, W. (2010). Differential
expression analysis for sequence count data. Genome
Biol, 11(10), R106.
Bullard, J. H., Purdom, E., Hansen, K. D., and Dudoit, S.
(2010). Evaluation of statistical methods for
normalization and differential expression in mRNA-seq
experiments. BMC Bioinformatics, 11, 94.
McCarthy, D. J., Chen, Y., and Smyth, G. K. (2012).
Differential expression analysis of multifactor
RNA-Seq experiments with respect to biological
variation. Nucleic Acids Res, 40(10), 4288-4297.
Examples
data(dexus)
result <- dexus(countsMontgomery[1:10, ])
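A variant of the example with several arguments spelled out may be useful as a starting point. It only uses argument names and values documented above, and it is guarded so that it runs only when the package is installed; whether these settings suit a given data set is left open.

```r
# Run only when the dexus package is available.
if (requireNamespace("dexus", quietly = TRUE)) {
  library(dexus)
  data(dexus)
  result <- dexus(countsMontgomery[1:10, ],
                  nclasses      = 2,       # number of conditions
                  cyc           = 20,      # EM cycles
                  normalization = "RLE",   # or "upperquartile", "none"
                  quiet         = TRUE,    # suppress progress messages
                  resultObject  = "list")  # plain list instead of "DEXUSResult"
}
```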
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
> library(dexus)
> data(dexus)
> result <- dexus(countsMontgomery[1:10, ])
Filtered out 10 % of the genes due to low counts
Unsupervised mode.