dexss
R Documentation
Detection of Differential Expression in a semi-supervised Setting
Description
Performs the DEXSS algorithm for detection of
differentially expressed genes in RNA-seq data in a
semi-supervised setting, i.e., the condition of some
samples is known and the condition of the remaining
samples is unknown.
Arguments
X
Either a vector of counts or a raw data matrix,
where columns are interpreted as samples and rows as
genomic regions. An instance of "countDataSet" is also
accepted.
nclasses
The number of conditions, i.e. mixture
components. (Default = 2)
G
The weight of the prior distribution of the
mixture weights. Not used in the supervised case.
(Default = 1).
cyc
Positive integer that sets the number of
cycles of the EM algorithm. (Default = 20).
alphaInit
The initial estimates of the condition
sizes, i.e., mixture weights. Not used in the supervised
case. (Default = c(0.5,0.5)).
labels
The labels for the classes; they will be coerced
to integers. For this semi-supervised version the
known labels/conditions must be coded as integers
starting with 1. The samples with the label 1 will be
considered as being in the "major condition". For
samples with an unknown label/condition, NA must be
set.
normalization
Method used for normalizing the
reads. "RLE" is the method used by (Anders and Huber,
2010), "upperquartile" is the Upper-Quartile method by
(Bullard et al., 2010), and "none" deactivates
normalization. (Default = "RLE").
kmeansIter
Number of times the k-means algorithm
is run. (Default = 10).
ignoreIfAllCountsSmaller
Ignores transcripts for
which all read counts are smaller than this value. These
transcripts are considered "not expressed". (Default =
1).
theta
The weight of the prior on the size
parameter or inverse dispersion parameter. Theta is
adjusted to each transcript by dividing by the mean read
count of the transcript. The higher theta, the
lower r and the higher the overdispersion will be.
(Default = 2.5).
minMu
Minimal mean for all negative binomial
distributions. (Default = 0.5).
rmax
Maximal value for the size parameter. The
inverse of this parameter is the lower bound on the
dispersion. In analogy to (Anders and Huber, 2010) we use
13 as default. (Default = 13).
initialization
Method used to find the initial
clusters. DEXUS can either use the quantiles of the
read counts of each gene or run k-means on the counts.
(Default = "kmeans").
multiclassPhiPoolingFunction
In "multiClass" mode
the dispersion is either estimated across all classes at
once (NULL), or separately for each condition, i.e.,
class. The size parameters or dispersion per class are
then joined to one estimate by the mean ("mean"), minimum
("min") or maximum ("max"). In our investigations
estimation across all classes at once performed best.
(Default = NULL).
quiet
Logical that indicates whether dexus should
report the steps of the algorithm. Suppresses messages
from the program if set to TRUE. (Default = FALSE).
resultObject
Type of the result object; can either
be a list ("list") or an instance of "DEXUSResult"
("S4"). (Default = "S4").
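The "RLE" option of the normalization argument can be illustrated with a minimal base R sketch of the median-of-ratios idea described by Anders and Huber (2010). The function name rle_size_factors and the toy matrix are assumptions made for illustration; this is not the code that the package runs internally.

```r
# Hypothetical helper, not part of the dexus package: RLE size factors
# computed as the median ratio of each sample to the per-gene geometric mean.
rle_size_factors <- function(counts) {
  log_counts <- log(counts)
  # Per-gene log geometric mean across samples
  log_geo_mean <- rowMeans(log_counts)
  # Use only genes with a finite geometric mean (no zero counts)
  usable <- is.finite(log_geo_mean)
  apply(log_counts[usable, , drop = FALSE], 2, function(col) {
    exp(median(col - log_geo_mean[usable]))
  })
}

# Toy matrix: rows are genes, columns are samples;
# sample s2 has twice the read depth of sample s1
counts <- cbind(s1 = c(10, 20, 30, 40), s2 = c(20, 40, 60, 80))
sf <- rle_size_factors(counts)
# Divide every column by its size factor
normalized <- sweep(counts, 2, sf, "/")
```

After dividing by the size factors, the two toy samples have identical normalized counts, because they differ only in sequencing depth.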
Details
The read count x is explained by a finite mixture
of negative binomials:
p(x) = \sum_{i=1}^{n} \alpha_i \, \mathrm{NB}(x; \mu_i, r_i),
where \alpha_i is the weight of the i-th mixture
component and \mathrm{NB} is the negative binomial distribution
with mean parameter \mu_i and size parameter
r_i. The parameters are selected by an EM algorithm
in a Bayesian framework.
Each component in the mixture model corresponds to one
condition.
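The mixture density above can be evaluated directly in base R, since dnbinom supports the mean/size parameterization. The sketch below is only an illustration: the helper name nb_mixture_density and the parameter values are assumptions and are not taken from the package.

```r
# Hypothetical helper for illustration; not a function of the dexus package.
nb_mixture_density <- function(x, alpha, mu, r) {
  stopifnot(length(alpha) == length(mu), length(mu) == length(r),
            abs(sum(alpha) - 1) < 1e-8)
  # One column per mixture component: alpha_i * NB(x; mu_i, r_i)
  comps <- sapply(seq_along(alpha), function(i) {
    alpha[i] * dnbinom(x, size = r[i], mu = mu[i])
  })
  # Sum over components for every read count in x
  rowSums(matrix(comps, nrow = length(x)))
}

# Assumed example parameters: two equally weighted conditions
alpha <- c(0.5, 0.5)
mu <- c(10, 100)
r <- c(13, 13)
p <- nb_mixture_density(0:5000, alpha, mu, r)
sum(p)  # close to 1, as expected for a probability mass function
```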
If the groups, conditions, replicate
status, or labels are unknown, DEXUS tries to estimate
these conditions. For each transcript, DEXUS tries to
explain the read counts by one negative binomial
distribution. If this is possible, the transcript is
explained by one condition and is therefore not
differentially expressed. If more than one negative
binomial distribution is needed to explain the read
counts of a transcript, the transcript is considered
differentially expressed. Evidence for differential
expression is strong if a large number of samples
participate in each condition and the mean expression
values are well separated. Both of these criteria are
measured by the informative/non-informative (I/NI) call.
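How samples "participate" in a condition can be pictured with the posterior probability of each mixture component given a read count, which is the quantity computed in the E-step of a standard EM algorithm for mixtures. The following is a minimal sketch with fixed, assumed parameters; the helper nb_responsibilities is hypothetical, and the sketch does not reproduce the package's Bayesian updates or the I/NI call itself.

```r
# Hypothetical helper, not the package's internal code: posterior probability
# that each read count belongs to each mixture component.
nb_responsibilities <- function(x, alpha, mu, r) {
  # Weighted component densities, one column per component
  weighted <- sapply(seq_along(alpha), function(i) {
    alpha[i] * dnbinom(x, size = r[i], mu = mu[i])
  })
  weighted <- matrix(weighted, nrow = length(x))
  # Normalize each row so that the probabilities sum to 1
  weighted / rowSums(weighted)
}

# Toy read counts of one transcript: five low and five high samples
x <- c(8, 12, 9, 11, 10, 95, 110, 102, 98, 105)
resp <- nb_responsibilities(x, alpha = c(0.5, 0.5),
                            mu = c(10, 100), r = c(13, 13))
# Most likely condition per sample and the number of samples in each
assigned <- max.col(resp)
table(assigned)
```

In this toy case both conditions hold half of the samples and the means are far apart, which corresponds to the situation the text describes as strong evidence for differential expression.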
If there are more than two groups given by the
vector labels, DEXUS uses a generalized linear
model to explain the data in analogy to (McCarthy et al., 2012).
If there are two groups given by the vector
labels, DEXUS uses the exact test for count data to
test between the sample groups, as implemented by (Anders
and Huber, 2010) in the package "DESeq".
Value
"list" or "DEXUSResult". A list containing the results
and the parameters of the algorithm or an instance of
"DEXUSResult".
References
Anders, S. and Huber, W. (2010). Differential
expression analysis for sequence count data. Genome
Biol, 11(10), R106.
Bullard, J. H., Purdom, E., Hansen, K. D., and Dudoit, S.
(2010). Evaluation of statistical methods for
normalization and differential expression in mRNA-seq
experiments. BMC Bioinformatics, 11, 94.
McCarthy, D. J., Chen, Y., and Smyth, G. K. (2012).
Differential expression analysis of multifactor
RNA-Seq experiments with respect to biological
variation. Nucleic Acids Res, 40(10), 4288-4297.
Examples
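# Load the example data of the package (provides countsBottomly)
data(dexus)
# The first two characters of each column name encode the condition
labels1 <- substr(colnames(countsBottomly),1,2)
# Code the known conditions as integers starting with 1
labels2 <- c()
labels2[which(labels1=="D2")] <- 1
labels2[which(labels1=="B6")] <- 2
# Treat six samples as having an unknown condition
labels2[c(3,7,8,10,12,15)] <- NA
# Run DEXSS on the first 100 transcripts
res <- dexss(countsBottomly[1:100, ],labels=labels2,nclasses=2,G=0)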
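The label coding used in this example (integers starting with 1 for known conditions, NA for unknown ones) can also be written with match(), which always returns one entry per sample. The sample names below are hypothetical stand-ins for colnames(countsBottomly), so the sketch runs without the package data.

```r
# Hypothetical sample names; in the example above they come from
# colnames(countsBottomly).
sample_names <- c("D2_1", "B6_1", "D2_2", "B6_2", "D2_3", "B6_3")
strain <- substr(sample_names, 1, 2)
# match() returns 1 for "D2" and 2 for "B6"
labels <- match(strain, c("D2", "B6"))
# Samples 3 and 6 are treated as having an unknown condition
labels[c(3, 6)] <- NA
labels
```

The resulting vector has the same length as the number of samples and could be passed as the labels argument.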
Results
> library(dexus)
> data(dexus)
> labels1 <- substr(colnames(countsBottomly),1,2)
> labels2 <- c()
> labels2[which(labels1=="D2")] <- 1
> labels2[which(labels1=="B6")] <- 2
> labels2[c(3,7,8,10,12,15)] <- NA
> res <- dexss(countsBottomly[1:100, ],labels=labels2,nclasses=2,G=0)
Filtered out 5 % of the genes due to low counts
Semi-supervised mode.