an object of class ExpressionSet; see Details for important
information on how the phenoData slot of this object will be
interpreted by the function.
covariate
integer, numeric or character; specifies
the covariate to be used to fit the PLGEM. See Details for how to
specify the covariate.
fitCondition
integer, numeric or character;
specifies the condition to be used to fit the PLGEM. See Details for
how to specify the fitCondition.
p
integer (or coercible to integer); number of intervals
used to partition the expression value range.
q
numeric in [0,1]; the quantile of standard deviation used for
PLGEM fitting.
trimAllZeroRows
logical; if TRUE, rows in the data set
containing only zero values are trimmed before fitting PLGEM.
zeroMeanOrSD
either NULL or character; what should be
done if a row with non-positive mean or zero standard deviation is
encountered before fitting PLGEM? Current options are one of
"replace" or "trim". Partial matching is used to switch
between the options and setting the value to NULL will cause the
default behaviour to be enforced, i.e. to "replace" (see Details).
fittingEval
logical; if TRUE, the fit is evaluated by
generating a diagnostic plot.
plot.file
logical; if TRUE, a png file is written to the
current working directory.
prefix
optional character to use as a prefix of the file name
to be written.
gPar
optional list of graphical parameters to define plotting
boundaries in PLGEM fitting evaluation plots. If left unspecified, suitable
boundaries will be determined from the data. The recommended way to set
these parameters is via a call to setGPar().
verbose
logical; if TRUE, comments are printed out while
running.
Details
plgem.fit fits a Power Law Global Error Model (PLGEM) to an
ExpressionSet and optionally evaluates the quality of the fit. This
PLGEM aims to find the mathematical relationship between standard
deviation and mean gene expression values (or protein abundance levels) in a
set of replicated microarray (or proteomics) samples, according to the
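following power law:
log(sd) = SLOPE * log(mean) + INTERCEPT
where sd and mean are the standard deviation and the mean of a gene (or
protein) across replicates, and SLOPE and INTERCEPT are the fitted parameters
(see Value).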
It has been demonstrated that this model fits Affymetrix GeneChip datasets,
as well as datasets of normalized spectral counts obtained by mass
spectrometry-based proteomics. Technically, two replicates are both necessary
and sufficient to fit a PLGEM. Having three or more replicates improves the
fit and is recommended (see References for details).
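To make the power law concrete, the following base R sketch fits a straight line on the log-log scale to simulated data. It does not call plgem and is not the package's implementation: the simulated data, the binning scheme (p equally populated intervals along the ranked means, with the q-quantile of the standard deviations in each interval) and all variable names are assumptions chosen only to illustrate the roles of p and q.

```r
# Simulated data only: 1000 features, 3 replicates, true sd = 0.5 * mean^0.8.
set.seed(1)
n <- 1000
true_mean <- exp(runif(n, min = 2, max = 10))
replicates <- sapply(1:3, function(i) {
  rnorm(n, mean = true_mean, sd = 0.5 * true_mean^0.8)
})

row_mean <- rowMeans(replicates)
row_sd <- apply(replicates, 1, sd)

# Logs are only defined for positive means and non-zero standard deviations.
keep <- row_mean > 0 & row_sd > 0
log_mean <- log(row_mean[keep])
log_sd <- log(row_sd[keep])

# Assumed binning scheme: p equally populated intervals along the ranked means,
# one modelling point per interval (median log-mean, q-quantile of log-sd).
p <- 10
q <- 0.5
bin <- cut(rank(log_mean), breaks = p, labels = FALSE)
x <- as.numeric(tapply(log_mean, bin, median))
y <- as.numeric(tapply(log_sd, bin, quantile, probs = q))

# Linear fit on the modelling points: the coefficients play the role of
# INTERCEPT and SLOPE in the power law.
fit <- lm(y ~ x)
slope <- unname(coef(fit)["x"])
intercept <- unname(coef(fit)["(Intercept)"])
adj_r2 <- summary(fit)$adj.r.squared
```

With data simulated this way, the estimated slope should land close to the exponent used in the simulation (0.8). The intercept is not expected to equal log(0.5) exactly, because the logarithm of a standard deviation estimated from three replicates is biased downwards.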
The phenoData slot of the ExpressionSet given as input is
expected to contain the necessary information to distinguish the various
experimental conditions from one another. The columns of the pData are
referred to as ‘covariates’. There has to be at least one covariate
defined in the input ExpressionSet. The sample attributes according to
this covariate must be distinct for samples that are to be treated as distinct
experimental conditions and identical for samples that are to be treated as
replicates.
There are a couple of different ways to specify the covariate. If an
integer or a numeric is given, it will be taken as the covariate
number (in the same order in which the covariates appear in the
colnames of the pData). If a character is given, it will
be taken as the covariate name itself (in the same way the covariates are
specified in the colnames of the pData). By default, the first
covariate appearing in the colnames of the pData is used.
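The covariate lookup described above can be sketched in base R as follows. The phenotype table and the helper resolve_covariate() are hypothetical and are not part of plgem; they only mimic the documented behaviour, using a plain data frame in place of pData(data).

```r
# Toy phenotype table standing in for pData(data); column names are made up.
pheno <- data.frame(
  conditionName = c("C", "C", "LPS", "LPS"),
  batch = c("b1", "b2", "b1", "b2"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not a plgem function): turn a covariate given as a
# number or as a name into the corresponding column name.
resolve_covariate <- function(pheno, covariate = 1) {
  available <- colnames(pheno)
  if (is.numeric(covariate)) {
    # A number is read as a position among the column names.
    stopifnot(covariate >= 1, covariate <= length(available))
    return(available[covariate])
  }
  # A character is read as the covariate name itself.
  stopifnot(covariate %in% available)
  covariate
}

default_cov <- resolve_covariate(pheno)          # first covariate by default
second_cov <- resolve_covariate(pheno, 2)        # by position
named_cov <- resolve_covariate(pheno, "batch")   # by name
```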
Similarly, there are a couple of different ways to specify on which experimental
condition to fit the model. The available ‘condition names’ are taken
from unique(as.character(pData(data)[, covariate])). If
fitCondition is given as a character, it will be taken as the
condition name itself. If fitCondition is given as an integer
or a numeric value, it will be taken as the condition number (in the
same order of appearance as in the ‘condition names’). By default, the
first condition name is used.
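A minimal base R sketch of the condition lookup follows. The phenotype table and the helper resolve_condition() are hypothetical; only the unique(as.character(...)) expression is taken from the text above.

```r
# Toy phenotype table standing in for pData(data).
pheno <- data.frame(
  conditionName = c("C", "C", "C", "LPS", "LPS", "LPS"),
  stringsAsFactors = FALSE
)
covariate <- "conditionName"

# Condition names in order of first appearance, as described above.
condition_names <- unique(as.character(pheno[, covariate]))

# Hypothetical helper (not a plgem function): resolve a condition given
# either as a number or as a name.
resolve_condition <- function(condition_names, fitCondition = 1) {
  if (is.numeric(fitCondition)) {
    stopifnot(fitCondition >= 1, fitCondition <= length(condition_names))
    return(condition_names[fitCondition])
  }
  stopifnot(fitCondition %in% condition_names)
  fitCondition
}

default_condition <- resolve_condition(condition_names)
second_condition <- resolve_condition(condition_names, 2)

# Samples (rows of the phenotype table) that would be used as replicates.
lps_samples <- which(pheno[, covariate] == resolve_condition(condition_names, "LPS"))
```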
Setting trimAllZeroRows=TRUE is especially useful in proteomics data
sets, where there is no guarantee of identifying a protein across all
experimental conditions. Since PLGEM is fitted only to the data
corresponding to a single experimental condition (as defined by
fitCondition), it is possible to generate a non-negligible number of
rows containing only zero values, even if there were no such rows in the
original (complete) data set containing all experimental conditions.
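The effect described above can be reproduced with a small made-up count matrix. The sketch below uses base R only; the protein names, sample names and counts are invented for illustration and the filtering code is not taken from plgem.

```r
# Toy spectral-count matrix: 3 proteins, 2 conditions with 2 replicates each.
counts <- matrix(
  c(0, 0, 5, 7,
    3, 4, 0, 0,
    2, 0, 1, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("protA", "protB", "protC"),
                  c("C.1", "C.2", "LPS.1", "LPS.2"))
)

# The complete data set contains no all-zero rows...
any_zero_rows_complete <- any(rowSums(counts != 0) == 0)

# ...but restricting to the replicates of one condition creates one (protA).
fit_cols <- c("C.1", "C.2")
sub <- counts[, fit_cols, drop = FALSE]
all_zero <- rowSums(sub != 0) == 0

# Trimming keeps only rows with at least one non-zero value.
trimmed <- sub[!all_zero, , drop = FALSE]
```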
Setting zeroMeanOrSD="replace" (the current default, for backward
compatibility) will cause the function to replace zero or negative means with
the smallest positive mean found in the data set and to replace zero standard
deviations with the smallest non-zero standard deviation found in the data
set. Setting zeroMeanOrSD="trim" is the current recommended option,
especially for spectral counting proteomics data sets that are typically
characterized by a high data granularity or for microarray data sets with a
small number of replicates. In both cases, the values for a given gene or
protein may be identical across replicates (and therefore have zero standard
deviation) by chance alone. Note that setting
trimAllZeroRows=TRUE does not guarantee that there will be no rows with
zero mean or zero standard deviation.
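The two options can be contrasted on a few made-up row statistics. The helper handle_zero() below is a hypothetical base R sketch of the documented behaviour, not the code used by plgem. It relies on match.arg(), which performs partial matching and falls back to the first choice when given NULL.

```r
# Made-up per-row means and standard deviations.
row_mean <- c(g1 = 0, g2 = 4, g3 = 10, g4 = 2)
row_sd <- c(g1 = 0, g2 = 0, g3 = 1.5, g4 = 0.5)

# Hypothetical helper (not a plgem function).
handle_zero <- function(row_mean, row_sd, zeroMeanOrSD = c("replace", "trim")) {
  # Partial matching; NULL selects the first option, "replace".
  zeroMeanOrSD <- match.arg(zeroMeanOrSD)
  if (zeroMeanOrSD == "replace") {
    # Non-positive means become the smallest positive mean,
    # zero sds become the smallest non-zero sd.
    row_mean[row_mean <= 0] <- min(row_mean[row_mean > 0])
    row_sd[row_sd == 0] <- min(row_sd[row_sd > 0])
    return(data.frame(mean = row_mean, sd = row_sd))
  }
  # "trim": drop every row with a non-positive mean or a zero sd.
  keep <- row_mean > 0 & row_sd > 0
  data.frame(mean = row_mean[keep], sd = row_sd[keep])
}

replaced <- handle_zero(row_mean, row_sd, NULL)  # default behaviour
trimmed <- handle_zero(row_mean, row_sd, "tr")   # partial match for "trim"
```

In this toy example, "replace" keeps all four rows but assigns three of them the same borrowed standard deviation, whereas "trim" keeps only the two rows whose statistics were actually observed.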
If argument fittingEval is set to TRUE, a four-panel diagnostic plot is
generated to assess the goodness of the PLGEM fit. The top-left panel shows
the power law, characterized by a ‘SLOPE’ and an ‘INTERCEPT’. The top-right
panel shows the distribution of model residuals. The bottom-left panel shows
a contour plot of ranked residuals. The bottom-right panel shows the
relationship between the distribution of observed residuals and the normal
distribution. A good fit normally gives a horizontal, symmetric rank plot and
a near-normal distribution of residuals.
Warnings are issued if the fitted PLGEM slope is above 1 or below 0.5, if the
adjusted r^2 is below 0.95 or if the Pearson correlation
coefficient is below 0.85. These are the ranges of values inside which most
GeneChip MAS5 datasets and NSAF proteomics datasets have been empirically
observed to lie (see References).
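The same thresholds can be applied by hand to a list of fit parameters. The helper check_plgem_fit() below is a hypothetical base R sketch and not a plgem function; it assumes a list with the element names SLOPE, ADJ.R2.MP and DATA.PEARSON described under Value.

```r
# Hypothetical helper (not a plgem function): report which of the
# documented thresholds a set of fit parameters violates.
check_plgem_fit <- function(fit) {
  problems <- character(0)
  if (fit$SLOPE > 1 || fit$SLOPE < 0.5) {
    problems <- c(problems, "slope outside [0.5, 1]")
  }
  if (fit$ADJ.R2.MP < 0.95) {
    problems <- c(problems, "adjusted r^2 below 0.95")
  }
  if (fit$DATA.PEARSON < 0.85) {
    problems <- c(problems, "Pearson correlation below 0.85")
  }
  problems
}

# Values as printed for the LPSeset example in this help page.
good_fit <- list(SLOPE = 0.7679941, ADJ.R2.MP = 0.9868802,
                 DATA.PEARSON = 0.9302925)
# Made-up values that violate two of the three thresholds.
poor_fit <- list(SLOPE = 1.2, ADJ.R2.MP = 0.90, DATA.PEARSON = 0.92)

good_problems <- check_plgem_fit(good_fit)
poor_problems <- check_plgem_fit(poor_fit)
```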
Value
A list of six elements (see Details):
SLOPE
the slope of the fitted PLGEM.
INTERCEPT
the intercept of the fitted PLGEM.
DATA.PEARSON
the Pearson correlation coefficient between the
log(sd) and the log(mean) in the
original data.
ADJ.R2.MP
the adjusted r^2 of PLGEM fitted on the
modelling points.
COVARIATE
a character indicating the covariate used for
fitting.
FIT.CONDITION
a character indicating the condition used for
fitting.
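As an illustration of how the returned elements might be used, the sketch below computes the standard deviation that the fitted power law predicts for a given mean. The list is typed in by hand from the example output on this page rather than produced by plgem.fit, the helper modelled_sd() is hypothetical, and the use of natural logarithms is an assumption.

```r
# Fit parameters copied by hand from the LPSeset example output.
fit_values <- list(
  SLOPE = 0.7679941,
  INTERCEPT = -0.5349053,
  DATA.PEARSON = 0.9302925,
  ADJ.R2.MP = 0.9868802,
  COVARIATE = "conditionName",
  FIT.CONDITION = "C"
)

# Hypothetical helper (not a plgem function): invert
# log(sd) = SLOPE * log(mean) + INTERCEPT, assuming natural logarithms.
modelled_sd <- function(fit, mean_value) {
  exp(fit$INTERCEPT + fit$SLOPE * log(mean_value))
}

sd_at_100 <- modelled_sd(fit_values, 100)
sd_at_1000 <- modelled_sd(fit_values, 1000)

# With a slope below 1, a tenfold increase in the mean raises the modelled
# standard deviation by less than tenfold.
fold_change <- sd_at_1000 / sd_at_100
```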
References
Pavelka N, Pelizzola M, Vizzardelli C, Capozzoli M, Splendiani A, Granucci F,
Ricciardi-Castagnoli P. A power law global error model for the identification
of differentially expressed genes in microarray data. BMC Bioinformatics. 2004
Dec 17; 5:203; http://www.biomedcentral.com/1471-2105/5/203.
Pavelka N, Fournier ML, Swanson SK, Pelizzola M, Ricciardi-Castagnoli P,
Florens L, Washburn MP. Statistical similarities between transcriptomics and
quantitative shotgun proteomics data. Mol Cell Proteomics. 2008 Apr;
7(4):631-44; http://www.mcponline.org/cgi/content/abstract/7/4/631.
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
> library(plgem)
Welcome to plgem version 1.44.0
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/plgem/plgem.fit.Rd_%03d_medium.png", width=480, height=480)
> ### Name: plgem.fit
> ### Title: PLGEM Fitting and Evaluation
> ### Aliases: plgem.fit
> ### Keywords: models
>
> ### ** Examples
>
> data(LPSeset)
> LPSfit <- plgem.fit(data=LPSeset, fittingEval=TRUE)
> as.data.frame(LPSfit)
      SLOPE  INTERCEPT DATA.PEARSON ADJ.R2.MP     COVARIATE FIT.CONDITION
1 0.7679941 -0.5349053    0.9302925 0.9868802 conditionName             C
> dev.off()
null device
1
>