A numeric matrix or data frame containing the quality
measures (columns) for each array (rows). The number of rows must
exceed the number of columns.
method
The Mahalanobis Distances (MDs) can be computed on all
the quality measures in the QC report (the default,
method="nogroups"); on the first pc principal components
resulting from a principal component analysis (PCA) of the QC report
("global"); or on subsets of the quality measures in the QC report
("apriori": groups defined by the user; "cluster":
groups resulting from a cluster analysis; "loading":
groups resulting from a cluster analysis in the space of the
loadings of a PCA). The first two methods compute a single MD
for each array, whereas the last three compute one MD per array within
each group of quality measures.
groups
A list specifying the groups of quality measures when
the “apriori” method is chosen. For example, groups =
list(c(1,2), c(4,6)) puts columns 1 and 2 in one group and columns
4 and 6 in a second.
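To make the "one MD within each group" idea concrete, here is a minimal sketch, assuming a simulated QC matrix. The helper `groupwise_md` is hypothetical, not part of mdqc, and for brevity it uses the classical mean and covariance (`colMeans`, `cov`) where mdqc uses a robust estimator. The `groups` list has the same form as the argument described above.

```r
# Hypothetical helper: one MD per array within each group of columns.
# Classical estimates (colMeans/cov) stand in for mdqc's robust estimator.
groupwise_md <- function(x, groups) {
  x <- as.matrix(x)
  lapply(groups, function(cols) {
    sub <- x[, cols, drop = FALSE]
    # mahalanobis() returns squared distances, so take the square root
    sqrt(mahalanobis(sub, center = colMeans(sub), cov = cov(sub)))
  })
}

set.seed(1)
qc <- matrix(rnorm(20 * 6), nrow = 20, ncol = 6)  # simulated: 20 arrays, 6 measures
md <- groupwise_md(qc, groups = list(c(1, 2), c(4, 6)))
length(md)          # 2: one MD vector per group
sapply(md, length)  # 20 20: one MD per array in each group
```

The result is a list with one distance vector per group, which matches the list shape of the mdqcValues component printed in the transcript.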
k
An integer giving the number of clusters (groups of quality
measures) to form in the cluster analysis when the “cluster” or
“loading” method is chosen.
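A rough sketch of forming k groups of quality measures by clustering the columns with `cluster::pam`. The dissimilarity 1 - |correlation| is an assumption made here for illustration; the measure mdqc uses internally may differ. `cluster_measures` is a hypothetical helper and the data are simulated from three latent quality aspects.

```r
library(cluster)  # pam(); cluster is a recommended package shipped with R

# Hypothetical helper: cluster the columns (quality measures) into k groups.
# Assumption: dissimilarity between measures is 1 - |correlation|.
cluster_measures <- function(x, k) {
  d <- as.dist(1 - abs(cor(x)))
  fit <- pam(d, k = k, diss = TRUE)
  # List of column indices, one element per cluster
  unname(split(seq_len(ncol(x)), fit$clustering))
}

set.seed(2)
n <- 30
f <- matrix(rnorm(n * 3), nrow = n, ncol = 3)  # three latent quality aspects
# Each aspect drives two measured columns plus a little noise
qc <- f[, c(1, 1, 2, 2, 3, 3)] + matrix(rnorm(n * 6, sd = 0.1), nrow = n, ncol = 6)
groups <- cluster_measures(qc, k = 3)
groups
```

The returned list has the same shape as the groups argument, so it could be inspected or reused as an a priori grouping.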
pc
An integer giving the number of principal components
retained from the PCA when the “global” or “loading”
method is chosen.
robust
The robust multivariate location/scatter estimator to use:
an S-estimator, the minimum covariance determinant (MCD) or the
minimum volume ellipsoid (MVE). The default is an S-estimator
with a 25% breakdown point.
nsamp
The number of subsamples that the robust estimator should
use. This defaults to 10 times the number of rows in the matrix.
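For readers who want to see what a robust location/scatter fit plugs into, here is a sketch that swaps in `MASS::cov.rob` (MCD or MVE) for illustration. It is not mdqc's code, and MASS does not provide mdqc's default S-estimator. The helper `robust_md` is hypothetical; `nsamp` is set to 10 times the number of rows to mirror the default described above.

```r
library(MASS)  # cov.rob(); MASS is a recommended package shipped with R

# Hypothetical helper: robust MDs from an MCD or MVE fit. MASS::cov.rob is
# swapped in for illustration; it does not provide mdqc's default S-estimator.
robust_md <- function(x, method = c("mcd", "mve")) {
  x <- as.matrix(x)
  # nsamp mirrors the documented default: 10 times the number of rows
  fit <- cov.rob(x, method = match.arg(method), nsamp = 10 * nrow(x))
  sqrt(mahalanobis(x, center = fit$center, cov = fit$cov))
}

set.seed(3)
qc <- matrix(rnorm(40 * 3), nrow = 40, ncol = 3)  # simulated QC report
qc[1, ] <- c(8, 8, 8)                              # one clearly deviating array
md <- robust_md(qc, "mcd")
which.max(md)  # 1
```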
Details
MDQC flags potentially low-quality arrays based on the idea of
outlier detection; that is, it flags those arrays whose quality
attributes jointly depart from those of the bulk of the data.
This function computes a distance measure, the Mahalanobis Distance, to
summarize the quality of each array. Using this distance allows a
multivariate analysis of the information in QC reports that
takes the correlation structure of the quality measures into
account. In addition, because robust estimators are used to identify the
typical quality measures of good-quality arrays, the evaluation is not
unduly influenced by the measures of outlying arrays.
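The benefit of robust estimation can be seen on simulated data with a small cluster of poor arrays: they inflate the classical covariance and shrink their own MDs (masking), while a robust fit keeps them far from the bulk. This sketch assumes `MASS::cov.rob` with MCD as a stand-in for mdqc's S-estimator, and uses the square root of a chi-square quantile as the cutoff, as in the printed output further below.

```r
library(MASS)  # cov.rob()

set.seed(4)
qc <- matrix(rnorm(40 * 3), nrow = 40, ncol = 3)  # simulated QC report
bad <- 1:4                                        # a small cluster of poor arrays
qc[bad, ] <- 6 + matrix(rnorm(12, sd = 0.1), nrow = 4, ncol = 3)

# Classical estimates are pulled toward the poor arrays
md_classical <- sqrt(mahalanobis(qc, colMeans(qc), cov(qc)))

# Robust MCD estimates (MASS::cov.rob, swapped in for mdqc's S-estimator)
fit <- cov.rob(qc, method = "mcd")
md_robust <- sqrt(mahalanobis(qc, fit$center, fit$cov))

cutoff <- sqrt(qchisq(0.99, df = ncol(qc)))
round(md_classical[bad], 2)  # shrunk toward the bulk of the data
round(md_robust[bad], 2)     # well above the cutoff
```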
MDQC can be based on all the quality measures simultaneously (using
method="nogroups"), on subsets of them (using method="apriori",
"cluster", or "loading"), or on a transformed space with a lower
dimension (using method="global").
In the “apriori” approach the user forms groups of quality
measures based on an a priori interpretation of them,
according to the quality aspect they represent. The “cluster”
and “loading” methods are two data-driven ways to form
the groups. The former groups the quality measures using cluster
analysis; the latter uses the loadings of a principal component
analysis to identify the quality measures that carry similar
information and groups them. Note that the
“apriori”, “cluster”, and “loading”
methods create groups of the original quality measures of the report
and compute one MD within each group. Finally, the “global”
method computes a single MD in the reduced space of the first pc
principal components from a robust PCA. The number of PCs, pc, can be
chosen using a scree plot.
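The "global" idea can be sketched as computing one MD per array on the first pc principal component scores. This is an illustration under assumptions: classical `prcomp()` on standardized measures stands in for the robust PCA that mdqc uses, `global_md` is a hypothetical helper, and the data are simulated. Because PC scores are uncorrelated, the MD reduces to the root sum of squared standardized scores.

```r
# Hypothetical helper: one MD per array on the first `pc` PC scores.
# Assumption: classical prcomp() on standardized measures stands in for the
# robust PCA that mdqc uses.
global_md <- function(x, pc) {
  p <- prcomp(x, center = TRUE, scale. = TRUE)
  idx <- seq_len(pc)
  scores <- p$x[, idx, drop = FALSE]
  # PC scores are uncorrelated with variances sdev^2, so the MD is the
  # root sum of squared standardized scores
  sqrt(rowSums(sweep(scores, 2, p$sdev[idx], "/")^2))
}

set.seed(5)
qc <- matrix(rnorm(30 * 6), nrow = 30, ncol = 6)  # simulated QC report

# Cumulative share of variance per PC; screeplot(pca) would draw the scree plot
pca <- prcomp(qc, scale. = TRUE)
round(cumsum(pca$sdev^2) / sum(pca$sdev^2), 2)

md <- global_md(qc, pc = 4)
```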
More details on each method are given in Cohen Freue et
al. (2007).
Value
An object of class "mdqc" (with associated plot, print
and summary methods) with components
ngroups
Number of groups in which the MDs have been computed
groups
column numbers corresponding to the quality measures
in each group
mdqcValues
Mahalanobis Distance(s) for each array
x
dataset containing the numeric quality measures in the
report
method
method used to group or transform the quality
measures before computing the MD for each array
pc
number of principal components used in the robust PCA.
k
number of clusters used in the cluster analysis.
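The print and summary output in the transcript report how many MDs exceed the square root of the 90%, 95% and 99% chi-square quantiles. A sketch of that count for a list shaped like mdqcValues, assuming the chi-square degrees of freedom equal the number of measures (or PCs) in each group. The helper `count_flagged`, the `md_list` values and the group sizes in `df` are made up for illustration and are not real mdqc output.

```r
# Hypothetical helper: count MDs above sqrt(chi-square quantiles) per group.
# Assumption: degrees of freedom = number of measures (or PCs) in the group.
count_flagged <- function(md_list, df, probs = c(0.90, 0.95, 0.99)) {
  out <- t(mapply(function(md, d) {
    vapply(probs, function(p) sum(md > sqrt(qchisq(p, df = d))), integer(1))
  }, md_list, df))
  colnames(out) <- sprintf("%g%%", probs * 100)
  out
}

# Made-up MDs for two groups of 5 and 4 measures
md_list <- list(g1 = c(1.2, 7.7, 2.0, 1.5), g2 = c(0.9, 44.4, 8.9, 1.1))
res <- count_flagged(md_list, df = c(5, 4))
res
#    90% 95% 99%
# g1   1   1   1
# g2   2   2   2
```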
Note
We thank Christopher Croux for providing us with MATLAB code that
we translated into R to compute the multivariate S-estimator.
References
Cohen Freue, G. V. and Hollander, Z. and Shen, E. and Zamar, R. H. and Balshaw,
R. and Scherer, A. and McManus, B. and Keown, P. and McMaster, W. R. and Ng,
R. T. (2007) ‘MDQC: A New Quality Assessment Method for Microarrays
Based on Quality Control Reports’. Bioinformatics 23, 3162–3169.
Bolstad, B. M. and Collin, F. and Brettschneider, J. and Simpson,
K. and Cope, L. and Irizarry, R. A. and Speed, T. P. (2005)
‘Quality assessment of Affymetrix GeneChip data’. In Gentleman,
R. and Carey, V. J. and Huber, W. and Irizarry, R. A. and Dudoit,
S. (eds), Bioinformatics and Computational Biology Solutions Using R
and Bioconductor. New York: Springer.
Brettschneider, J. and Collin, F. and Bolstad, B. M. and Speed,
T. P. (2007) ‘Quality assessment for short oligonucleotide
arrays’. Forthcoming in Technometrics (with Discussion).
Ross, M. E. and Zhou, X. and Song, G. and Shurtleff, S. A. and
Girtman, K. and Williams, W. K. and Liu, H. and Mahfouz, R. and
Raimondi, S. C. and Lenny, N. and Patel, A. and Downing, J. R. (2003)
‘Classification of pediatric acute lymphoblastic leukemia by
gene expression profiling’. Blood 102, 2951–2959.
See Also
prcomp.robust, pam,
mahalanobis, allQC
Examples
data(allQC)
## Contains the QC report obtained using Bioconductor's simpleaffy package
## for a subset of arrays from a large acute lymphoblastic leukemia (ALL)
## study (Ross et al., 2003).
## This dataset has also been studied by Bolstad et al. (2005) and
## Brettschneider et al. (2007).
## For further information see allQC.
#### No Groups method
# Figure 2 in Cohen Freue et al. (2007):
# Results of MDQC based on all measures of the QC report.
mdout <- mdqc(allQC, method="nogroups")
plot(mdout)
print(mdout)
summary(mdout)
#### A-Priori grouping method
# Figure 3 in Cohen Freue et al. (2007):
# Results of MDQC using the apriori grouping method.
mdout <- mdqc(allQC, method="apriori", groups=list(1:5, 6:9, 10:11))
plot(mdout)
#### Global PCA method
# Figure 4 in Cohen Freue et al. (2007):
# Results of MDQC using the global PCA method.
mdout <- mdqc(allQC, method="global", pc=4)
plot(mdout)
#### Clustering grouping method
# Figure 4 in Supplementary Material of Cohen Freue et al. (2007):
# Results of MDQC using a cluster analysis to form
# 3 groups of quality measures.
mdout <- mdqc(allQC, method="cluster", k=3)
plot(mdout)
#### Loading grouping method
# Figure 4 in Supplementary Material of Cohen Freue et al. (2007):
# Results of MDQC using a cluster analysis on the loadings of the first
# pc=4 principal components from a robust PCA to form k=3 groups of quality measures.
mdout <- mdqc(allQC, method="loading", k=3, pc=4)
plot(mdout)
### To get the raw MDs
mdout$mdqcValues
Results
> library(mdqc)
Loading required package: cluster
Loading required package: MASS
> ### Name: mdqc
> ### Title: MDQC: Mahalanobis Distance Quality Control
> ### Aliases: mdqc
> ### Keywords: multivariate robust
>
> ### ** Examples
>
>
> data(allQC)
>
> ## Contains the QC report obtained using Bioconductor's simpleaffy package
> ## for a subset of arrays from a large acute lymphoblastic leukemia (ALL)
> ## study (Ross et al., 2004).
> ## This dataset has been also studied by Bolstad et al. (2005) and
> ## Brettschneider et al. (2007).
> ## For further information see allQC.
>
>
> #### No Groups method
> # Figure 2 in Cohen Freue et al. (2007):
> # Results of MDQC based on all measures of the QC report.
>
> mdout <- mdqc(allQC, method="nogroups")
> plot(mdout)
> print(mdout)
Method used: nogroups Number of groups: 1
Robust estimator: S-estimator
MDs exceeding the square root of the 90 % percentile of the Chi-Square distribution
[1] 14
MDs exceeding the square root of the 95 % percentile of the Chi-Square distribution
[1] 14
MDs exceeding the square root of the 99 % percentile of the Chi-Square distribution
[1] 14
> summary(mdout)
Summary information for MDQC
Method used: nogroups Number of groups: 1
Robust estimator: S-estimator
Number of Outliers:
90% 95% 99%
1 1 1
>
> #### A-Priori grouping method
> # Figure 3 in Cohen Freue et al. (2007):
> # Results of MDQC using the apriori grouping method.
>
> mdout <- mdqc(allQC, method="apriori", groups=list(1:5, 6:9, 10:11))
> plot(mdout)
>
>
>
> #### Global PCA method
> # Figure 4 in Cohen Freue et al.(2007):
> # Results of MDQC using the global PCA method.
>
> mdout <- mdqc(allQC, method="global", pc=4)
> plot(mdout)
>
>
>
> #### Clustering grouping method
> # Figure 4 in Supplementary Material of Cohen Freue et al. (2007):
> # Results of MDQC using a cluster analysis to form
> # 3 groups of quality measures.
>
> mdout <- mdqc(allQC, method="cluster", k=3)
> plot(mdout)
>
>
>
> #### Loading grouping method
> # Figure 4 in Supplementary Material of Cohen Freue et al. (2007):
> # Results of MDQC using a cluster analysis on the first
> # k=4 loading vectors from a robust PCA to form 3 groups of quality measures.
>
> mdout <- mdqc(allQC, method="loading", k=3, pc=4)
> plot(mdout)
>
>
> ### To get the raw MD distances
> mdout$mdqcValues
[[1]]
1 2 3 4 5 6 7 8
7.6796147 2.2565453 0.9795014 1.4558570 3.4247972 2.7419927 3.0046593 2.4208556
9 10 11 12 13 14 15 16
1.3227690 2.4973483 1.8700956 1.8075629 0.9349888 3.1422048 1.2908303 1.4503559
17 18 19 20
1.2954040 1.1910367 1.3982293 1.6849591
[[2]]
1 2 3 4 5 6 7
1.8095065 44.4432296 1.7464526 1.5386682 0.8740362 1.3619236 2.4851231
8 9 10 11 12 13 14
1.7965680 1.0796099 1.3208350 0.9531571 1.0418052 1.0104145 8.8914435
15 16 17 18 19 20
1.0439949 1.5001193 2.7424233 1.6414073 1.2745152 1.9123399
[[3]]
1 2 3 4 5 6 7 8
1.0099809 1.1397212 1.2983621 1.5489876 1.6059610 0.6346995 1.4331583 3.8710987
9 10 11 12 13 14 15 16
1.3254090 0.5527318 1.6826578 1.6617291 1.5734826 0.6768468 0.8935631 1.3280052
17 18 19 20
1.0901687 0.6900555 0.4507220 1.6492200