R: Combining multiple laser scans of microarray data
multiscan
R Documentation
Combining multiple laser scans of microarray data
Description
Estimates gene expressions from multiple laser scans of microarrays using non-linear
functional regression model with additive plus multiplicative errors.
A numeric matrix or data frame containing the intensity data
of a single microarray scanned at multiple (two or more) scanner settings. For
dual channel arrays, multiscan should be run on each channel of data separately.
The number of rows (n) is equal to the number of spots/probes on the array,
and the number of columns (m) equals the number of scans. Columns will be
arranged in order of scanner's sensitivity before fitting the model. Replicated
probes on the array are treated as individual spots.
initial
A vector of length m+2 to be used as initial values for
the scanning effects (β_2, cdots, β_m) and scale
(σ_1, σ_2, ν) parameters. If it is NULL (default), the
initial values are determined from the data.
na.rm
Logical. Should missing values be removed? Defaults to TRUE.
verbose
Logical. If TRUE, some intermediate results are printed
as the iteration proceeds.
control
A list of control parameters. See ‘Details’.
Details
The function implements the method of Khondoker et. al. (2006)
for combining multiple laser scans of microarrays. This function is
computationally slow and memory-intensive. That is due to the nested
iteration loops of the numerical optimization of the likelihood
function involving a large number (n+m+2) of parameters. The
optimization uses an alternating algorithm with the Nelder-Mead
simplex method (Nelder and Mead, 1965) in the inner loops. The function multiscan
directly uses the C function nmmin, the internal code used in the
general-purpose optimization tool optim, for
implementing the Nelder-Mead simplex method. For large data sets with many tens of thousands of probes, it
is recommended to consider first fitting the model using a random
subset (e.g. 10,000 rows) of the data matrix, and then using the
estimated scanning effects and scale parameters obtained as initial
values for fitting the model to the full data set.
The control is a list of arguments. The users can change/supply any of the
following components:
trace
Indicator (0 or 1) of tracing information of
Nelder-Mead algorithm. If 1, tracing information on the progress of the
optimization is produced. Because Nelder-Mead may be callled thousands of times
during the estimation process, setting trace = 1 will print too much
information very rapidly, which may not be useful. Defaults to 0.
gmaxit
The maximum number of global iterations. Defaults to 150.
maxit
The maximum number of Nelder-Mead iterations.
Defaults to 5000.
reltol
Relative convergence tolerance of Nelder-Mead.
The algorithm stops if it is unable to reduce the value by a factor of
reltol * (abs(val) + reltol) at a step. Defaults to 1e-5.
globaltol
Convergence tolerance of the outer (alternating) iteration.
The estimation process converges if the gain in loglikelihood from one complete cycle
of the outer iteration is less than globaltol. Defaults to 1e-10.
alpha, beta, gamma
Scaling parameters
for the Nelder-Mead method. alpha is the reflection
factor (default 1.0), beta the contraction factor (0.5) and
gamma the expansion factor (2.0).
Value
Returns an object of class multiscan with components
call
The call of the multiscan function.
beta
A vector of length m containing the maximum likelihood estimates of
the scanning effects, the first component fixed at 1.
scale
A vector of length 3 containing the maximum likelihood estimates of
the scale parameters σ_1, σ_2, mbox{and} ν.
mu
A vector of length n containing the estimated gene expressions.
data
A matrix of the input data with columns rearranged in order of scanner's sensitivity.
fitted
A matrix of the fitted model on the data.
sdres
A matrix of the standardised residuals.
outerit
Number of global iterations completed.
gconv, conv, convmu
Integer convergence codes.
gconv
Indicator of global convergence. 0 indicates successful convergence,
1 indicates premature termination.
conv
Convergence codes for the Nelder-Mead simplex method in the last global iteration while
updating scanning effects and scale parameters. 0 for successful convergence,
1 indicates that the iteration limit maxit had been reached, 10
indicates degeneracy of the Nelder-Mead simplex method.
convmu
Convergence codes for the Nelder-Mead
simplex method in the last global iteration while updating
the gene expression parameters. This is an integer vector of length n where
each component takes the value 0, 1, or 10 depending on whether the
Nelder-Mead simplex method successfully converged, reached iteration limit maxit or produced
degeneracy respectively while updating the corresponding gene expression parameter.
outerit
Number of global iterations completed.
loglf
Value of the loglikelihood function at convergence (gconv=0). NA
if not converged (gconv=1).
References
Khondoker, M. R., Glasbey, C. A. and Worton, B. J. (2006).
Statistical estimation of gene expression using multiple laser scans of microarrays.
Bioinformatics22, 215–219.
Nelder, J. A. and Mead, R. (1965).
A simplex method for function minimization.
The Computer Journal7 308–313.
See Also
A web interface, created by David Nutter of Biomathematics & Statistics Scotland (BioSS), based on the original
Fortran code written by Khondoker et al. (2006) is available at
http://www.bioss.ac.uk/ktshowcase/create.cgi. Although it uses the same algorithm,
results from the web interface may not be exactly identical to that of
multiscan as it uses a different (non-free IMSL routine) implementation of Nelder-Mead simplex.
Examples
## load the multiscan library
library(multiscan)
## load the murine data set included in multiscan package
data(murine)
murine[1:10, ] ## see first few rows of data
## fit the model on murine data with default options
fit <- multiscan(murine)
fit
## plot the fitted model
plot(fit)
## get the estimated gene expressions
gene.exprs <- fit$mu
## see more details as iteration progresses
fit1 <- multiscan(murine, verbose = TRUE)
fit1
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(multiscan)
*** Loaded multiscan Version 1.32.0 ***
Type 'vignette("multiscan")' to view the package vignette.
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/multiscan/multiscan.Rd_%03d_medium.png", width=480, height=480)
> ### Name: multiscan
> ### Title: Combining multiple laser scans of microarray data
> ### Aliases: multiscan
> ### Keywords: nonlinear optimize
>
> ### ** Examples
>
>
> ## load the multiscan library
> library(multiscan)
>
> ## load the murine data set included in multiscan package
> data(murine)
> murine[1:10, ] ## see first few rows of data
scan1 scan2 scan3 scan4
1 1145.1677 1778.3423 3138.7314 4516.025
2 296.0000 472.0000 869.0000 1374.000
3 467.0000 692.0000 1179.8000 1992.000
4 795.4694 1247.3893 2176.6643 3329.654
5 2084.2793 3334.1765 6084.2651 10014.496
6 525.0000 688.6923 1169.9434 1825.786
7 266.2000 384.1875 732.1111 1072.600
8 495.9074 742.3148 1313.9796 2009.833
9 565.9101 895.0319 1616.3943 2495.142
10 1160.0336 1832.7114 3259.6541 5295.662
>
> ## fit the model on murine data with default options
> fit <- multiscan(murine)
Estimating 1000 gene expressions using 4 scans of data
------------------------------------------------------------
Log-likelihood at initial parameters: -21348.7187
------------------------------------------------------------
End of global iteration: 1, Log-likelihood: -19542.94736
End of global iteration: 2, Log-likelihood: -19373.08458
End of global iteration: 3, Log-likelihood: -19351.64478
End of global iteration: 4, Log-likelihood: -19349.44816
End of global iteration: 5, Log-likelihood: -19348.63585
End of global iteration: 6, Log-likelihood: -19348.63585
> fit
Call: multiscan(data = murine)
Scanning effects:
scan1 scan2 scan3 scan4
1.000000 1.566959 2.764289 4.328168
Scale parameters:
Additive Multiplicative Nu
5.241983306 0.006761393 0.390186429
Log-likelihood at convergence:
Loglf
-19348.64
>
> ## plot the fitted model
> plot(fit)
>
> ## get the estimated gene expressions
> gene.exprs <- fit$mu
>
> ## see more details as iteration progresses
>
> fit1 <- multiscan(murine, verbose = TRUE)
Estimating 1000 gene expressions using 4 scans of data
------------------------------------------------------------
Initial parameters:
------------------------------------------------------------
Scanning effects:
beta[1]= 1.00000000
beta[2]= 1.57472589
beta[3]= 2.76419289
beta[4]= 4.35577945
Scale parameters:
sigma1= 10.00000000
sigma2= 0.00500000
nu= 0.10000000
------------------------------------------------------------
Log-likelihood at initial parameters: -21348.7187
------------------------------------------------------------
************************************************************
Started iteration: 1
Updating the scanning effects and scale parameters... done.
Updating the gene expression parameters (mu)... done.
************************************************************
End of global iteration: 1, Log-likelihood: -19542.94736
Estimates of the scanning effects:
beta[1]= 1.00000000 (fixed)
beta[2]= 1.56923767
beta[3]= 2.75849770
beta[4]= 4.30677788
Estimates of the scale parameters:
sigma1= 8.86076246
sigma2= 0.01119905
nu= 0.37926299
------------------------------------------------------------
************************************************************
Started iteration: 2
Updating the scanning effects and scale parameters... done.
Updating the gene expression parameters (mu)... done.
************************************************************
End of global iteration: 2, Log-likelihood: -19373.08458
Estimates of the scanning effects:
beta[1]= 1.00000000 (fixed)
beta[2]= 1.56756697
beta[3]= 2.76341844
beta[4]= 4.32138078
Estimates of the scale parameters:
sigma1= 7.00509964
sigma2= 0.00718175
nu= 0.38451695
------------------------------------------------------------
************************************************************
Started iteration: 3
Updating the scanning effects and scale parameters... done.
Updating the gene expression parameters (mu)... done.
************************************************************
End of global iteration: 3, Log-likelihood: -19351.64478
Estimates of the scanning effects:
beta[1]= 1.00000000 (fixed)
beta[2]= 1.56710142
beta[3]= 2.76398053
beta[4]= 4.32477084
Estimates of the scale parameters:
sigma1= 5.73354388
sigma2= 0.00687417
nu= 0.38827366
------------------------------------------------------------
************************************************************
Started iteration: 4
Updating the scanning effects and scale parameters... done.
Updating the gene expression parameters (mu)... done.
************************************************************
End of global iteration: 4, Log-likelihood: -19349.44816
Estimates of the scanning effects:
beta[1]= 1.00000000 (fixed)
beta[2]= 1.56721243
beta[3]= 2.76403078
beta[4]= 4.32609783
Estimates of the scale parameters:
sigma1= 5.32621598
sigma2= 0.00684815
nu= 0.39053303
------------------------------------------------------------
************************************************************
Started iteration: 5
Updating the scanning effects and scale parameters... done.
Updating the gene expression parameters (mu)... done.
************************************************************
End of global iteration: 5, Log-likelihood: -19348.63585
Estimates of the scanning effects:
beta[1]= 1.00000000 (fixed)
beta[2]= 1.56695912
beta[3]= 2.76428876
beta[4]= 4.32816847
Estimates of the scale parameters:
sigma1= 5.24198331
sigma2= 0.00676139
nu= 0.39018643
------------------------------------------------------------
************************************************************
Started iteration: 6
Updating the scanning effects and scale parameters... done.
Updating the gene expression parameters (mu)... done.
************************************************************
End of global iteration: 6, Log-likelihood: -19348.63585
Estimates of the scanning effects:
beta[1]= 1.00000000 (fixed)
beta[2]= 1.56695912
beta[3]= 2.76428876
beta[4]= 4.32816847
Estimates of the scale parameters:
sigma1= 5.24198331
sigma2= 0.00676139
nu= 0.39018643
------------------------------------------------------------
> fit1
Call: multiscan(data = murine, verbose = TRUE)
Scanning effects:
scan1 scan2 scan3 scan4
1.000000 1.566959 2.764289 4.328168
Scale parameters:
Additive Multiplicative Nu
5.241983306 0.006761393 0.390186429
Log-likelihood at convergence:
Loglf
-19348.64
>
>
>
>
>
>
> dev.off()
null device
1
>