Last data update: 2014.03.03

R: Combining multiple laser scans of microarray data
multiscanR Documentation

Combining multiple laser scans of microarray data

Description

Estimates gene expressions from multiple laser scans of microarrays using non-linear functional regression model with additive plus multiplicative errors.

Usage

multiscan(data, initial = NULL, na.rm = TRUE, verbose = FALSE, control = list())

Arguments

data

A numeric matrix or data frame containing the intensity data of a single microarray scanned at multiple (two or more) scanner settings. For dual channel arrays, multiscan should be run on each channel of data separately. The number of rows (n) is equal to the number of spots/probes on the array, and the number of columns (m) equals the number of scans. Columns will be arranged in order of scanner's sensitivity before fitting the model. Replicated probes on the array are treated as individual spots.

initial

A vector of length m+2 to be used as initial values for the scanning effects (β_2, cdots, β_m) and scale (σ_1, σ_2, ν) parameters. If it is NULL (default), the initial values are determined from the data.

na.rm

Logical. Should missing values be removed? Defaults to TRUE.

verbose

Logical. If TRUE, some intermediate results are printed as the iteration proceeds.

control

A list of control parameters. See ‘Details’.

Details

The function implements the method of Khondoker et. al. (2006) for combining multiple laser scans of microarrays. This function is computationally slow and memory-intensive. That is due to the nested iteration loops of the numerical optimization of the likelihood function involving a large number (n+m+2) of parameters. The optimization uses an alternating algorithm with the Nelder-Mead simplex method (Nelder and Mead, 1965) in the inner loops. The function multiscan directly uses the C function nmmin, the internal code used in the general-purpose optimization tool optim, for implementing the Nelder-Mead simplex method. For large data sets with many tens of thousands of probes, it is recommended to consider first fitting the model using a random subset (e.g. 10,000 rows) of the data matrix, and then using the estimated scanning effects and scale parameters obtained as initial values for fitting the model to the full data set.

The control is a list of arguments. The users can change/supply any of the following components:

trace

Indicator (0 or 1) of tracing information of Nelder-Mead algorithm. If 1, tracing information on the progress of the optimization is produced. Because Nelder-Mead may be callled thousands of times during the estimation process, setting trace = 1 will print too much information very rapidly, which may not be useful. Defaults to 0.

gmaxit

The maximum number of global iterations. Defaults to 150.

maxit

The maximum number of Nelder-Mead iterations. Defaults to 5000.

reltol

Relative convergence tolerance of Nelder-Mead. The algorithm stops if it is unable to reduce the value by a factor of reltol * (abs(val) + reltol) at a step. Defaults to 1e-5.

globaltol

Convergence tolerance of the outer (alternating) iteration. The estimation process converges if the gain in loglikelihood from one complete cycle of the outer iteration is less than globaltol. Defaults to 1e-10.

alpha, beta, gamma

Scaling parameters for the Nelder-Mead method. alpha is the reflection factor (default 1.0), beta the contraction factor (0.5) and gamma the expansion factor (2.0).

Value

Returns an object of class multiscan with components

call

The call of the multiscan function.

beta

A vector of length m containing the maximum likelihood estimates of the scanning effects, the first component fixed at 1.

scale

A vector of length 3 containing the maximum likelihood estimates of the scale parameters σ_1, σ_2, mbox{and} ν.

mu

A vector of length n containing the estimated gene expressions.

data

A matrix of the input data with columns rearranged in order of scanner's sensitivity.

fitted

A matrix of the fitted model on the data.

sdres

A matrix of the standardised residuals.

outerit

Number of global iterations completed.

gconv, conv, convmu

Integer convergence codes.

gconv

Indicator of global convergence. 0 indicates successful convergence, 1 indicates premature termination.

conv

Convergence codes for the Nelder-Mead simplex method in the last global iteration while updating scanning effects and scale parameters. 0 for successful convergence, 1 indicates that the iteration limit maxit had been reached, 10 indicates degeneracy of the Nelder-Mead simplex method.

convmu

Convergence codes for the Nelder-Mead simplex method in the last global iteration while updating the gene expression parameters. This is an integer vector of length n where each component takes the value 0, 1, or 10 depending on whether the Nelder-Mead simplex method successfully converged, reached iteration limit maxit or produced degeneracy respectively while updating the corresponding gene expression parameter.

outerit

Number of global iterations completed.

loglf

Value of the loglikelihood function at convergence (gconv=0). NA if not converged (gconv=1).

References

Khondoker, M. R., Glasbey, C. A. and Worton, B. J. (2006). Statistical estimation of gene expression using multiple laser scans of microarrays. Bioinformatics 22, 215–219.

Nelder, J. A. and Mead, R. (1965). A simplex method for function minimization. The Computer Journal 7 308–313.

See Also

A web interface, created by David Nutter of Biomathematics & Statistics Scotland (BioSS), based on the original Fortran code written by Khondoker et al. (2006) is available at http://www.bioss.ac.uk/ktshowcase/create.cgi. Although it uses the same algorithm, results from the web interface may not be exactly identical to that of multiscan as it uses a different (non-free IMSL routine) implementation of Nelder-Mead simplex.

Examples


## load the multiscan library 
library(multiscan)

## load the murine data set included in multiscan package
data(murine)
murine[1:10, ] ## see first few rows of data

## fit the model on murine data with default options
fit <- multiscan(murine)
fit

## plot the fitted model
plot(fit)

## get the estimated gene expressions
gene.exprs <- fit$mu

## see more details as iteration progresses

fit1 <- multiscan(murine, verbose = TRUE)
fit1

Results


R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> library(multiscan)
*** Loaded multiscan Version 1.32.0 *** 
Type 'vignette("multiscan")' to view the package vignette.
> png(filename="/home/ddbj/snapshot/RGM3/R_BC/result/multiscan/multiscan.Rd_%03d_medium.png", width=480, height=480)
> ### Name: multiscan
> ### Title: Combining multiple laser scans of microarray data
> ### Aliases: multiscan
> ### Keywords: nonlinear optimize
> 
> ### ** Examples
> 
> 
> ## load the multiscan library 
> library(multiscan)
> 
> ## load the murine data set included in multiscan package
> data(murine)
> murine[1:10, ] ## see first few rows of data
       scan1     scan2     scan3     scan4
1  1145.1677 1778.3423 3138.7314  4516.025
2   296.0000  472.0000  869.0000  1374.000
3   467.0000  692.0000 1179.8000  1992.000
4   795.4694 1247.3893 2176.6643  3329.654
5  2084.2793 3334.1765 6084.2651 10014.496
6   525.0000  688.6923 1169.9434  1825.786
7   266.2000  384.1875  732.1111  1072.600
8   495.9074  742.3148 1313.9796  2009.833
9   565.9101  895.0319 1616.3943  2495.142
10 1160.0336 1832.7114 3259.6541  5295.662
> 
> ## fit the model on murine data with default options
> fit <- multiscan(murine)

Estimating 1000 gene expressions using 4 scans of data
------------------------------------------------------------
Log-likelihood at initial parameters:  -21348.7187
------------------------------------------------------------
End of global iteration: 1, Log-likelihood:   -19542.94736
End of global iteration: 2, Log-likelihood:   -19373.08458
End of global iteration: 3, Log-likelihood:   -19351.64478
End of global iteration: 4, Log-likelihood:   -19349.44816
End of global iteration: 5, Log-likelihood:   -19348.63585
End of global iteration: 6, Log-likelihood:   -19348.63585
> fit

Call:  multiscan(data = murine) 

Scanning effects:
   scan1      scan2      scan3      scan4   
1.000000   1.566959   2.764289   4.328168   

Scale parameters:
      Additive   Multiplicative               Nu   
   5.241983306      0.006761393      0.390186429   

Log-likelihood at convergence:
    Loglf 
-19348.64 
> 
> ## plot the fitted model
> plot(fit)
> 
> ## get the estimated gene expressions
> gene.exprs <- fit$mu
> 
> ## see more details as iteration progresses
> 
> fit1 <- multiscan(murine, verbose = TRUE)

Estimating 1000 gene expressions using 4 scans of data
------------------------------------------------------------
Initial parameters:
------------------------------------------------------------
Scanning effects:
beta[1]=    1.00000000
beta[2]=    1.57472589
beta[3]=    2.76419289
beta[4]=    4.35577945

Scale parameters:
sigma1=   10.00000000
sigma2=    0.00500000
    nu=    0.10000000
------------------------------------------------------------
Log-likelihood at initial parameters:  -21348.7187
------------------------------------------------------------
************************************************************
Started iteration:     1
Updating the scanning effects and scale parameters... done.

Updating the gene expression parameters (mu)... done.
************************************************************
End of global iteration: 1, Log-likelihood:   -19542.94736
Estimates of the scanning effects:
beta[1]=    1.00000000 (fixed)
beta[2]=    1.56923767
beta[3]=    2.75849770
beta[4]=    4.30677788

Estimates of the scale parameters:
sigma1=    8.86076246
sigma2=    0.01119905
    nu=    0.37926299
------------------------------------------------------------
************************************************************
Started iteration:     2
Updating the scanning effects and scale parameters... done.

Updating the gene expression parameters (mu)... done.
************************************************************
End of global iteration: 2, Log-likelihood:   -19373.08458
Estimates of the scanning effects:
beta[1]=    1.00000000 (fixed)
beta[2]=    1.56756697
beta[3]=    2.76341844
beta[4]=    4.32138078

Estimates of the scale parameters:
sigma1=    7.00509964
sigma2=    0.00718175
    nu=    0.38451695
------------------------------------------------------------
************************************************************
Started iteration:     3
Updating the scanning effects and scale parameters... done.

Updating the gene expression parameters (mu)... done.
************************************************************
End of global iteration: 3, Log-likelihood:   -19351.64478
Estimates of the scanning effects:
beta[1]=    1.00000000 (fixed)
beta[2]=    1.56710142
beta[3]=    2.76398053
beta[4]=    4.32477084

Estimates of the scale parameters:
sigma1=    5.73354388
sigma2=    0.00687417
    nu=    0.38827366
------------------------------------------------------------
************************************************************
Started iteration:     4
Updating the scanning effects and scale parameters... done.

Updating the gene expression parameters (mu)... done.
************************************************************
End of global iteration: 4, Log-likelihood:   -19349.44816
Estimates of the scanning effects:
beta[1]=    1.00000000 (fixed)
beta[2]=    1.56721243
beta[3]=    2.76403078
beta[4]=    4.32609783

Estimates of the scale parameters:
sigma1=    5.32621598
sigma2=    0.00684815
    nu=    0.39053303
------------------------------------------------------------
************************************************************
Started iteration:     5
Updating the scanning effects and scale parameters... done.

Updating the gene expression parameters (mu)... done.
************************************************************
End of global iteration: 5, Log-likelihood:   -19348.63585
Estimates of the scanning effects:
beta[1]=    1.00000000 (fixed)
beta[2]=    1.56695912
beta[3]=    2.76428876
beta[4]=    4.32816847

Estimates of the scale parameters:
sigma1=    5.24198331
sigma2=    0.00676139
    nu=    0.39018643
------------------------------------------------------------
************************************************************
Started iteration:     6
Updating the scanning effects and scale parameters... done.

Updating the gene expression parameters (mu)... done.
************************************************************
End of global iteration: 6, Log-likelihood:   -19348.63585
Estimates of the scanning effects:
beta[1]=    1.00000000 (fixed)
beta[2]=    1.56695912
beta[3]=    2.76428876
beta[4]=    4.32816847

Estimates of the scale parameters:
sigma1=    5.24198331
sigma2=    0.00676139
    nu=    0.39018643
------------------------------------------------------------
> fit1

Call:  multiscan(data = murine, verbose = TRUE) 

Scanning effects:
   scan1      scan2      scan3      scan4   
1.000000   1.566959   2.764289   4.328168   

Scale parameters:
      Additive   Multiplicative               Nu   
   5.241983306      0.006761393      0.390186429   

Log-likelihood at convergence:
    Loglf 
-19348.64 
> 
> 
> 
> 
> 
> 
> dev.off()
null device 
          1 
>