spg {BB}                                                        R Documentation

Large-Scale Optimization

Description

Spectral projected gradient method for large-scale optimization with simple constraints.

Usage

     spg(par, fn, gr=NULL, method=3, lower=-Inf, upper=Inf,
         project=NULL, projectArgs=NULL,
         control=list(), quiet=FALSE, alertConvergence=TRUE, ...)
 

Arguments

par

A real vector argument to fn, indicating the initial guess for the optimization of the nonlinear objective function fn.

fn

Nonlinear objective function to be optimized: a scalar function that takes a real vector as its argument and returns the value of the function at that point (see *Details*).

gr

The gradient of the objective function fn evaluated at the argument. This is a vector function that takes a real vector as argument and returns a real vector of the same length. It defaults to NULL, which means that the gradient is evaluated numerically. Computations are dramatically faster in high-dimensional problems when the exact gradient is provided. See *Examples*.

method

An integer (1, 2, or 3) specifying which Barzilai-Borwein steplength to use. The default is 3. See *Details*.

upper

An upper bound for box constraints.

lower

A lower bound for box constraints.

project

A projection function or character string indicating its name. The projection function takes a point in R^n and projects it onto a region that defines the constraints of the problem. This is a vector-function that takes a real vector as argument and returns a real vector of the same length. See *Details*. If a projection function is not supplied, arguments lower and upper will cause the use of an internally defined function that enforces the implied box constraints.

projectArgs

A list with arguments to the project function. See *Details*.

control

A list of control parameters. See *Details*.

quiet

A logical variable (TRUE/FALSE). If TRUE, warnings and some additional informational printing are suppressed. Default is quiet = FALSE. Note that the control variables trace and quiet affect different printing, so if trace is not set to FALSE there will be considerable printed output.

alertConvergence

A logical variable. With the default TRUE a warning is issued if convergence is not obtained. When set to FALSE the warning is suppressed.

...

Additional arguments passed to fn and gr. (Both must accept any specified arguments, either explicitly or by having a ... argument, but they do not need to use them all.)

Details

R adaptation, with significant modifications, by Ravi Varadhan, Johns Hopkins University (March 25, 2008), from the original FORTRAN code of Birgin, Martinez, and Raydan (2001). The original is available at the TANGO project http://www.ime.usp.br/~egbirgin/tango/downloads.php

A major modification in our R adaptation of the original FORTRAN code is the availability of 3 different options for Barzilai-Borwein (BB) steplengths: method = 1 is the BB steplength used in Birgin, Martinez and Raydan (2000); method = 2 is the other steplength proposed in Barzilai and Borwein's (1988) original paper. Finally, method = 3 is a new steplength, which was first proposed in Varadhan and Roland (2008) for accelerating the EM algorithm. In fact, Varadhan and Roland (2008) considered 3 similar steplength schemes in their EM acceleration work. Here, we have chosen method = 3 as the "default" method. This method may be slightly slower than the other 2 BB steplength schemes, but it generally exhibited more reliable convergence to a better optimum in our experiments. We recommend that the user try the other steplength schemes if the default method does not perform well on their problem.

Box constraints can be imposed by vectors lower and upper. Scalar values for lower and upper are expanded to apply to all parameters. The default lower is -Inf and upper is +Inf, which imply no constraints.
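
For instance, a minimal sketch (the quadratic objective and starting values below are purely illustrative, not part of the package): a scalar bound is recycled to every parameter, so this call constrains each element of par to the interval [0, 2].

library(BB)
fr <- function(x) sum((x - 1.5)^2)   # illustrative quadratic objective
ans <- spg(par=rep(0.5, 5), fn=fr, lower=0, upper=2, control=list(trace=FALSE))
ans$par   # every component lies within [0, 2]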

The project argument provides a way to impose more general constraints on the parameters in spg. projectArgs is passed to the project function if one is specified. The first argument of any project function should be par, and any other arguments should be passed through spg's projectArgs argument. To avoid confusion it is suggested that user-defined project functions not use arguments named lower and upper.

The function projectLinear incorporates linear equalities and inequalities. This function also provides an example of how other projections might be implemented.
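
As a hedged sketch of this mechanism (assuming, per the projectLinear help page, that it takes a constraint matrix A, a right-hand side b, and a count meq of leading equality rows via projectArgs; the objective below is illustrative), the constraints sum(x) = 1 and x >= 0 might be imposed as follows.

library(BB)
fq <- function(x) sum((x - c(0.1, 0.5, 0.9))^2)   # illustrative objective
Amat <- rbind(rep(1, 3), diag(3))   # row 1: sum(x); rows 2-4: the individual x[i]
bvec <- c(1, 0, 0, 0)               # sum(x) = 1 (equality), x >= 0 (inequalities)
ans <- spg(par=rep(1/3, 3), fn=fq, project="projectLinear",
           projectArgs=list(A=Amat, b=bvec, meq=1), control=list(trace=FALSE))
ans$par   # feasible: non-negative and summing to 1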

Argument control is a list specifying any changes to default values of algorithm control parameters. Note that the names of these must be specified completely; partial matching will not work. The list items are as follows (a usage sketch appears after the list):

M

A positive integer, typically between 5 and 20, that controls the monotonicity of the algorithm. M = 1 enforces strict monotonicity in the reduction of fn, whereas larger values allow for more non-monotonicity. Global convergence under non-monotonicity is ensured by enforcing the Grippo-Lampariello-Lucidi condition (Grippo et al. 1986) in a non-monotone line-search algorithm. Values of M between 5 and 20 are generally good. The default is M = 10.

maxit

The maximum number of iterations. The default is maxit = 1500.

ftol

Convergence tolerance on the absolute change in objective function between successive iterations. Convergence is declared when the change is less than ftol. Default is ftol = 1.e-10.

gtol

Convergence tolerance on the infinity-norm of projected gradient gr evaluated at the current parameter. Convergence is declared when the infinity-norm of projected gradient is less than gtol. Default is gtol = 1.e-05.

maxfeval

Maximum limit on the number of function evaluations. Default is maxfeval = 10000.

maximize

A logical variable indicating whether the objective function is to be maximized. Default is maximize = FALSE indicating minimization. For maximization (e.g. log-likelihood maximization in statistical modeling), this may be set to TRUE.

trace

A logical variable (TRUE/FALSE). If TRUE, information on the progress of optimization is printed. Default is trace = TRUE.

triter

An integer that controls the frequency of tracing when trace=TRUE. Default is triter=10, which means that the objective fn and the infinity-norm of its projected gradient are printed at every 10-th iteration.

eps

A small positive increment used in the finite-difference approximation of gradient. Default is 1.e-07.

checkGrad

NULL or a logical variable TRUE/FALSE indicating whether to check the provided analytical gradient against a numerical approximation. With the default NULL the gradient is checked if the check is estimated to take less than about ten seconds; a warning is issued if it is expected to take longer. The default can be overridden by specifying TRUE or FALSE. It is recommended that this be set to FALSE for high-dimensional problems, after making sure that the gradient is correctly specified, possibly by running once with TRUE specified.

checkGrad.tol

A small positive value used to compare the maximum relative difference between a user-supplied gradient gr and the numerical approximation calculated by grad from package numDeriv. The default is 1.e-06. If this value is exceeded, an error message is issued, as it is a reasonable indication of a problem with the user-supplied gr. The user can either fix the gr function, remove it so the finite-difference approximation is used, or increase the tolerance so that the check passes.
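
For example, a minimal usage sketch overriding several control values at once (illustrative objective; the names must be written out in full):

library(BB)
fr <- function(x) sum((x - 1)^2)   # illustrative objective
ans <- spg(par=rnorm(10), fn=fr,
           control=list(M=5, maxit=500, ftol=1e-8, gtol=1e-4, trace=FALSE))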

Value

A list with the following components (a short sketch of inspecting them follows the list):

par

Parameters that optimize the nonlinear objective function, if convergence is successful.

value

The value of the objective function at termination.

gradient

L-infinity norm of the projected gradient of the objective function at termination. If convergence is successful, this should be less than gtol.

fn.reduction

Reduction in the value of the function from its initial value. This is negative in maximization.

iter

Number of iterations taken by the algorithm. The gradient is evaluated once each iteration, so the number of gradient evaluations will also be equal to iter, plus any evaluations necessary for checkGrad.

feval

Number of times the objective fn was evaluated.

convergence

An integer code indicating the type of convergence. 0 indicates successful convergence, in which case the projected gradient is smaller than gtol or the change in the objective function is smaller than ftol. Error codes are: 1 indicates that the maximum limit for iterations maxit has been reached. 2 indicates that the maximum limit on function evaluations has been exceeded. 3 indicates failure due to error in function evaluation. 4 indicates failure due to error in gradient evaluation. 5 indicates failure due to error in projection.

message

A text message explaining which termination criterion was used.
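
A minimal sketch of inspecting these components (illustrative objective and starting values):

library(BB)
fr <- function(x) sum((x - 2)^2)   # illustrative objective
ans <- spg(par=rep(0, 4), fn=fr, control=list(trace=FALSE))
if (ans$convergence == 0) cat("Converged:", ans$message, "\n")
ans$par    # optimizing parameters
ans$value  # objective value at termination
c(iterations=ans$iter, fn.evaluations=ans$feval)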

References

Birgin EG, Martinez JM, and Raydan M (2000): Nonmonotone spectral projected gradient methods on convex sets, SIAM Journal on Optimization, 10, 1196-1211.

Birgin EG, Martinez JM, and Raydan M (2001): SPG: software for convex-constrained optimization, ACM Transactions on Mathematical Software.

Grippo L, Lampariello F, and Lucidi S (1986): A nonmonotone line search technique for Newton's method, SIAM Journal on Numerical Analysis, 23, 707-716.

Raydan M (1997): Barzilai-Borwein gradient method for the large-scale unconstrained minimization problem, SIAM Journal on Optimization, 7, 26-33.

Varadhan R and Roland C (2008): Simple and globally convergent methods for accelerating the convergence of any EM algorithm, Scandinavian Journal of Statistics, doi: 10.1111/j.1467-9469.2007.00585.x.

Varadhan R and Gilbert PD (2009): BB: An R Package for Solving a Large System of Nonlinear Equations and for Optimizing a High-Dimensional Nonlinear Objective Function, Journal of Statistical Software, 32(4), http://www.jstatsoft.org/v32/i04/

See Also

projectLinear, BBoptim, optim, nlm, sane, dfsane, grad

Examples

sc2.f <- function(x){ sum((1:length(x)) * (exp(x) - x)) / 10}

sc2.g <- function(x){ (1:length(x)) * (exp(x) - 1) / 10}

p0 <- rnorm(50)
ans.spg1 <- spg(par=p0, fn=sc2.f)  # Default is method=3
ans.spg2 <- spg(par=p0, fn=sc2.f, method=1)
ans.spg3 <- spg(par=p0, fn=sc2.f, method=2)
ans.cg <- optim(par=p0, fn=sc2.f, method="CG")  #Uses conjugate-gradient method in "optim"
ans.lbfgs <- optim(par=p0, fn=sc2.f, method="L-BFGS-B")  #Uses low-memory BFGS method in "optim"

# Now we use exact gradient.  
# Computation is much faster compared to when using numerical gradient.
ans.spg1 <- spg(par=p0, fn=sc2.f, gr=sc2.g)

############
# Another example illustrating use of additional parameters to objective function 
valley.f <- function(x, cons) {
  n <- length(x)
  f <- rep(NA, n)
  j <- 3 * (1:(n/3))
  jm2 <- j - 2
  jm1 <- j - 1
  f[jm2] <- (cons[2] * x[jm2]^3 + cons[1] * x[jm2]) * exp(-(x[jm2]^2)/100) - 1
  f[jm1] <- 10 * (sin(x[jm2]) - x[jm1])
  f[j] <- 10 * (cos(x[jm2]) - x[j])
  sum(f*f)
  }

k <- c(1.003344481605351, -3.344481605351171e-03)
p0 <- rnorm(30)  # number of parameters should be a multiple of 3 for this example
ans.spg2 <- spg(par=p0, fn=valley.f, cons=k, method=2)  
ans.cg <- optim(par=p0, fn=valley.f, cons=k, method="CG")  
ans.lbfgs <- optim(par=p0, fn=valley.f, cons=k, method="L-BFGS-B")  

####################################################################
# Here is a statistical example illustrating log-likelihood maximization.

poissmix.loglik <- function(p,y) {
  # Log-likelihood for a binary Poisson mixture
  i <- 0:(length(y)-1)
  loglik <- y*log(p[1]*exp(-p[2])*p[2]^i/exp(lgamma(i+1)) + 
        (1 - p[1])*exp(-p[3])*p[3]^i/exp(lgamma(i+1)))
  return (sum(loglik) )
  }

# Data from Hasselblad (JASA 1969)
poissmix.dat <- data.frame(death=0:9, freq=c(162,267,271,185,111,61,27,8,3,1))
y <- poissmix.dat$freq

# Lower and upper bounds on parameters
lo <- c(0.001,0,0)  
hi <- c(0.999, Inf, Inf)

p0 <- runif(3,c(0.2,1,1),c(0.8,5,8))  # randomly generated starting values

ans.spg <- spg(par=p0, fn=poissmix.loglik, y=y, lower=lo, upper=hi, 
     control=list(maximize=TRUE))

# how to compute hessian at the MLE
  require(numDeriv)
  hess <- hessian(x=ans.spg$par, poissmix.loglik, y=y)
  se <- sqrt(-diag(solve(hess)))  # approximate standard errors


Results



> library(BB)
> ### Name: spg
> ### Title: Large-Scale Optimization
> ### Aliases: spg
> ### Keywords: multivariate
> 
> ### ** Examples
> 
> sc2.f <- function(x){ sum((1:length(x)) * (exp(x) - x)) / 10}
> 
> sc2.g <- function(x){ (1:length(x)) * (exp(x) - 1) / 10}
> 
> p0 <- rnorm(50)
> ans.spg1 <- spg(par=p0, fn=sc2.f)  # Default is method=3
iter:  0  f-value:  197.9292  pgrad:  30.8266 
iter:  10  f-value:  127.6343  pgrad:  0.1627386 
iter:  20  f-value:  127.5014  pgrad:  0.01786518 
iter:  30  f-value:  127.5001  pgrad:  0.00403702 
iter:  40  f-value:  127.5  pgrad:  0.0009215739 
iter:  50  f-value:  127.5  pgrad:  0.0008783729 
iter:  60  f-value:  127.5  pgrad:  3.922196e-05 
> ans.spg2 <- spg(par=p0, fn=sc2.f, method=1)
iter:  0  f-value:  197.9292  pgrad:  30.8266 
iter:  10  f-value:  127.5745  pgrad:  0.1291622 
iter:  20  f-value:  127.5011  pgrad:  0.04761233 
iter:  30  f-value:  127.5004  pgrad:  0.03254783 
iter:  40  f-value:  127.5  pgrad:  0.000794671 
iter:  50  f-value:  127.5  pgrad:  0.002499689 
> ans.spg3 <- spg(par=p0, fn=sc2.f, method=2)
iter:  0  f-value:  197.9292  pgrad:  30.8266 
iter:  10  f-value:  127.6612  pgrad:  0.1728854 
iter:  20  f-value:  127.5045  pgrad:  0.1653243 
iter:  30  f-value:  127.5002  pgrad:  0.007004957 
iter:  40  f-value:  127.5  pgrad:  0.01086661 
iter:  50  f-value:  127.5  pgrad:  0.000352145 
iter:  60  f-value:  127.5  pgrad:  0.0003304024 
iter:  70  f-value:  127.5  pgrad:  4.220624e-05 
> ans.cg <- optim(par=p0, fn=sc2.f, method="CG")  #Uses conjugate-gradient method in "optim"
> ans.lbfgs <- optim(par=p0, fn=sc2.f, method="L-BFGS-B")  #Uses low-memory BFGS method in "optim"
> 
> # Now we use exact gradient.  
> # Computation is much faster compared to when using numerical gradient.
> ans.spg1 <- spg(par=p0, fn=sc2.f, gr=sc2.g)
iter:  0  f-value:  197.9292  pgrad:  30.8266 
iter:  10  f-value:  127.6343  pgrad:  0.1627385 
iter:  20  f-value:  127.5014  pgrad:  0.01786465 
iter:  30  f-value:  127.5001  pgrad:  0.004031603 
iter:  40  f-value:  127.5  pgrad:  0.0009264037 
iter:  50  f-value:  127.5  pgrad:  0.0008680818 
iter:  60  f-value:  127.5  pgrad:  3.642787e-05 
> 
> ############
> # Another example illustrating use of additional parameters to objective function 
> valley.f <- function(x, cons) {
+   n <- length(x)
+   f <- rep(NA, n)
+   j <- 3 * (1:(n/3))
+   jm2 <- j - 2
+   jm1 <- j - 1
+   f[jm2] <- (cons[2] * x[jm2]^3 + cons[1] * x[jm2]) * exp(-(x[jm2]^2)/100) - 1
+   f[jm1] <- 10 * (sin(x[jm2]) - x[jm1])
+   f[j] <- 10 * (cos(x[jm2]) - x[j])
+   sum(f*f)
+   }
> 
> k <- c(1.003344481605351, -3.344481605351171e-03)
> p0 <- rnorm(30)  # number of parameters should be a multiple of 3 for this example
> ans.spg2 <- spg(par=p0, fn=valley.f, cons=k, method=2)  
iter:  0  f-value:  2699.036  pgrad:  414.4792 
iter:  10  f-value:  14.6236  pgrad:  2.238182 
iter:  20  f-value:  11.47981  pgrad:  1.606474 
iter:  30  f-value:  5.651816  pgrad:  3.957706 
iter:  40  f-value:  4.567452  pgrad:  1.042303 
iter:  50  f-value:  2.246198  pgrad:  1.423417 
iter:  60  f-value:  0.9855292  pgrad:  0.4715191 
iter:  70  f-value:  0.4940861  pgrad:  0.9706623 
iter:  80  f-value:  0.138232  pgrad:  0.4169235 
iter:  90  f-value:  0.0286886  pgrad:  0.07679862 
iter:  100  f-value:  0.01005746  pgrad:  0.9391996 
iter:  110  f-value:  1.072785e-05  pgrad:  0.001472615 
> ans.cg <- optim(par=p0, fn=valley.f, cons=k, method="CG")  
> ans.lbfgs <- optim(par=p0, fn=valley.f, cons=k, method="L-BFGS-B")  
> 
> ####################################################################
> # Here is a statistical example illustrating log-likelihood maximization.
> 
> poissmix.loglik <- function(p,y) {
+   # Log-likelihood for a binary Poisson mixture
+   i <- 0:(length(y)-1)
+   loglik <- y*log(p[1]*exp(-p[2])*p[2]^i/exp(lgamma(i+1)) + 
+         (1 - p[1])*exp(-p[3])*p[3]^i/exp(lgamma(i+1)))
+   return (sum(loglik) )
+   }
> 
> # Data from Hasselblad (JASA 1969)
> poissmix.dat <- data.frame(death=0:9, freq=c(162,267,271,185,111,61,27,8,3,1))
> y <- poissmix.dat$freq
> 
> # Lower and upper bounds on parameters
> lo <- c(0.001,0,0)  
> hi <- c(0.999, Inf, Inf)
> 
> p0 <- runif(3,c(0.2,1,1),c(0.8,5,8))  # randomly generated starting values
> 
> ans.spg <- spg(par=p0, fn=poissmix.loglik, y=y, lower=lo, upper=hi, 
+      control=list(maximize=TRUE))
iter:  0  f-value:  -3122.275  pgrad:  7.68762 
iter:  10  f-value:  -1992.479  pgrad:  3.522377 
iter:  20  f-value:  -1993.036  pgrad:  3.378979 
iter:  30  f-value:  -1991.025  pgrad:  10.2916 
iter:  40  f-value:  -1990.284  pgrad:  3.520242 
iter:  50  f-value:  -1990.027  pgrad:  0.7321978 
iter:  60  f-value:  -1989.958  pgrad:  0.2921524 
iter:  70  f-value:  -1989.95  pgrad:  0.7366793 
iter:  80  f-value:  -1989.946  pgrad:  0.2869433 
iter:  90  f-value:  -1989.946  pgrad:  0.05401944 
iter:  100  f-value:  -1989.946  pgrad:  0.05123638 
iter:  110  f-value:  -1989.946  pgrad:  0.0068394 
iter:  120  f-value:  -1989.946  pgrad:  0.002016805 
> 
> # how to compute hessian at the MLE
>   require(numDeriv)
Loading required package: numDeriv
>   hess <- hessian(x=ans.spg$par, poissmix.loglik, y=y)
>   se <- sqrt(-diag(solve(hess)))  # approximate standard errors