A real vector argument to fn, indicating the initial guess for the optimization of the
nonlinear objective function fn.
fn
Nonlinear objective function that is to be optimized.
A scalar function that takes a real vector as argument and
returns a scalar that is the value of the function at that point
(see *Details*).
gr
The gradient of the objective function fn evaluated at the
argument. This is a vector-function that takes a real
vector as argument and returns a real vector of the same length.
It defaults to NULL, which means that the gradient is evaluated numerically.
Computations are dramatically faster in high-dimensional problems when
the exact gradient is provided. See *Examples*.
method
An integer (1, 2, or 3) specifying which Barzilai-Borwein
steplength to use. The default is 3. See *Details*.
upper
An upper bound for box constraints.
lower
A lower bound for box constraints.
project
A projection
function, or a character string giving its name. The projection
function takes a point in R^n and
projects it onto the feasible region defined by the constraints of the problem:
it takes a real vector as argument and
returns a real vector of the same length. See *Details*.
If a projection function is not supplied, arguments lower and
upper will cause the use of an internally defined function that
enforces the implied box constraints.
projectArgs
A list with arguments to the project
function. See *Details*.
control
A list of control parameters. See *Details*.
quiet
A logical variable (TRUE/FALSE). If TRUE, warnings
and some additional informational printing are suppressed.
The default is quiet = FALSE.
Note that the control parameter trace and the argument quiet
affect different printing, so even with quiet = TRUE there will be
considerable printed output unless trace is also set to FALSE.
alertConvergence
A logical variable. With the default TRUE,
a warning is issued if convergence is not obtained. When set to FALSE,
the warning is suppressed.
...
Additional arguments passed to fn and gr.
(Both must accept any specified arguments, either
explicitly or by having a ... argument, but
they do not need to use them all.)
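As a minimal illustrative sketch of these conventions (the quadratic objective and the names f.shift and g.shift below are hypothetical, not part of the package): fn returns a scalar, gr returns a vector of the same length as its argument, and the extra argument shift is passed through ... to both functions.
library(BB)
# Hypothetical objective and its exact gradient; 'shift' arrives via '...'
f.shift <- function(x, shift) sum((x - shift)^2)   # scalar value
g.shift <- function(x, shift) 2 * (x - shift)      # vector, same length as x
ans <- spg(par = rnorm(10), fn = f.shift, gr = g.shift, shift = 5,
           control = list(trace = FALSE))
ans$par   # all components close to 5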
Details
R adaptation, with significant modifications, by Ravi Varadhan, Johns
Hopkins University (March 25, 2008), from the original FORTRAN code of
Birgin, Martinez, and Raydan (2001). The original is available at the
TANGO project http://www.ime.usp.br/~egbirgin/tango/downloads.php
A major modification in our R adaptation of the original FORTRAN code is the
availability of 3 different options for Barzilai-Borwein (BB)
steplengths: method = 1 is the BB
steplength used in Birgin, Martinez and Raydan (2000); method = 2 is
the other steplength proposed in Barzilai and Borwein's (1988) original paper.
Finally, method = 3 is a new steplength, which was first proposed in
Varadhan and Roland (2008) for accelerating the EM algorithm.
In fact, Varadhan and Roland (2008) considered 3 similar steplength schemes in
their EM acceleration work. Here, we have chosen method = 3
as the "default" method. This method may be slightly slower than the
other 2 BB steplength schemes, but it generally exhibited more reliable
convergence to a better optimum in our experiments.
We recommend that users try the other steplength schemes if the default
method does not perform well on their problem.
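For instance, an illustrative way to compare the three steplength schemes from the same starting value, reusing the sc2.f test function from *Examples*:
# Hedged sketch: compare the three BB steplength schemes on one problem
sc2.f <- function(x) sum((1:length(x)) * (exp(x) - x)) / 10
p0 <- rnorm(50)
for (m in 1:3) {
  ans <- spg(par = p0, fn = sc2.f, method = m, quiet = TRUE,
             control = list(trace = FALSE))
  cat("method", m, " f-value:", ans$value, " feval:", ans$feval, "\n")
}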
Box constraints can be imposed by vectors lower and upper.
Scalar values for lower and upper are expanded to apply to
all parameters. The default lower is -Inf and upper
is +Inf, which imply no constraints.
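For example (an illustrative sketch, with an arbitrary quadratic objective), a scalar bound applies to every parameter:
# The unconstrained minimum of sum((x - 2)^2) is at 2, so the box
# constraint is active; scalar 'lower' and 'upper' are expanded to all
# parameters and every component ends up at the upper bound.
ans <- spg(par = rep(0.5, 5), fn = function(x) sum((x - 2)^2),
           lower = 0, upper = 1, control = list(trace = FALSE))
ans$par   # approximately rep(1, 5)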
The project argument provides a way to implement more general constraints
on the parameters in spg. projectArgs is passed
to the project function if one is specified. The first argument of any
project function should be par; any other arguments should be supplied
through the projectArgs list.
To avoid confusion, it is suggested that user-defined project
functions not use arguments named lower and upper.
The function projectLinear incorporates linear equalities and
inequalities. This function also provides an example of how other projections
might be implemented.
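As an illustrative sketch (the simplex constraint below is a made-up example, assuming projectLinear's interface of a constraint system A %*% x >= b whose first meq rows are treated as equalities, supplied through projectArgs):
# Project onto the probability simplex { x : sum(x) = 1, x >= 0 }
n <- 4
A <- rbind(rep(1, n), diag(n))   # row 1: sum(x); rows 2..(n+1): each x_i
b <- c(1, rep(0, n))
ans <- spg(par = runif(n), fn = function(x) sum(x^2),
           project = "projectLinear",
           projectArgs = list(A = A, b = b, meq = 1),
           control = list(trace = FALSE))
ans$par   # approximately rep(1/n, n)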
Argument control is a list specifying any changes to default values of
algorithm control parameters. Note that the names of these must be
given in full; partial matching will not work.
The list items are as follows:
M
A positive integer, typically between 5 and 20, that controls the monotonicity of the algorithm. M = 1 enforces strict monotonicity
in the reduction of the objective function fn, whereas larger values allow more non-monotonicity. Global convergence under non-monotonicity is ensured by
enforcing the Grippo-Lampariello-Lucidi condition (Grippo et al. 1986) in a non-monotone line-search algorithm. Values of M
between 5 and 20 are generally good. The default is M = 10.
maxit
The maximum number of iterations. The default is maxit = 1500.
ftol
Convergence tolerance on the absolute change in objective function between successive iterations.
Convergence is declared when the change is less than ftol. Default is ftol = 1.e-10.
gtol
Convergence tolerance on the infinity-norm of projected gradient gr evaluated at the current parameter.
Convergence is declared when the infinity-norm of projected gradient is less
than gtol. Default is gtol = 1.e-05.
maxfeval
Maximum limit on the number of function evaluations. Default is maxfeval = 10000.
maximize
A logical variable indicating whether the objective function is to be maximized. Default is maximize = FALSE indicating
minimization. For maximization (e.g. log-likelihood maximization in statistical modeling), this may be set to TRUE.
trace
A logical variable (TRUE/FALSE). If TRUE, information on
the progress of optimization is printed. Default is trace = TRUE.
triter
An integer that controls the frequency of tracing
when trace=TRUE. Default is triter=10, which means that
the objective fn and the infinity-norm of its projected gradient are
printed at every 10th iteration.
eps
A small positive increment used in the finite-difference
approximation of the gradient. Default is 1.e-07.
checkGrad
NULL or a logical variable (TRUE/FALSE)
indicating whether to
check the provided analytical gradient against a numerical approximation.
With the default NULL, the gradient is checked if the check is estimated
to take less than about ten seconds; a warning is issued if it is estimated
to take longer. The default can be overridden by specifying TRUE or FALSE.
It is recommended that this be set to FALSE for high-dimensional problems,
after verifying that the gradient is correctly specified, possibly by running
once with TRUE specified.
checkGrad.tol
A small positive value used to compare the maximum relative
difference between a user-supplied gradient gr and the numerical approximation
calculated by grad from package numDeriv. The default is 1.e-06.
If this value is exceeded, an error message is issued, as it is a
reasonable indication of a problem with the user-supplied gr. The user can
either fix the gr function, remove it so the finite-difference
approximation is used instead, or increase the tolerance so that the check passes.
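For example (an illustrative sketch with arbitrary values and an arbitrary objective), selected defaults can be overridden using fully spelled-out names:
# Hedged sketch: overriding selected control defaults
ans <- spg(par = rnorm(20), fn = function(x) sum(x^4 + x^2),
           control = list(maxit = 5000, M = 20, ftol = 1e-12,
                          trace = TRUE, triter = 50))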
Value
A list with the following components:
par
Parameters that optimize the nonlinear objective function,
if convergence is successful.
value
The value of the objective function at termination.
gradient
L-infinity norm of the projected gradient of the objective function at termination. If convergence is successful, this should be less than gtol.
fn.reduction
Reduction in the value of the function from its initial value. This is negative in maximization.
iter
Number of iterations taken by the algorithm. The gradient is evaluated once each iteration, so the number of gradient evaluations will also be equal to iter, plus any evaluations necessary for checkGrad.
feval
Number of times the objective fn was evaluated.
convergence
An integer code indicating the type of convergence. 0
indicates successful convergence, in which case the projected gradient is smaller
than gtol or the change in the objective function is smaller than ftol. Error codes are: 1 indicates that the maximum limit
on iterations, maxit, has been reached; 2 indicates that the
maximum limit on function evaluations has been exceeded; 3 indicates
failure due to an error in function evaluation; 4 indicates failure due
to an error in gradient evaluation; 5 indicates failure due to an error in
projection.
message
A text message explaining which termination criterion was used.
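A brief illustrative sketch of inspecting these components after a run (the objective here is a made-up quadratic):
ans <- spg(par = rnorm(10), fn = function(x) sum((x - 3)^2),
           control = list(trace = FALSE))
if (ans$convergence == 0) {
  print(ans$par)       # optimizing parameters, near rep(3, 10)
  print(ans$value)     # objective at termination, near 0
  print(ans$gradient)  # infinity-norm of projected gradient, below gtol
} else {
  warning(ans$message)
}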
References
Barzilai J, and Borwein JM (1988): Two-point step size gradient methods, IMA J Numerical Analysis, 8, 141-148.
Birgin EG, Martinez JM, and Raydan M (2000): Nonmonotone spectral projected gradient methods on convex sets, SIAM J Optimization, 10, 1196-1211.
Birgin EG, Martinez JM, and Raydan M (2001): SPG: software for convex-constrained optimization, ACM Transactions on Mathematical Software.
L Grippo, F Lampariello, and S Lucidi (1986), A nonmonotone line search technique for Newton's method, SIAM J on Numerical Analysis, 23, 707-716.
M Raydan (1997), Barzilai-Borwein gradient method for large-scale unconstrained minimization problem, SIAM J of Optimization, 7, 26-33.
R Varadhan and C Roland (2008), Simple and globally-convergent methods for accelerating the convergence of any EM algorithm, Scandinavian J Statistics, doi: 10.1111/j.1467-9469.2007.00585.x.
R Varadhan and PD Gilbert (2009), BB: An R Package for Solving a Large System of Nonlinear Equations and for Optimizing a High-Dimensional Nonlinear Objective Function, J. Statistical Software, 32:4, http://www.jstatsoft.org/v32/i04/
See Also
projectLinear,
BBoptim,
optim,
nlm,
sane,
dfsane,
grad
Examples
sc2.f <- function(x){ sum((1:length(x)) * (exp(x) - x)) / 10}
sc2.g <- function(x){ (1:length(x)) * (exp(x) - 1) / 10}
p0 <- rnorm(50)
ans.spg1 <- spg(par=p0, fn=sc2.f) # Default is method=3
ans.spg2 <- spg(par=p0, fn=sc2.f, method=1)
ans.spg3 <- spg(par=p0, fn=sc2.f, method=2)
ans.cg <- optim(par=p0, fn=sc2.f, method="CG") #Uses conjugate-gradient method in "optim"
ans.lbfgs <- optim(par=p0, fn=sc2.f, method="L-BFGS-B") #Uses low-memory BFGS method in "optim"
# Now we use exact gradient.
# Computation is much faster compared to when using numerical gradient.
ans.spg1 <- spg(par=p0, fn=sc2.f, gr=sc2.g)
############
# Another example illustrating use of additional parameters to objective function
valley.f <- function(x, cons) {
n <- length(x)
f <- rep(NA, n)
j <- 3 * (1:(n/3))
jm2 <- j - 2
jm1 <- j - 1
f[jm2] <- (cons[2] * x[jm2]^3 + cons[1] * x[jm2]) * exp(-(x[jm2]^2)/100) - 1
f[jm1] <- 10 * (sin(x[jm2]) - x[jm1])
f[j] <- 10 * (cos(x[jm2]) - x[j])
sum(f*f)
}
k <- c(1.003344481605351, -3.344481605351171e-03)
p0 <- rnorm(30) # number of parameters should be a multiple of 3 for this example
ans.spg2 <- spg(par=p0, fn=valley.f, cons=k, method=2)
ans.cg <- optim(par=p0, fn=valley.f, cons=k, method="CG")
ans.lbfgs <- optim(par=p0, fn=valley.f, cons=k, method="L-BFGS-B")
####################################################################
# Here is a statistical example illustrating log-likelihood maximization.
poissmix.loglik <- function(p,y) {
# Log-likelihood for a binary Poisson mixture
i <- 0:(length(y)-1)
loglik <- y*log(p[1]*exp(-p[2])*p[2]^i/exp(lgamma(i+1)) +
(1 - p[1])*exp(-p[3])*p[3]^i/exp(lgamma(i+1)))
return (sum(loglik) )
}
# Data from Hasselblad (JASA 1969)
poissmix.dat <- data.frame(death=0:9, freq=c(162,267,271,185,111,61,27,8,3,1))
y <- poissmix.dat$freq
# Lower and upper bounds on parameters
lo <- c(0.001,0,0)
hi <- c(0.999, Inf, Inf)
p0 <- runif(3,c(0.2,1,1),c(0.8,5,8)) # randomly generated starting values
ans.spg <- spg(par=p0, fn=poissmix.loglik, y=y, lower=lo, upper=hi,
control=list(maximize=TRUE))
# how to compute hessian at the MLE
require(numDeriv)
hess <- hessian(x=ans.spg$par, poissmix.loglik, y=y)
se <- sqrt(-diag(solve(hess))) # approximate standard errors
Results
> library(BB)
> sc2.f <- function(x){ sum((1:length(x)) * (exp(x) - x)) / 10}
>
> sc2.g <- function(x){ (1:length(x)) * (exp(x) - 1) / 10}
>
> p0 <- rnorm(50)
> ans.spg1 <- spg(par=p0, fn=sc2.f) # Default is method=3
iter: 0 f-value: 197.9292 pgrad: 30.8266
iter: 10 f-value: 127.6343 pgrad: 0.1627386
iter: 20 f-value: 127.5014 pgrad: 0.01786518
iter: 30 f-value: 127.5001 pgrad: 0.00403702
iter: 40 f-value: 127.5 pgrad: 0.0009215739
iter: 50 f-value: 127.5 pgrad: 0.0008783729
iter: 60 f-value: 127.5 pgrad: 3.922196e-05
> ans.spg2 <- spg(par=p0, fn=sc2.f, method=1)
iter: 0 f-value: 197.9292 pgrad: 30.8266
iter: 10 f-value: 127.5745 pgrad: 0.1291622
iter: 20 f-value: 127.5011 pgrad: 0.04761233
iter: 30 f-value: 127.5004 pgrad: 0.03254783
iter: 40 f-value: 127.5 pgrad: 0.000794671
iter: 50 f-value: 127.5 pgrad: 0.002499689
> ans.spg3 <- spg(par=p0, fn=sc2.f, method=2)
iter: 0 f-value: 197.9292 pgrad: 30.8266
iter: 10 f-value: 127.6612 pgrad: 0.1728854
iter: 20 f-value: 127.5045 pgrad: 0.1653243
iter: 30 f-value: 127.5002 pgrad: 0.007004957
iter: 40 f-value: 127.5 pgrad: 0.01086661
iter: 50 f-value: 127.5 pgrad: 0.000352145
iter: 60 f-value: 127.5 pgrad: 0.0003304024
iter: 70 f-value: 127.5 pgrad: 4.220624e-05
> ans.cg <- optim(par=p0, fn=sc2.f, method="CG") #Uses conjugate-gradient method in "optim"
> ans.lbfgs <- optim(par=p0, fn=sc2.f, method="L-BFGS-B") #Uses low-memory BFGS method in "optim"
>
> # Now we use exact gradient.
> # Computation is much faster compared to when using numerical gradient.
> ans.spg1 <- spg(par=p0, fn=sc2.f, gr=sc2.g)
iter: 0 f-value: 197.9292 pgrad: 30.8266
iter: 10 f-value: 127.6343 pgrad: 0.1627385
iter: 20 f-value: 127.5014 pgrad: 0.01786465
iter: 30 f-value: 127.5001 pgrad: 0.004031603
iter: 40 f-value: 127.5 pgrad: 0.0009264037
iter: 50 f-value: 127.5 pgrad: 0.0008680818
iter: 60 f-value: 127.5 pgrad: 3.642787e-05
>
> ############
> # Another example illustrating use of additional parameters to objective function
> valley.f <- function(x, cons) {
+ n <- length(x)
+ f <- rep(NA, n)
+ j <- 3 * (1:(n/3))
+ jm2 <- j - 2
+ jm1 <- j - 1
+ f[jm2] <- (cons[2] * x[jm2]^3 + cons[1] * x[jm2]) * exp(-(x[jm2]^2)/100) - 1
+ f[jm1] <- 10 * (sin(x[jm2]) - x[jm1])
+ f[j] <- 10 * (cos(x[jm2]) - x[j])
+ sum(f*f)
+ }
>
> k <- c(1.003344481605351, -3.344481605351171e-03)
> p0 <- rnorm(30) # number of parameters should be a multiple of 3 for this example
> ans.spg2 <- spg(par=p0, fn=valley.f, cons=k, method=2)
iter: 0 f-value: 2699.036 pgrad: 414.4792
iter: 10 f-value: 14.6236 pgrad: 2.238182
iter: 20 f-value: 11.47981 pgrad: 1.606474
iter: 30 f-value: 5.651816 pgrad: 3.957706
iter: 40 f-value: 4.567452 pgrad: 1.042303
iter: 50 f-value: 2.246198 pgrad: 1.423417
iter: 60 f-value: 0.9855292 pgrad: 0.4715191
iter: 70 f-value: 0.4940861 pgrad: 0.9706623
iter: 80 f-value: 0.138232 pgrad: 0.4169235
iter: 90 f-value: 0.0286886 pgrad: 0.07679862
iter: 100 f-value: 0.01005746 pgrad: 0.9391996
iter: 110 f-value: 1.072785e-05 pgrad: 0.001472615
> ans.cg <- optim(par=p0, fn=valley.f, cons=k, method="CG")
> ans.lbfgs <- optim(par=p0, fn=valley.f, cons=k, method="L-BFGS-B")
>
> ####################################################################
> # Here is a statistical example illustrating log-likelihood maximization.
>
> poissmix.loglik <- function(p,y) {
+ # Log-likelihood for a binary Poisson mixture
+ i <- 0:(length(y)-1)
+ loglik <- y*log(p[1]*exp(-p[2])*p[2]^i/exp(lgamma(i+1)) +
+ (1 - p[1])*exp(-p[3])*p[3]^i/exp(lgamma(i+1)))
+ return (sum(loglik) )
+ }
>
> # Data from Hasselblad (JASA 1969)
> poissmix.dat <- data.frame(death=0:9, freq=c(162,267,271,185,111,61,27,8,3,1))
> y <- poissmix.dat$freq
>
> # Lower and upper bounds on parameters
> lo <- c(0.001,0,0)
> hi <- c(0.999, Inf, Inf)
>
> p0 <- runif(3,c(0.2,1,1),c(0.8,5,8)) # randomly generated starting values
>
> ans.spg <- spg(par=p0, fn=poissmix.loglik, y=y, lower=lo, upper=hi,
+ control=list(maximize=TRUE))
iter: 0 f-value: -3122.275 pgrad: 7.68762
iter: 10 f-value: -1992.479 pgrad: 3.522377
iter: 20 f-value: -1993.036 pgrad: 3.378979
iter: 30 f-value: -1991.025 pgrad: 10.2916
iter: 40 f-value: -1990.284 pgrad: 3.520242
iter: 50 f-value: -1990.027 pgrad: 0.7321978
iter: 60 f-value: -1989.958 pgrad: 0.2921524
iter: 70 f-value: -1989.95 pgrad: 0.7366793
iter: 80 f-value: -1989.946 pgrad: 0.2869433
iter: 90 f-value: -1989.946 pgrad: 0.05401944
iter: 100 f-value: -1989.946 pgrad: 0.05123638
iter: 110 f-value: -1989.946 pgrad: 0.0068394
iter: 120 f-value: -1989.946 pgrad: 0.002016805
>
> # how to compute hessian at the MLE
> require(numDeriv)
Loading required package: numDeriv
> hess <- hessian(x=ans.spg$par, poissmix.loglik, y=y)
> se <- sqrt(-diag(solve(hess))) # approximate standard errors