Maximizing a function F(U), where U is a semi-orthogonal matrix and the function is
invariant under an orthogonal transformation of U. An explicit expression of the gradient is
not required and the hessian is not used. It includes a global search option using simulated annealing.
a required R function that evaluates value and possibly gradient of the function to be maximized.
It returns a list of components in which the component value is required whereas gradient is optional.
When gradient is not provided, an approximation is used by default. The parameter of objfun is W
that is a list of components described next.
W
a list object of arguments to be passed to objfun. It contains all arguments required to compute the objective function and eventually
the gradient. It has a required component that is the dimension of the matrix U as dim=c(d, p) where d is the number of
columns and p is the number of rows d<p. An initial starting matrix may be provided as a pxp orthogonal matrix Qt.
The minimal expression of W is list(dim=c(d,p)).
sim_anneal
If TRUE the program searches for global maximum by simulating annealing using stochastic gradients. If FALSE a local maximum is
likely to be reached.
temp_init
a positive scalar that is the initial temperature for simulated annealing if sim_anneal is TRUE.
The minimum temperature is set to 0.1.
cooling_rate
a positive scalar greater than 1 that controls the cooling process. A new cooling temperature is obtained as the previous
divided by the cooling rate.
max_iter_sa
a positive integer specifying the maximum number of iterations to be performed at a fixed temperature before cooling.
eps_conv
a small positive scalar. The program terminates when the norm of the gradient gets smaller or equal to eps_conv.
max_iter
a positive integer specifying the maximum number of iterations to be performed before the program is terminated.
eps_grad
is a small positive scalar. If gradient is not explicitly provided through objfun, eps_grad is used to
estimate gradient based on a finite difference.
eps_f
small positive scalar giving the tolerance at which the difference between the current objective function value
and the preceding is considered small enough to terminate the program.
verbose
if TRUE, steps are printed. Otherwise, nothing is printed.
Details
The algorithm was adapted from Liu, Srivastava and Gallivan (2004)
who discussed the geometry of Grassmann manifolds.
See also Edelman, Arias and Smith (1998) for more expositions.
This is a non-linear optimization program. We describe a basic
gradient algorithm for Grassmann manifolds.
Let G_{p,d} be the set of all d-dimensional subspaces
of mathrm{R}^n. It is a compact, connected manifold of
dimension d(p-d). An element of this manifold is a subspace.
It can be represented by a basis or by a projection matrix.
Here, the computations are carried in terms of the bases.
Let U such that \texttt{Span}(U)in G_{p,d}.
We consider an objective function F(U) to be optimized.
Let D(U) be the gradient of the objective function F
at the point U. The algorithm starts with an initial value
U_i of U. For step size δ in mathrm{R}^1,
a single step of the gradient algorithm is
Q_{t+1} = exp(-δ A) Q_t
where Q_t=[U_t,V_t] and V_t is the orthogonal completion of
U_t so that Q_t is orthogonal. The matrix A is computed
using the directional derivatives of F. The new value of the
objective function is F(U_t).
The matrix A is skewed-symmetric and exp(-δ old{A})
is orthogonal. The algorithm works by rotating the starting orthonormal
basis Q_t to a new basis by left multiplication by an orthogonal matrix.
The iterations continues until a stopping criterion is met.
Ideally, convergence is met when the norm of the gradient is
sufficiently small. But stopping can be set at a fixed number of iterations.
An explicit expression of the gradient may not be provided; finite difference
approximations are used instead. However, deriving the gradient expression
may pay off in terms of the efficiency and reliability of the algorithm.
But a differentiable function F that maps G_{p,d}
to mathrm{R}^1 is necessary.
The choice of the initial starting value U_i of U is important.
We recommend not to use random start for the optimization to avoid a local maximum.
Liu et al. (2004) suggested a simulated annealing method to attain a global optimum.
Value
A list containing the following components
Qt
optimal orthogonal matrix such that Qt[,1:d] maximizes the objective function.
norm_grads
a vector of successive norms of the directional derivative throughout all iterations.
The last scalar is the norm of the gradient at the optimal Qt.
fvalues
a vector of successive values of the objective function throughout all iterations.
The last scalar is the value of the objective function at the optimal Qt.
converged
if TRUE, the final iterate was considered optimal by the specified termination criteria.
Warning
This program may search for a global maximizer using a simulated annealing stochastic gradient.
The choice of the initial temperature, cooling rate and also of the maximum allowable number of iterations within the simulated
annealing process affect the success of reaching that global maximum.
Note
This program uses the objective function objfun provided by the user.
An expression of the objective function needs to follow the format illustrated in the example.
Author(s)
Kofi Placid Adragni <kofi@umbc.edu> and Seongho Wu
References
Liu, X.; Srivastava, A,; Gallivan, K. (2004)
Optimal linear representations of images for object recognition. IEEE
Transactions on Pattern Analysis and Machine Intelligence. Vol 26, No. 5, pp 662-666
Edelman, A.; Arias, T. A.; Smith, S. T. (1998)
The Geometry of Algorithms with Orthogonality Constraints.
SIAM J. Matrix Anal. Appl. Vol. 20, No. 2, pp 303-353
See Also
nlm, nlminb, optim,
optimize, constrOptim for other optimization functions.
Examples
objfun <- function(W){value <- f(W); gradient <- Grad(W);
return(list(value=value, gradient=gradient))}
f <- function(W){d <- W$dim[1]; Y<-matrix(W$Qt[,1:d], ncol=d);
return(0.5*sum(diag(t(Y)%*%W$A%*%Y)))}
Grad <- function(W){
Qt <- W$Qt; d <- W$dim[1]; p <- nrow(Qt); grad <- matrix (0, p, p);
Y <- matrix(Qt[,1:d], ncol=d); Y0 <- matrix(Qt[,(d+1):p], ncol=(p-d));
return(t(Y) %*% W$A %*% Y0)}
p=5; d=2; set.seed(234);
a <- matrix(rnorm(p**2), ncol=p); A <- t(a)%*%a;
# Exact Solution
W <- list(Qt=eigen(A)$vectors[,1:p], dim=c(d,p), A=A);
ans <- GrassmannOptim(objfun, W, eps_conv=1e-5, verbose=TRUE);
ans$converged
# Random starting matrix
m<-matrix(rnorm(p**2), ncol=p); m<-t(m)%*%m;
W <- list(Qt=eigen(m)$vectors, dim=c(d,p), A=A);
ans <- GrassmannOptim(objfun, W, eps_conv=1e-5, verbose=TRUE);
plot(ans$fvalues)
# Simulated Annealing
W <- list(dim=c(d,p), A=A);
ans <- GrassmannOptim(objfun, W, sim_anneal=TRUE, max_iter_sa=35,
verbose=TRUE);
########
set.seed(13); p=8; nobs=200; d=3; sigma=1.5; sigma0=2;
rmvnorm <- function (n, mean = rep(0, nrow(sigma)), sigma = diag(length(mean)))
{ # This function generates random numbers from the multivariate normal -
# see library "mvtnorm"
ev <- eigen(sigma, symmetric = TRUE)
retval <- ev$vectors %*% diag(sqrt(ev$values), length(ev$values)) %*%
t(ev$vectors)
retval <- matrix(rnorm(n * ncol(sigma)), nrow = n) %*% retval;
retval <- sweep(retval, 2, mean, "+");
colnames(retval) <- names(mean);
retval
}
objfun <- function(W){return(list(value=f(W), gradient=Gradient(W)))}
f <- function(W){
Qt <- W$Qt; d <- W$dim[1]; p <- ncol(Qt); Sigmas <- W$sigmas;
U <- matrix(Qt[,1:d], ncol=d); V <- matrix(Qt[,(d+1):p], ncol=(p-d));
return(-log(det(t(V)%*%Sigmas$S%*%V))-log(det(t(U)%*%Sigmas$S_res%*%U)))}
Gradient <- function(W)
{Qt <- W$Qt; d <- W$dim[1]; p <- ncol(Qt); Sigmas <- W$sigmas;
U <- matrix(Qt[,1:d], ncol=d); V <- matrix(Qt[,(d+1):p], ncol=(p-d));
terme1 <- solve(t(U)%*%Sigmas$S_res%*%U)%*% t(U)%*%Sigmas$S_res%*%V;
terme2 <- t(U)%*%Sigmas$S%*%V%*%solve(t(V)%*%Sigmas$S%*%V);
return(2*(terme1 - terme2))}
y<-array(runif(n=nobs, min=-2, max=2), c(nobs, 1));
fy<-scale(cbind(y, y^2, y^3),TRUE,FALSE);
#Structured error PFC model;
Gamma<-diag(p)[,c(1:3)]; Gamma0<-diag(p)[,-c(1:3)];
Omega <-sigma^2*matrix(0.5, ncol=3, nrow=3); diag(Omega)<-sigma^2;
Delta<- Gamma%*%Omega%*%t(Gamma) + sigma0^2*Gamma0%*%t(Gamma0);
Err <- t(rmvnorm(n=nobs, mean = rep(0, p), sigma = Delta))
beta <- diag(3*c(1, 0.4, 0.4));
X <- t(Gamma%*%beta%*%t(fy) + Err);
Xc <- scale(X, TRUE, FALSE);
P_F <- fy%*%solve(t(fy)%*%fy)%*%t(fy);
S <- t(Xc)%*%Xc/nobs; S_fit <- t(Xc)%*%P_F%*%Xc/nobs; S_res <- S-S_fit;
sigmas <- list(S=S, S_fit=S_fit, S_res=S_res, p=p, nobs=nobs);
# Random starting matrix;
Qt <- svd(matrix(rnorm(p^2), ncol=p))$u;
W <- list(Qt=Qt, dim=c(d, p), sigmas=sigmas)
ans <- GrassmannOptim(objfun, W, eps_conv=1e-4);
ans$converged;
ans$fvalues;
ans$Qt[,1:3];
# Good starting matrix;
Qt <- svd(S_fit)$u;
W <- list(Qt=Qt, dim=c(d, p), sigmas=sigmas)
ans <- GrassmannOptim(objfun, W, eps_conv=1e-4, verbose=TRUE);
ans$converged;
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> library(GrassmannOptim)
Loading required package: Matrix
> png(filename="/home/ddbj/snapshot/RGM3/R_CC/result/GrassmannOptim/GrassmannOptim.Rd_%03d_medium.png", width=480, height=480)
> ### Name: GrassmannOptim
> ### Title: Grassmann Manifold Optimization
> ### Aliases: GrassmannOptim
> ### Keywords: optimize programming package
>
> ### ** Examples
>
>
> objfun <- function(W){value <- f(W); gradient <- Grad(W);
+ return(list(value=value, gradient=gradient))}
>
> f <- function(W){d <- W$dim[1]; Y<-matrix(W$Qt[,1:d], ncol=d);
+ return(0.5*sum(diag(t(Y)%*%W$A%*%Y)))}
>
> Grad <- function(W){
+ Qt <- W$Qt; d <- W$dim[1]; p <- nrow(Qt); grad <- matrix (0, p, p);
+ Y <- matrix(Qt[,1:d], ncol=d); Y0 <- matrix(Qt[,(d+1):p], ncol=(p-d));
+ return(t(Y) %*% W$A %*% Y0)}
>
> p=5; d=2; set.seed(234);
> a <- matrix(rnorm(p**2), ncol=p); A <- t(a)%*%a;
>
> # Exact Solution
> W <- list(Qt=eigen(A)$vectors[,1:p], dim=c(d,p), A=A);
> ans <- GrassmannOptim(objfun, W, eps_conv=1e-5, verbose=TRUE);
Initialization...
iter loglik gradient
1 1.1975e+01 6.7463e-29
> ans$converged
[1] TRUE
>
> # Random starting matrix
> m<-matrix(rnorm(p**2), ncol=p); m<-t(m)%*%m;
> W <- list(Qt=eigen(m)$vectors, dim=c(d,p), A=A);
> ans <- GrassmannOptim(objfun, W, eps_conv=1e-5, verbose=TRUE);
Initialization...
iter loglik gradient
1 4.6296e+00 4.4970e+01
2 9.1212e+00 3.1947e+01
3 1.1223e+01 1.0279e+01
4 1.1544e+01 5.9649e+00
5 1.1827e+01 2.0490e+00
6 1.1905e+01 9.7966e-01
7 1.1973e+01 3.3832e-02
8 1.1975e+01 1.2132e-03
9 1.1975e+01 4.4254e-04
10 1.1975e+01 2.5610e-04
11 1.1975e+01 3.6325e-06
> plot(ans$fvalues)
>
> # Simulated Annealing
> W <- list(dim=c(d,p), A=A);
> ans <- GrassmannOptim(objfun, W, sim_anneal=TRUE, max_iter_sa=35,
+ verbose=TRUE);
Initialization...
Simulated Annealing... This may take a while.
Initial temperature= 20
Cooling...
Current temperature:
10
5
2.5
1.25
0.625
0.3125
0.15625
0.078125
iter loglik gradient
1 1.1975e+01 3.4761e-07
>
> ########
>
> set.seed(13); p=8; nobs=200; d=3; sigma=1.5; sigma0=2;
>
> rmvnorm <- function (n, mean = rep(0, nrow(sigma)), sigma = diag(length(mean)))
+ { # This function generates random numbers from the multivariate normal -
+ # see library "mvtnorm"
+ ev <- eigen(sigma, symmetric = TRUE)
+ retval <- ev$vectors %*% diag(sqrt(ev$values), length(ev$values)) %*%
+ t(ev$vectors)
+ retval <- matrix(rnorm(n * ncol(sigma)), nrow = n) %*% retval;
+ retval <- sweep(retval, 2, mean, "+");
+ colnames(retval) <- names(mean);
+ retval
+ }
>
> objfun <- function(W){return(list(value=f(W), gradient=Gradient(W)))}
>
> f <- function(W){
+ Qt <- W$Qt; d <- W$dim[1]; p <- ncol(Qt); Sigmas <- W$sigmas;
+ U <- matrix(Qt[,1:d], ncol=d); V <- matrix(Qt[,(d+1):p], ncol=(p-d));
+ return(-log(det(t(V)%*%Sigmas$S%*%V))-log(det(t(U)%*%Sigmas$S_res%*%U)))}
>
> Gradient <- function(W)
+ {Qt <- W$Qt; d <- W$dim[1]; p <- ncol(Qt); Sigmas <- W$sigmas;
+ U <- matrix(Qt[,1:d], ncol=d); V <- matrix(Qt[,(d+1):p], ncol=(p-d));
+ terme1 <- solve(t(U)%*%Sigmas$S_res%*%U)%*% t(U)%*%Sigmas$S_res%*%V;
+ terme2 <- t(U)%*%Sigmas$S%*%V%*%solve(t(V)%*%Sigmas$S%*%V);
+ return(2*(terme1 - terme2))}
>
> y<-array(runif(n=nobs, min=-2, max=2), c(nobs, 1));
> fy<-scale(cbind(y, y^2, y^3),TRUE,FALSE);
>
> #Structured error PFC model;
> Gamma<-diag(p)[,c(1:3)]; Gamma0<-diag(p)[,-c(1:3)];
> Omega <-sigma^2*matrix(0.5, ncol=3, nrow=3); diag(Omega)<-sigma^2;
> Delta<- Gamma%*%Omega%*%t(Gamma) + sigma0^2*Gamma0%*%t(Gamma0);
>
> Err <- t(rmvnorm(n=nobs, mean = rep(0, p), sigma = Delta))
> beta <- diag(3*c(1, 0.4, 0.4));
> X <- t(Gamma%*%beta%*%t(fy) + Err);
>
> Xc <- scale(X, TRUE, FALSE);
> P_F <- fy%*%solve(t(fy)%*%fy)%*%t(fy);
> S <- t(Xc)%*%Xc/nobs; S_fit <- t(Xc)%*%P_F%*%Xc/nobs; S_res <- S-S_fit;
> sigmas <- list(S=S, S_fit=S_fit, S_res=S_res, p=p, nobs=nobs);
>
> # Random starting matrix;
> Qt <- svd(matrix(rnorm(p^2), ncol=p))$u;
> W <- list(Qt=Qt, dim=c(d, p), sigmas=sigmas)
>
> ans <- GrassmannOptim(objfun, W, eps_conv=1e-4);
> ans$converged;
[1] TRUE
> ans$fvalues;
[1] -11.72027 -10.17907 -9.55686 -9.27004 -9.16935 -9.12446 -9.08039
[8] -9.04182 -9.03729 -9.03619 -9.03284 -9.03117 -9.03044 -9.02934
[15] -9.02858 -9.02780 -9.02744 -9.02674 -9.02629 -9.02551 -9.02257
[22] -9.02214 -9.02149 -9.02107 -9.02016 -9.01899 -9.01881 -9.01786
[29] -9.01728 -9.01708 -9.01699 -9.01561 -9.01517 -9.01481 -9.01303
[36] -9.01260 -9.01217 -9.01182 -9.01143 -9.01107 -9.01070 -9.01029
[43] -9.01023 -9.00984 -9.00970 -9.00894 -9.00821 -9.00788 -9.00740
[50] -9.00708 -9.00644 -9.00487 -9.00436 -9.00360 -9.00304 -9.00225
[57] -9.00159 -9.00087 -9.00017 -8.99949 -8.99850 -8.99778 -8.99656
[64] -8.99556 -8.99440 -8.99327 -8.99232 -8.99148 -8.99032 -8.98977
[71] -8.98840 -8.98748 -8.98547 -8.98397 -8.97989 -8.97628 -8.97373
[78] -8.97078 -8.96724 -8.96372 -8.93364 -8.91611 -8.90180 -8.89029
[85] -8.72498 -8.68513 -8.62253 -8.56864 -8.56576 -8.47089 -8.45571
[92] -8.45272 -8.45225 -8.45181 -8.45164 -8.45163 -8.45155 -8.45151
[99] -8.45149
> ans$Qt[,1:3];
[,1] [,2] [,3]
[1,] 0.4587 -0.8844 0.0035
[2,] 0.8338 0.4364 -0.3152
[3,] -0.2765 -0.1492 -0.9472
[4,] -0.0110 -0.0339 -0.0126
[5,] 0.0779 0.0055 -0.0462
[6,] -0.0418 0.0423 -0.0304
[7,] 0.0419 0.0467 0.0117
[8,] -0.0902 0.0054 -0.0115
>
> # Good starting matrix;
> Qt <- svd(S_fit)$u;
> W <- list(Qt=Qt, dim=c(d, p), sigmas=sigmas)
> ans <- GrassmannOptim(objfun, W, eps_conv=1e-4, verbose=TRUE);
Initialization...
iter loglik gradient
1 -8.5060e+00 4.1594e-01
2 -8.4788e+00 1.7224e-01
3 -8.4646e+00 8.3864e-02
4 -8.4588e+00 4.7164e-02
5 -8.4558e+00 3.0988e-02
6 -8.4529e+00 1.0618e-02
7 -8.4517e+00 4.1179e-03
8 -8.4516e+00 6.9205e-04
9 -8.4515e+00 2.7165e-04
10 -8.4515e+00 1.2218e-04
11 -8.4515e+00 6.5718e-05
> ans$converged;
[1] TRUE
>
>
>
>
>
> dev.off()
null device
1
>