```
Quasi-Newton Methods
Optimization in Engineering Design
Georgia Institute of Technology
Systems Realization Laboratory
Background
Newton's method for finding an extreme point is

x_{k+1} = x_k - H^{-1}(x_k) ∇y(x_k)
• Assumption: the evaluation of the Hessian is impractical or costly.
• The central idea underlying quasi-Newton methods is to use an approximation of the inverse Hessian; the form of the approximation differs among methods.
• The quasi-Newton methods that build up an approximation of the inverse Hessian are often regarded as the most sophisticated methods for solving unconstrained problems.

Question: What is the simplest approximation?
Modified Newton Method
The Modified Newton method for finding an extreme point is
x_{k+1} = x_k - α S_k ∇y(x_k)
Note that:
• if S_k = I, then we have the method of steepest descent
• if S_k = H^{-1}(x_k) and α = 1, then we have the "pure" Newton method
• if y(x) = 0.5 x^T Q x - b^T x, then S_k = H^{-1}(x_k) = Q^{-1} (quadratic case)
Classical Modified Newton’s Method:
x_{k+1} = x_k - α H^{-1}(x_0) ∇y(x_k)
Note that the Hessian is only evaluated at the initial point x_0.
Question: What is a measure of effectiveness for the Classical
Modified Newton Method?
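A minimal numerical sketch of the classical scheme helps make the iteration concrete. The objective function, starting point, and iteration count below are my own illustrative assumptions, not from the slides:

```python
import numpy as np

# Classical Modified Newton: invert the Hessian once, at x0, and reuse it.
# Illustrative objective: y(x) = x1^4 + x1*x2 + (1 + x2)^2.
def grad(x):
    return np.array([4.0 * x[0]**3 + x[1],
                     x[0] + 2.0 * (1.0 + x[1])])

def hess(x):
    return np.array([[12.0 * x[0]**2, 1.0],
                     [1.0, 2.0]])

def classical_modified_newton(x0, alpha=1.0, iters=50):
    Hinv0 = np.linalg.inv(hess(x0))          # Hessian evaluated only at x0
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = x - alpha * Hinv0 @ grad(x)      # x_{k+1} = x_k - a H^{-1}(x0) grad y(x_k)
    return x

x_star = classical_modified_newton(np.array([0.75, -1.25]))
```

Near this starting point the Hessian changes slowly, so the fixed H^{-1}(x_0) still gives fast linear convergence; how quickly the Hessian varies is exactly the effectiveness question posed above.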
Modified Newton Method
• The effectiveness of this method depends on how rapidly the Hessian matrix changes
• That is, it depends on the magnitude of the third-order derivatives of the objective function
Quasi-Newton Methods
• The fundamental idea of quasi-Newton methods is to build an approximation of the inverse Hessian using information gathered during the descent process
• Ideally, the approximations converge to the inverse of the Hessian at the solution point, so that the method behaves similarly to Newton's method
• We will see how an approximation of the inverse Hessian can be constructed from gradient information at several points
Quasi-Newton Methods
In quasi-Newton methods, instead of the true Hessian, an initial matrix H_0 is chosen (usually H_0 = I), which is subsequently updated by an update formula:

H_{k+1} = H_k + H^u_k

where H^u_k is the update matrix.

This updating can also be done with the inverse of the Hessian, H^{-1}, as follows:
Let B = H^{-1}; then the updating formula for the inverse is also of the form

B_{k+1} = B_k + B^u_k
Big question: What is the update matrix?
Given two points x_k and x_{k+1}, we define g_k = ∇y(x_k) and g_{k+1} = ∇y(x_{k+1}). Further, let p_k = x_{k+1} - x_k; then

g_{k+1} - g_k ≈ H(x_k) p_k

If the Hessian is constant, then

g_{k+1} - g_k = H p_k, which can be rewritten as q_k = H p_k

If the Hessian is constant, then the following condition would hold as well:

H^{-1}_{k+1} q_i = p_i    for 0 ≤ i ≤ k

This is called the quasi-Newton condition.
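The condition is easy to check numerically on a quadratic, where the Hessian really is constant. The matrix Q, vector b, and the two sample points below are illustrative assumptions:

```python
import numpy as np

# For the quadratic y(x) = 0.5 x^T Q x - b^T x, the gradient is Q x - b and
# the Hessian is the constant matrix Q, so q_k = g_{k+1} - g_k = Q p_k exactly.
Q = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])

def grad(x):
    return Q @ x - b

xk  = np.array([0.0, 0.0])
xk1 = np.array([0.5, -0.2])
pk = xk1 - xk
qk = grad(xk1) - grad(xk)

# The quasi-Newton condition asks the approximation B of H^{-1} to map q_k
# back to p_k; the exact inverse does so trivially:
Binv = np.linalg.inv(Q)
```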
Rank One and Rank Two Updates
Let B = H^{-1}; then the quasi-Newton condition becomes B_{k+1} q_i = p_i for 0 ≤ i ≤ k.
Substituting the updating formula B_{k+1} = B_k + B^u_k, the condition becomes

p_i = B_k q_i + B^u_k q_i    (1)

(remember: p_i = x_{i+1} - x_i and q_i = g_{i+1} - g_i)

Note: there is no unique solution for finding the update matrix B^u_k.
A general form is B^u_k = a uu^T + b vv^T
where a and b are scalars and u and v are vectors satisfying condition (1).
The quantities a uu^T and b vv^T are symmetric matrices of (at most) rank one.
Quasi-Newton methods that take b = 0 are using rank one updates.
Quasi-Newton methods that take b ≠ 0 are using rank two updates.
Note that b ≠ 0 provides more flexibility.
Rank One and Rank Two Updates
Starting from the rank one update

B_{k+1} = B_k + a_k u_k u_k^T    (1)

and from

p_k = B_{k+1} q_k = B_k q_k + B^u_k q_k    (2)

multiplying by q_k^T gives

q_k^T p_k - q_k^T B_k q_k = a_k (u_k^T q_k)^2    (3)

Using (2) and (3), (1) can be written as

B_{k+1} = B_k + \frac{(p_k - B_k q_k)(p_k - B_k q_k)^T}{q_k^T (p_k - B_k q_k)}
Rank One and Rank Two Updates
To incorporate the approximation of the inverse Hessian into a descent procedure, the following steps are used:

1. Compute the descent direction, d_k = -B_k g_k
2. Minimize f(x_k + α d_k) → line search, to determine x_{k+1}
3. Compute g_{k+1}, q_k = g_{k+1} - g_k and p_k = x_{k+1} - x_k
4. Compute the new approximation of the inverse Hessian according to

B_{k+1} = B_k + \frac{(p_k - B_k q_k)(p_k - B_k q_k)^T}{q_k^T (p_k - B_k q_k)}
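The rank one update of step 4 translates directly to code. A minimal sketch (the function name and the quadratic test data are my own assumptions; the line search is omitted):

```python
import numpy as np

# Symmetric rank one update of the inverse-Hessian approximation:
#   B_{k+1} = B_k + (p - Bq)(p - Bq)^T / (q^T (p - Bq))
def rank_one_update(B, p, q):
    r = p - B @ q
    denom = q @ r
    if abs(denom) < 1e-12:       # guard against a near-zero denominator
        return B                 # skip the update instead of dividing by ~0
    return B + np.outer(r, r) / denom

# On a quadratic with constant Hessian Q, feeding n pairs (p, q = Q p) with
# linearly independent p's drives B to Q^{-1} (illustrative data):
Q = np.array([[5.0, 1.0], [1.0, 3.0]])
B = np.eye(2)
for p in (np.array([1.0, 0.0]), np.array([0.0, 1.0])):
    B = rank_one_update(B, p, Q @ p)
```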
Rank One and Rank Two Updates
Limitations

B_{k+1} = B_k + \frac{(p_k - B_k q_k)(p_k - B_k q_k)^T}{q_k^T (p_k - B_k q_k)}

• This update formula for the inverse Hessian preserves positive definiteness only if q_k^T (p_k - B_k q_k) > 0, which cannot be guaranteed
• Even when q_k^T (p_k - B_k q_k) is positive, it may be very small, which leads to numerical difficulties
Update Formulas
Rank one updates are simple, but have limitations.
Rank two updates are the most widely used schemes.
The following two update formulas have received wide acceptance:
• Davidon-Fletcher-Powell (DFP) formula
• Broyden-Fletcher-Goldfarb-Shanno (BFGS) formula.
Davidon-Fletcher-Powell Formula
• The earliest (and one of the most clever) schemes for constructing the inverse Hessian was originally proposed by Davidon (1959) and later developed by Fletcher and Powell (1963).
• It has the interesting property that, for a quadratic objective, it simultaneously generates the directions of the conjugate gradient method while constructing the inverse Hessian.
• The method is also referred to as the variable metric method (originally suggested by Davidon).
The quasi-Newton condition with the rank two update substituted is

p_i = B_k q_i + a uu^T q_i + b vv^T q_i

Set u = p_k, v = B_k q_k and let a u^T q_k = 1, b v^T q_k = -1 to determine a and b.
The resulting Davidon-Fletcher-Powell update formula is

B_{k+1} = B_k + \frac{p_k p_k^T}{p_k^T q_k} - \frac{B_k q_k q_k^T B_k}{q_k^T B_k q_k}
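The DFP formula above is short enough to transcribe term by term. The function name and toy data below are my assumptions; the check exercises the fact that the update satisfies the quasi-Newton condition B_{k+1} q_k = p_k by construction and stays symmetric:

```python
import numpy as np

# DFP update of the inverse-Hessian approximation:
#   B_{k+1} = B_k + p p^T / (p^T q) - (B q)(B q)^T / (q^T B q)
def dfp_update(B, p, q):
    Bq = B @ q
    return B + np.outer(p, p) / (p @ q) - np.outer(Bq, Bq) / (q @ Bq)

# Toy data (with p^T q > 0, so positive definiteness is preserved):
B0 = np.eye(3)
p = np.array([0.3, -0.1, 0.2])
q = np.array([1.0, 0.5, -0.4])
B1 = dfp_update(B0, p, q)
```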
Davidon-Fletcher-Powell Formula
Theorem
If the objective function is quadratic with positive definite Hessian H, then for the Davidon-Fletcher-Powell method

p_i^T H p_j = 0    for 0 ≤ i < j ≤ k
B_{k+1} H p_i = p_i    for 0 ≤ i ≤ k

Since the p_k's are H-orthogonal and the function is successively minimized along these directions, the method is a conjugate direction method. If the initial approximation of the inverse Hessian is the identity matrix, the method is the conjugate gradient method.
Broyden–Fletcher–Goldfarb–Shanno Formula
Remember that q_i = H_{k+1} p_i and H^{-1}_{k+1} q_i = p_i (or, B_{k+1} q_i = p_i) for 0 ≤ i ≤ k.

Both equations have exactly the same form, except that q_i and p_i are interchanged and H is replaced by B (or vice versa).

This leads to the observation that any update formula for B can be transformed into a corresponding complementary formula for H by interchanging the roles of B and H and of q and p. The reverse is also true.

The Broyden–Fletcher–Goldfarb–Shanno formula update of H_k is obtained by taking the complementary formula of the DFP formula, thus:

H_{k+1} = H_k + \frac{q_k q_k^T}{q_k^T p_k} - \frac{H_k p_k p_k^T H_k}{p_k^T H_k p_k}

By taking the inverse, the BFGS update formula for B_{k+1} (i.e., H^{-1}_{k+1}) is obtained:

B_{k+1} = B_k + \left(1 + \frac{q_k^T B_k q_k}{q_k^T p_k}\right) \frac{p_k p_k^T}{p_k^T q_k} - \frac{p_k q_k^T B_k + B_k q_k p_k^T}{q_k^T p_k}
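The inverse-form BFGS update also translates directly to code. A sketch (function name and toy vectors are my assumptions), again confirming the quasi-Newton condition B_{k+1} q_k = p_k and symmetry:

```python
import numpy as np

# BFGS update for B = H^{-1}, transcribed term by term from the formula above.
def bfgs_update(B, p, q):
    qtp = q @ p
    Bq = B @ q
    return (B
            + (1.0 + (q @ Bq) / qtp) * np.outer(p, p) / qtp
            - (np.outer(p, Bq) + np.outer(Bq, p)) / qtp)

# Toy data (with q^T p > 0, so positive definiteness is preserved):
B0 = np.eye(2)
p = np.array([0.4, -0.2])
q = np.array([1.0, 0.6])
B1 = bfgs_update(B0, p, q)
```

Note that p q^T B = p (Bq)^T and B q p^T = (Bq) p^T when B is symmetric, which is what the two `np.outer` terms compute.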
• The Broyden–Fletcher–Goldfarb–Shanno formula is more complicated than DFP, but straightforward to apply.
• The BFGS update formula can be used exactly like the DFP formula.
• Numerical experiments have shown that the performance of the BFGS formula is superior to that of the DFP formula. Hence, BFGS is often preferred over DFP.

Both the DFP and BFGS updates have symmetric rank two corrections that are constructed from the vectors p_k and B_k q_k. Weighted combinations of these formulae will therefore also have the same properties. This observation leads to a whole collection of updates, known as the Broyden family, defined by:

B_φ = (1 - φ) B_DFP + φ B_BFGS

where φ is a parameter that may take any real value.
Quasi-Newton Algorithm
1. Input x_0, B_0, termination criteria.
2. For any k, set S_k = -B_k g_k.
3. Compute a step size α (e.g., by line search on y(x_k + α S_k)) and set x_{k+1} = x_k + α S_k.
4. Compute the update matrix B^u_k according to a given formula (say, DFP or BFGS) using the values q_k = g_{k+1} - g_k, p_k = x_{k+1} - x_k, and B_k.
5. Set B_{k+1} = B_k + B^u_k.
6. Continue with the next k until the termination criteria are satisfied.

Note: you do have to calculate the vector of first order derivatives g at each iteration.
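The six steps above can be sketched as a complete routine using the BFGS inverse update. The backtracking (Armijo) line search, tolerances, and test problem are my own assumptions; the slides leave the step-size rule open:

```python
import numpy as np

def quasi_newton(y, grad, x0, tol=1e-8, max_iter=200):
    x = np.asarray(x0, dtype=float)
    B = np.eye(x.size)                        # step 1: B0 = I
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:           # termination criterion
            break
        s = -B @ g                            # step 2: S_k = -B_k g_k
        alpha = 1.0                           # step 3: backtrack on y(x_k + a S_k)
        while y(x + alpha * s) > y(x) + 1e-4 * alpha * (g @ s) and alpha > 1e-10:
            alpha *= 0.5
        x_new = x + alpha * s
        g_new = grad(x_new)
        p, q = x_new - x, g_new - g           # step 4: p_k and q_k
        if q @ p > 1e-12:                     # keep B positive definite
            Bq = B @ q
            B = (B
                 + (1.0 + (q @ Bq) / (q @ p)) * np.outer(p, p) / (q @ p)
                 - (np.outer(p, Bq) + np.outer(Bq, p)) / (q @ p))  # step 5
        x, g = x_new, g_new                   # step 6: next k
    return x

# Usage on a quadratic y(x) = 0.5 x^T Q x - b^T x (illustrative data):
Q = np.array([[3.0, 0.5], [0.5, 2.0]])
b = np.array([1.0, 1.0])
xmin = quasi_newton(lambda x: 0.5 * x @ Q @ x - b @ x,
                    lambda x: Q @ x - b, np.zeros(2))
```

Skipping the update when q^T p is not safely positive is a common safeguard against the loss of positive definiteness discussed earlier.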
Some Closing Remarks
• Both DFP and BFGS methods have theoretical properties that
guarantee superlinear (fast) convergence rate and global
convergence under certain conditions.
• However, both methods could fail for general nonlinear problems.
Specifically,
• DFP is highly sensitive to inaccuracies in line searches.
• Both methods can get stuck on a saddle-point. In Newton's
method, a saddle-point can be detected during modifications of the
(true) Hessian. Therefore, search around the final point when
using quasi-Newton methods.
• Update of Hessian becomes "corrupted" by round-off and other
inaccuracies.
• All kinds of "tricks" such as scaling and preconditioning exist to boost the performance of these methods.
```