The Davidon–Fletcher–Powell formula (or DFP; named after William C. Davidon, Roger Fletcher, and Michael J. D. Powell) finds the solution to the secant equation that is closest to the current estimate and satisfies the curvature condition. It was the first quasi-Newton method to generalize the secant method to a multidimensional problem. This update maintains the symmetry and positive definiteness of the Hessian matrix.
Given a function $f(x)$, its gradient $\nabla f$, and a positive-definite Hessian matrix $B$, the Taylor series is

$$f(x_k + s_k) = f(x_k) + \nabla f(x_k)^T s_k + \frac{1}{2} s_k^T B s_k + \dots,$$

and the Taylor series of the gradient itself (the secant equation)

$$\nabla f(x_k + s_k) = \nabla f(x_k) + B s_k + \dots$$

is used to update $B$.
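For a quadratic objective the gradient expansion is exact, so the secant pair satisfies $y_k = B s_k$ with no truncation error. A minimal NumPy check of this relation, using an arbitrarily chosen positive-definite $B$ and test points (all values here are illustrative):

```python
import numpy as np

# Illustrative positive-definite Hessian of the quadratic f(x) = 1/2 x^T B x.
B = np.array([[4.0, 1.0],
              [1.0, 3.0]])
grad = lambda x: B @ x  # gradient of the quadratic

x_k = np.array([1.0, -2.0])
s_k = np.array([0.5, 0.25])

# Secant pair: difference of gradients across the step s_k.
y_k = grad(x_k + s_k) - grad(x_k)
print(np.allclose(y_k, B @ s_k))  # True: the secant equation holds exactly
```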
The DFP formula finds a solution that is symmetric, positive-definite, and closest to the current approximate value of $B_k$:
$$B_{k+1} = (I - \gamma_k y_k s_k^T)\, B_k \,(I - \gamma_k s_k y_k^T) + \gamma_k y_k y_k^T,$$

where

$$y_k = \nabla f(x_k + s_k) - \nabla f(x_k), \qquad \gamma_k = \frac{1}{y_k^T s_k},$$

and $B_k$ is a symmetric and positive-definite matrix.
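A direct NumPy transcription of this update might read as follows (a sketch; the function name and interface are illustrative rather than taken from any library):

```python
import numpy as np

def dfp_update_B(B_k, s_k, y_k):
    """One DFP update of the Hessian approximation B_k."""
    gamma = 1.0 / (y_k @ s_k)                # gamma_k = 1 / (y_k^T s_k)
    I = np.eye(len(s_k))
    left = I - gamma * np.outer(y_k, s_k)    # I - gamma_k y_k s_k^T
    right = I - gamma * np.outer(s_k, y_k)   # I - gamma_k s_k y_k^T
    return left @ B_k @ right + gamma * np.outer(y_k, y_k)
```

Because the update is a congruence transformation of $B_k$ plus a rank-one term with positive weight, positive definiteness is preserved whenever $y_k^T s_k > 0$.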
The corresponding update to the inverse Hessian approximation $H_k = B_k^{-1}$ is given by
$$H_{k+1} = H_k - \frac{H_k y_k y_k^T H_k}{y_k^T H_k y_k} + \frac{s_k s_k^T}{y_k^T s_k}.$$

Here $B$ is assumed to be positive-definite, and the vectors $s_k$ and $y_k$ must satisfy the curvature condition

$$s_k^T y_k = s_k^T B s_k > 0.$$

The DFP formula is quite effective, but it was soon superseded by the Broyden–Fletcher–Goldfarb–Shanno (BFGS) formula, which is its dual (interchanging the roles of $y$ and $s$).[1]
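The inverse form is attractive in practice because the quasi-Newton direction $-H_k \nabla f(x_k)$ then requires no linear solve. A toy DFP iteration along these lines, with a fixed damping factor standing in for the line search that a real implementation would use to enforce the curvature condition (all names and constants are illustrative):

```python
import numpy as np

def dfp_update_H(H, s, y):
    """DFP update of the inverse Hessian approximation H_k = B_k^{-1}."""
    Hy = H @ y
    return H - np.outer(Hy, Hy) / (y @ Hy) + np.outer(s, s) / (y @ s)

def dfp_minimize(grad, x0, alpha=0.1, tol=1e-8, max_iter=500):
    """Toy DFP loop with a fixed damping factor alpha (no line search)."""
    x = np.asarray(x0, dtype=float)
    H = np.eye(len(x))
    g = grad(x)
    for _ in range(max_iter):
        s = -alpha * (H @ g)        # damped quasi-Newton step
        x_new = x + s
        g_new = grad(x_new)
        y = g_new - g
        if s @ y > 1e-12:           # update only when the curvature condition holds
            H = dfp_update_H(H, s, y)
        x, g = x_new, g_new
        if np.linalg.norm(g) < tol:
            break
    return x
```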
Compact representation

By unwinding the matrix recurrence for $B_k$, the DFP formula can be expressed as a compact matrix representation. Specifically, defining
$$S_k = \begin{bmatrix} s_0 & s_1 & \ldots & s_{k-1} \end{bmatrix}, \qquad Y_k = \begin{bmatrix} y_0 & y_1 & \ldots & y_{k-1} \end{bmatrix},$$
and the upper-triangular and diagonal matrices
$$\big(R_k\big)_{ij} := \big(R_k^{\text{SY}}\big)_{ij} = s_{i-1}^T y_{j-1}, \qquad \big(R_k^{\text{YS}}\big)_{ij} = y_{i-1}^T s_{j-1}, \qquad (D_k)_{ii} := \big(D_k^{\text{SY}}\big)_{ii} = s_{i-1}^T y_{i-1}, \qquad \text{for } 1 \leq i \leq j \leq k,$$
the DFP matrix has the equivalent formula
$$B_k = B_0 + J_k N_k^{-1} J_k^T,$$

where

$$J_k = \begin{bmatrix} Y_k & Y_k - B_0 S_k \end{bmatrix}, \qquad N_k = \begin{bmatrix} 0_{k \times k} & R_k^{\text{YS}} \\ \big(R_k^{\text{YS}}\big)^T & R_k + R_k^T - (D_k + S_k^T B_0 S_k) \end{bmatrix}.$$
The inverse compact representation can be found by applying the Sherman–Morrison–Woodbury formula to $B_k$. The compact representation is particularly useful for limited-memory and constrained problems.[2]
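To make the definitions concrete, the sketch below (assuming $B_0 = I$ and curvature pairs generated from an arbitrary symmetric positive-definite matrix, so that $s_i^T y_i > 0$) assembles $S_k$, $Y_k$, $R_k$, $R_k^{\text{YS}}$, $D_k$, $J_k$, and $N_k$, and compares the compact formula against $k$ applications of the recursive DFP update:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 5, 3
B0 = np.eye(n)

# Curvature pairs from an arbitrary SPD "true" Hessian, so s_i^T y_i > 0.
A = rng.standard_normal((n, n))
B_true = A @ A.T + n * np.eye(n)
S = rng.standard_normal((n, k))
Y = B_true @ S

SY = S.T @ Y
R_SY = np.triu(SY)             # (R_k)_{ij} = s_{i-1}^T y_{j-1} for i <= j
R_YS = np.triu(Y.T @ S)        # (R_k^YS)_{ij} = y_{i-1}^T s_{j-1} for i <= j
D = np.diag(np.diag(SY))       # (D_k)_{ii} = s_{i-1}^T y_{i-1}

J = np.hstack([Y, Y - B0 @ S])
N = np.block([[np.zeros((k, k)), R_YS],
              [R_YS.T, R_SY + R_SY.T - (D + S.T @ B0 @ S)]])
B_compact = B0 + J @ np.linalg.solve(N, J.T)

# Reference: unwind the recursion by applying the DFP update k times.
B_rec = B0.copy()
I = np.eye(n)
for i in range(k):
    s, y = S[:, i], Y[:, i]
    g = 1.0 / (y @ s)
    B_rec = (I - g * np.outer(y, s)) @ B_rec @ (I - g * np.outer(s, y)) \
            + g * np.outer(y, y)

print(np.allclose(B_compact, B_rec))  # expected: True
```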
References

1. Avriel, Mordecai (1976). Nonlinear Programming: Analysis and Methods. Prentice-Hall. pp. 352–353. ISBN 0-13-623603-0.
2. Brust, J. J. (2024). "Useful Compact Representations for Data-Fitting". arXiv:2403.12206 [math.OC].