Davidon–Fletcher–Powell formula

From Wikipedia, the free encyclopedia

The Davidon–Fletcher–Powell formula (or DFP; named after William C. Davidon, Roger Fletcher, and Michael J. D. Powell) finds the solution to the secant equation that is closest to the current estimate and satisfies the curvature condition. It was the first quasi-Newton method to generalize the secant method to a multidimensional problem. This update maintains the symmetry and positive definiteness of the Hessian matrix.

Given a function $f(x)$, its gradient $\nabla f$, and a positive-definite Hessian matrix $B$, the Taylor series is

$$f(x_k + s_k) = f(x_k) + \nabla f(x_k)^T s_k + \frac{1}{2} s_k^T B s_k + \dots,$$

and the Taylor series of the gradient itself (the secant equation)

$$\nabla f(x_k + s_k) = \nabla f(x_k) + B s_k + \dots$$

is used to update $B$.

The DFP formula finds a solution that is symmetric, positive-definite, and closest to the current approximate value of $B_k$:

$$B_{k+1} = (I - \gamma_k y_k s_k^T)\, B_k\, (I - \gamma_k s_k y_k^T) + \gamma_k y_k y_k^T,$$

where

$$y_k = \nabla f(x_k + s_k) - \nabla f(x_k), \qquad \gamma_k = \frac{1}{y_k^T s_k},$$

and $B_k$ is a symmetric and positive-definite matrix.

The corresponding update to the inverse Hessian approximation $H_k = B_k^{-1}$ is given by

$$H_{k+1} = H_k - \frac{H_k y_k y_k^T H_k}{y_k^T H_k y_k} + \frac{s_k s_k^T}{y_k^T s_k}.$$

$B$ is assumed to be positive-definite, and the vectors $s_k$ and $y_k$ must satisfy the curvature condition

$$s_k^T y_k = s_k^T B s_k > 0.$$
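As a numerical sanity check, both updates can be sketched in NumPy (the helper names below are illustrative, not from the article). When $H_k = B_k^{-1}$, the two formulas stay consistent: the updated matrices remain inverses of each other, and the updated $B_{k+1}$ satisfies the secant equation $B_{k+1} s_k = y_k$.

```python
import numpy as np

def dfp_update_B(B, s, y):
    """DFP rank-two update of the Hessian approximation B."""
    gamma = 1.0 / (y @ s)                        # gamma_k = 1 / (y_k^T s_k)
    V = np.eye(len(s)) - gamma * np.outer(y, s)  # I - gamma_k y_k s_k^T
    return V @ B @ V.T + gamma * np.outer(y, y)

def dfp_update_H(H, s, y):
    """DFP update of the inverse Hessian approximation H = B^{-1}."""
    Hy = H @ y
    return H - np.outer(Hy, Hy) / (y @ Hy) + np.outer(s, s) / (y @ s)
```

With $y_k = A s_k$ for a positive-definite model Hessian $A$, the curvature condition $s_k^T y_k = s_k^T A s_k > 0$ holds automatically, so the updates stay well defined.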

The DFP formula is quite effective, but it was soon superseded by the Broyden–Fletcher–Goldfarb–Shanno formula, which is its dual (interchanging the roles of $y$ and $s$).[1]
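The duality can be illustrated concretely: interchanging the roles of $y_k$ and $s_k$ in the inverse-Hessian update above, and applying the result to $B_k$ instead of $H_k$, yields the BFGS update of the Hessian approximation. A minimal sketch (the helper name is illustrative, assuming NumPy):

```python
import numpy as np

def rank_two_update(M, u, v):
    """Generic DFP-shaped rank-two update.

    With (u, v) = (s, y) applied to H this is the DFP update of H;
    with (u, v) = (y, s) applied to B it is the dual (BFGS) update of B.
    """
    Mv = M @ v
    return M - np.outer(Mv, Mv) / (v @ Mv) + np.outer(u, u) / (v @ u)
```

Either way the updated matrix maps one vector of the pair to the other: the DFP $H_{k+1}$ sends $y_k \mapsto s_k$, while the dual (BFGS) $B_{k+1}$ sends $s_k \mapsto y_k$.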

Compact representation


By unwinding the matrix recurrence for $B_k$, the DFP formula can be expressed as a compact matrix representation. Specifically, defining

$$S_k = \begin{bmatrix} s_0 & s_1 & \ldots & s_{k-1} \end{bmatrix}, \qquad Y_k = \begin{bmatrix} y_0 & y_1 & \ldots & y_{k-1} \end{bmatrix},$$

and the upper triangular and diagonal matrices

$$(R_k)_{ij} := (R_k^{\text{SY}})_{ij} = s_{i-1}^T y_{j-1}, \quad (R_k^{\text{YS}})_{ij} = y_{i-1}^T s_{j-1}, \quad (D_k)_{ii} := (D_k^{\text{SY}})_{ii} = s_{i-1}^T y_{i-1}, \qquad \text{for } 1 \leq i \leq j \leq k,$$

the DFP matrix has the equivalent formula

$$B_k = B_0 + J_k N_k^{-1} J_k^T,$$

$$J_k = \begin{bmatrix} Y_k & Y_k - B_0 S_k \end{bmatrix},$$

$$N_k = \begin{bmatrix} 0_{k \times k} & R_k^{\text{YS}} \\ \big(R_k^{\text{YS}}\big)^T & R_k + R_k^T - (D_k + S_k^T B_0 S_k) \end{bmatrix}.$$

The inverse compact representation can be found by applying the Sherman–Morrison–Woodbury formula to $B_k$. The compact representation is particularly useful for limited-memory and constrained problems.[2]
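The compact formula can be checked numerically against the recursive update. The sketch below (assuming NumPy; the helper names are not from the article) builds $S_k$, $Y_k$, the triangular factors, and $N_k$ exactly as defined above, and reproduces the recursively updated $B_k$:

```python
import numpy as np

def dfp_recursive(B0, steps):
    """Apply the recursive DFP update once per pair (s, y)."""
    B = B0.copy()
    for s, y in steps:
        gamma = 1.0 / (y @ s)
        V = np.eye(len(s)) - gamma * np.outer(y, s)
        B = V @ B @ V.T + gamma * np.outer(y, y)
    return B

def dfp_compact(B0, S, Y):
    """B_k = B_0 + J_k N_k^{-1} J_k^T with the blocks defined in the text."""
    k = S.shape[1]
    R = np.triu(S.T @ Y)           # R_k^{SY}, upper triangular
    R_ys = np.triu(Y.T @ S)        # R_k^{YS}, upper triangular
    D = np.diag(np.diag(S.T @ Y))  # D_k
    J = np.hstack([Y, Y - B0 @ S])
    N = np.block([[np.zeros((k, k)), R_ys],
                  [R_ys.T, R + R.T - (D + S.T @ B0 @ S)]])
    return B0 + J @ np.linalg.solve(N, J.T)
```

Generating the pairs from a quadratic model ($y_i = A s_i$ with $A$ positive-definite) guarantees the curvature condition for every step, so both forms are well defined and agree.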


References

  1. ^ Avriel, Mordecai (1976). Nonlinear Programming: Analysis and Methods. Prentice-Hall. pp. 352–353. ISBN 0-13-623603-0.
  2. ^ Brust, J. J. (2024). "Useful Compact Representations for Data-Fitting". arXiv:2403.12206 [math.OC].

Retrieved from "https://en.wikipedia.org/w/index.php?title=Davidon–Fletcher–Powell_formula&oldid=1298035495"