The Davidon–Fletcher–Powell formula (or DFP; named after William C. Davidon, Roger Fletcher, and Michael J. D. Powell) finds the solution to the secant equation that is closest to the current estimate and satisfies the curvature condition. It was the first quasi-Newton method to generalize the secant method to a multidimensional problem. This update maintains the symmetry and positive definiteness of the Hessian matrix.
Given a function $f(x)$, its gradient $\nabla f$, and a positive-definite Hessian matrix $B$, the Taylor series is

$$f(x_k + s_k) = f(x_k) + \nabla f(x_k)^T s_k + \frac{1}{2} s_k^T B s_k + \dots,$$

and the Taylor series of the gradient itself (the secant equation)

$$\nabla f(x_k + s_k) = \nabla f(x_k) + B s_k + \dots$$

is used to update $B$.
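For a quadratic objective the gradient expansion is exact, so the secant pair satisfies $y_k = B s_k$ with no truncation error. A minimal NumPy check of this relation, using an arbitrarily chosen positive-definite $B$ and test points (all values here are illustrative):

```python
import numpy as np

# Illustrative positive-definite Hessian of the quadratic f(x) = 1/2 x^T B x.
B = np.array([[4.0, 1.0],
              [1.0, 3.0]])
grad = lambda x: B @ x  # gradient of the quadratic

x_k = np.array([1.0, -2.0])
s_k = np.array([0.5, 0.25])

# Secant pair: difference of gradients across the step s_k.
y_k = grad(x_k + s_k) - grad(x_k)
print(np.allclose(y_k, B @ s_k))  # True: the secant equation holds exactly
```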
The DFP formula finds a solution that is symmetric, positive-definite, and closest to the current approximate value of $B_k$:
$$B_{k+1} = (I - \gamma_k y_k s_k^T)\, B_k \,(I - \gamma_k s_k y_k^T) + \gamma_k y_k y_k^T,$$

where

$$y_k = \nabla f(x_k + s_k) - \nabla f(x_k), \qquad \gamma_k = \frac{1}{y_k^T s_k},$$

and $B_k$ is a symmetric and positive-definite matrix.
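A direct NumPy transcription of this update might read as follows (a sketch; the function name and interface are illustrative rather than taken from any library):

```python
import numpy as np

def dfp_update_B(B_k, s_k, y_k):
    """One DFP update of the Hessian approximation B_k."""
    gamma = 1.0 / (y_k @ s_k)                # gamma_k = 1 / (y_k^T s_k)
    I = np.eye(len(s_k))
    left = I - gamma * np.outer(y_k, s_k)    # I - gamma_k y_k s_k^T
    right = I - gamma * np.outer(s_k, y_k)   # I - gamma_k s_k y_k^T
    return left @ B_k @ right + gamma * np.outer(y_k, y_k)
```

Because the update is a congruence transformation of $B_k$ plus a rank-one term with positive weight, positive definiteness is preserved whenever $y_k^T s_k > 0$.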
The corresponding update to the inverse Hessian approximation $H_k = B_k^{-1}$ is given by
$$H_{k+1} = H_k - \frac{H_k y_k y_k^T H_k}{y_k^T H_k y_k} + \frac{s_k s_k^T}{y_k^T s_k}.$$

Here $B$ is assumed to be positive-definite, and the vectors $s_k$ and $y_k$ must satisfy the curvature condition

$$s_k^T y_k = s_k^T B s_k > 0.$$

The DFP formula is quite effective, but it was soon superseded by the Broyden–Fletcher–Goldfarb–Shanno (BFGS) formula, which is its dual (interchanging the roles of $y$ and $s$).[1]
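The inverse form is attractive in practice because the quasi-Newton direction $-H_k \nabla f(x_k)$ then requires no linear solve. A toy DFP iteration along these lines, with a fixed damping factor standing in for the line search that a real implementation would use to enforce the curvature condition (all names and constants are illustrative):

```python
import numpy as np

def dfp_update_H(H, s, y):
    """DFP update of the inverse Hessian approximation H_k = B_k^{-1}."""
    Hy = H @ y
    return H - np.outer(Hy, Hy) / (y @ Hy) + np.outer(s, s) / (y @ s)

def dfp_minimize(grad, x0, alpha=0.1, tol=1e-8, max_iter=500):
    """Toy DFP loop with a fixed damping factor alpha (no line search)."""
    x = np.asarray(x0, dtype=float)
    H = np.eye(len(x))
    g = grad(x)
    for _ in range(max_iter):
        s = -alpha * (H @ g)        # damped quasi-Newton step
        x_new = x + s
        g_new = grad(x_new)
        y = g_new - g
        if s @ y > 1e-12:           # update only when the curvature condition holds
            H = dfp_update_H(H, s, y)
        x, g = x_new, g_new
        if np.linalg.norm(g) < tol:
            break
    return x
```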
Compact representation

By unwinding the matrix recurrence for $B_k$, the DFP formula can be expressed as a compact matrix representation. Specifically, defining
$$S_k = \begin{bmatrix} s_0 & s_1 & \ldots & s_{k-1} \end{bmatrix}, \qquad Y_k = \begin{bmatrix} y_0 & y_1 & \ldots & y_{k-1} \end{bmatrix},$$
and the upper-triangular and diagonal matrices
$$\big(R_k\big)_{ij} := \big(R_k^{\text{SY}}\big)_{ij} = s_{i-1}^T y_{j-1}, \qquad \big(R_k^{\text{YS}}\big)_{ij} = y_{i-1}^T s_{j-1}, \qquad (D_k)_{ii} := \big(D_k^{\text{SY}}\big)_{ii} = s_{i-1}^T y_{i-1}, \qquad \text{for } 1 \leq i \leq j \leq k,$$
the DFP matrix has the equivalent formula
$$B_k = B_0 + J_k N_k^{-1} J_k^T,$$

where

$$J_k = \begin{bmatrix} Y_k & Y_k - B_0 S_k \end{bmatrix}, \qquad N_k = \begin{bmatrix} 0_{k \times k} & R_k^{\text{YS}} \\ \big(R_k^{\text{YS}}\big)^T & R_k + R_k^T - (D_k + S_k^T B_0 S_k) \end{bmatrix}.$$
The inverse compact representation can be found by applying the Sherman–Morrison–Woodbury formula to $B_k$. The compact representation is particularly useful for limited-memory and constrained problems.[2]
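To make the definitions concrete, the sketch below (assuming $B_0 = I$ and curvature pairs generated from an arbitrary symmetric positive-definite matrix, so that $s_i^T y_i > 0$) assembles $S_k$, $Y_k$, $R_k$, $R_k^{\text{YS}}$, $D_k$, $J_k$, and $N_k$, and compares the compact formula against $k$ applications of the recursive DFP update:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 5, 3
B0 = np.eye(n)

# Curvature pairs from an arbitrary SPD "true" Hessian, so s_i^T y_i > 0.
A = rng.standard_normal((n, n))
B_true = A @ A.T + n * np.eye(n)
S = rng.standard_normal((n, k))
Y = B_true @ S

SY = S.T @ Y
R_SY = np.triu(SY)             # (R_k)_{ij} = s_{i-1}^T y_{j-1} for i <= j
R_YS = np.triu(Y.T @ S)        # (R_k^YS)_{ij} = y_{i-1}^T s_{j-1} for i <= j
D = np.diag(np.diag(SY))       # (D_k)_{ii} = s_{i-1}^T y_{i-1}

J = np.hstack([Y, Y - B0 @ S])
N = np.block([[np.zeros((k, k)), R_YS],
              [R_YS.T, R_SY + R_SY.T - (D + S.T @ B0 @ S)]])
B_compact = B0 + J @ np.linalg.solve(N, J.T)

# Reference: unwind the recursion by applying the DFP update k times.
B_rec = B0.copy()
I = np.eye(n)
for i in range(k):
    s, y = S[:, i], Y[:, i]
    g = 1.0 / (y @ s)
    B_rec = (I - g * np.outer(y, s)) @ B_rec @ (I - g * np.outer(s, y)) \
            + g * np.outer(y, y)

print(np.allclose(B_compact, B_rec))  # expected: True
```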
References

1. Avriel, Mordecai (1976). Nonlinear Programming: Analysis and Methods. Prentice-Hall. pp. 352–353. ISBN 0-13-623603-0.
2. Brust, J. J. (2024). "Useful Compact Representations for Data-Fitting". arXiv:2403.12206 [math.OC].