Quasi-Newton method

From Wikipedia, the free encyclopedia

In numerical analysis, a quasi-Newton method is an iterative numerical method used either to find zeroes or to find local maxima and minima of functions via an iterative recurrence formula much like the one for Newton's method, except using approximations of the derivatives of the functions in place of exact derivatives. Newton's method requires the Jacobian matrix of all partial derivatives of a multivariate function when used to search for zeros or the Hessian matrix when used for finding extrema. Quasi-Newton methods, on the other hand, can be used when the Jacobian matrices or Hessian matrices are unavailable or are impractical to compute at every iteration.

Some iterative methods that reduce to Newton's method, such as sequential quadratic programming, may also be considered quasi-Newton methods.

Search for zeros: root finding


Newton's method to find zeroes of a function $g$ of multiple variables is given by $x_{n+1} = x_n - [J_g(x_n)]^{-1} g(x_n)$, where $[J_g(x_n)]^{-1}$ is the left inverse of the Jacobian matrix $J_g(x_n)$ of $g$ evaluated for $x_n$.

Strictly speaking, any method that replaces the exact Jacobian $J_g(x_n)$ with an approximation is a quasi-Newton method.[1] For instance, the chord method (where $J_g(x_n)$ is replaced by $J_g(x_0)$ for all iterations) is a simple example. The methods given below for optimization refer to an important subclass of quasi-Newton methods, secant methods.[2]
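
As a concrete illustration of the chord method, the sketch below freezes the Jacobian at the starting point and reuses it for every iteration. The test system $g$, the starting point, and the tolerances are illustrative choices, not taken from the article.

```python
# Chord method sketch: Newton's iteration for g(x) = 0, but the Jacobian is
# evaluated only once, at x0, and reused for every subsequent iteration.
import numpy as np

def g(x):
    # Illustrative system: intersection of the unit circle with the line x = y.
    return np.array([x[0]**2 + x[1]**2 - 1.0,
                     x[0] - x[1]])

def jacobian_g(x):
    return np.array([[2.0 * x[0], 2.0 * x[1]],
                     [1.0, -1.0]])

def chord_method(g, jac, x0, tol=1e-10, max_iter=100):
    x = np.asarray(x0, dtype=float)
    J0 = jac(x)                            # fixed Jacobian approximation J_g(x_0)
    for _ in range(max_iter):
        x = x - np.linalg.solve(J0, g(x))  # quasi-Newton step with the frozen Jacobian
        if np.linalg.norm(g(x)) < tol:
            break
    return x

print(chord_method(g, jacobian_g, [1.0, 0.5]))   # approximately [0.7071, 0.7071]
```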

Using methods developed to find extrema in order to find zeroes is not always a good idea, as the majority of the methods used to find extrema require that the matrix that is used is symmetric. While this holds in the context of the search for extrema, it rarely holds when searching for zeroes. Broyden's "good" and "bad" methods are two methods commonly used to find extrema that can also be applied to find zeroes. Other methods that can be used are the column-updating method, the inverse column-updating method, the quasi-Newton least squares method and the quasi-Newton inverse least squares method.
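
For the root-finding setting, a minimal sketch of Broyden's "good" (rank-one) update is shown below; the identity starting Jacobian, the test system, and the starting point are illustrative assumptions rather than prescribed choices.

```python
# Broyden's "good" method sketch for g(x) = 0: the Jacobian approximation J_k is
# improved with a rank-one update built from successive residuals and steps.
import numpy as np

def g(x):
    return np.array([x[0]**2 + x[1]**2 - 1.0,   # same illustrative system as above
                     x[0] - x[1]])

def broyden_good(g, x0, tol=1e-10, max_iter=200):
    x = np.asarray(x0, dtype=float)
    J = np.eye(len(x))                     # crude initial Jacobian guess
    fx = g(x)
    for _ in range(max_iter):
        dx = -np.linalg.solve(J, fx)
        x_new = x + dx
        fx_new = g(x_new)
        y = fx_new - fx
        # Rank-one update: J_{k+1} = J_k + (y_k - J_k dx_k) dx_k^T / (dx_k^T dx_k)
        J += np.outer(y - J @ dx, dx) / (dx @ dx)
        x, fx = x_new, fx_new
        if np.linalg.norm(fx) < tol:
            break
    return x

print(broyden_good(g, [0.8, 0.6]))         # approaches [0.7071, 0.7071]
```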

More recently quasi-Newton methods have been applied to find the solution of multiple coupled systems of equations (e.g. fluid–structure interaction problems or interaction problems in physics). They allow the solution to be found by solving each constituent system separately (which is simpler than the global system) in a cyclic, iterative fashion until the solution of the global system is found.[2][3]

Search for extrema: optimization


The search for a minimum or maximum of a scalar-valued function is closely related to the search for the zeroes of the gradient of that function. Therefore, quasi-Newton methods can be readily applied to find extrema of a function. In other words, if $g$ is the gradient of $f$, then searching for the zeroes of the vector-valued function $g$ corresponds to the search for the extrema of the scalar-valued function $f$; the Jacobian of $g$ now becomes the Hessian of $f$. The main difference is that the Hessian matrix is a symmetric matrix, unlike the Jacobian when searching for zeroes. Most quasi-Newton methods used in optimization exploit this symmetry.

In optimization, quasi-Newton methods (a special case of variable-metric methods) are algorithms for finding local maxima and minima of functions. Quasi-Newton methods for optimization are based on Newton's method to find the stationary points of a function, points where the gradient is 0. Newton's method assumes that the function can be locally approximated as a quadratic in the region around the optimum, and uses the first and second derivatives to find the stationary point. In higher dimensions, Newton's method uses the gradient and the Hessian matrix of second derivatives of the function to be minimized.
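
For contrast with the quasi-Newton variants described below, the sketch below takes pure Newton steps for minimization using the exact gradient and Hessian; the convex quadratic test function and its data are illustrative choices.

```python
# Pure Newton's method for minimization: solve H d = -grad f(x) and step to the
# stationary point of the local quadratic model. For an exactly quadratic f,
# a single step lands on the minimizer.
import numpy as np

A = np.array([[3.0, 1.0],        # Hessian of f(x) = 0.5 x^T A x - b^T x (illustrative data)
              [1.0, 2.0]])
b = np.array([1.0, -1.0])

def grad_f(x):
    return A @ x - b             # exact gradient

def hess_f(x):
    return A                     # exact (constant) Hessian

x = np.zeros(2)
for _ in range(3):
    d = np.linalg.solve(hess_f(x), -grad_f(x))   # Newton step
    x = x + d
print(x, np.allclose(A @ x, b))  # the stationary point satisfies A x = b
```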

In quasi-Newton methods the Hessian matrix does not need to be computed. The Hessian is updated by analyzing successive gradient vectors instead. Quasi-Newton methods are a generalization of the secant method to find the root of the first derivative for multidimensional problems. In multiple dimensions the secant equation is under-determined, and quasi-Newton methods differ in how they constrain the solution, typically by adding a simple low-rank update to the current estimate of the Hessian.
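
In one dimension this reduces to the secant method applied to the derivative, as in the sketch below; the test function and the two starting points are illustrative.

```python
# Secant method applied to f'(x) = 0 for f(x) = x^4 - 3x^2 + x: the finite-difference
# slope of f' stands in for the second derivative, just as quasi-Newton updates stand
# in for the Hessian in higher dimensions.
def fprime(x):
    return 4 * x**3 - 6 * x + 1

x_prev, x = 2.0, 1.9                  # two starting points near a local minimum
for _ in range(30):
    g_prev, g = fprime(x_prev), fprime(x)
    if abs(g) < 1e-12:                # converged: the derivative is essentially zero
        break
    slope = (g - g_prev) / (x - x_prev)
    x_prev, x = x, x - g / slope      # secant step
print(x)                              # approximately 1.131, a local minimizer of f
```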

The first quasi-Newton algorithm was proposed by William C. Davidon, a physicist working at Argonne National Laboratory. He developed the first quasi-Newton algorithm in 1959: the DFP updating formula, which was later popularized by Fletcher and Powell in 1963, but is rarely used today. The most common quasi-Newton algorithms are currently the SR1 formula (for "symmetric rank-one"), the BHHH method, the widespread BFGS method (suggested independently by Broyden, Fletcher, Goldfarb, and Shanno in 1970), and its low-memory extension L-BFGS. Broyden's class is a linear combination of the DFP and BFGS methods.

The SR1 formula does not guarantee the update matrix to maintain positive-definiteness and can be used for indefinite problems. Broyden's method does not require the update matrix to be symmetric and is used to find the root of a general system of equations (rather than the gradient) by updating the Jacobian (rather than the Hessian).

One of the chief advantages of quasi-Newton methods over Newton's method is that the Hessian matrix (or, in the case of quasi-Newton methods, its approximation) $B$ does not need to be inverted. Newton's method, and its derivatives such as interior point methods, require the Hessian to be inverted, which is typically implemented by solving a system of linear equations and is often quite costly. In contrast, quasi-Newton methods usually generate an estimate of $B^{-1}$ directly.

As in Newton's method, one uses a second-order approximation to find the minimum of a function $f(x)$. The Taylor series of $f(x)$ around an iterate is

$$f(x_k+\Delta x)\approx f(x_k)+\nabla f(x_k)^{\mathrm T}\,\Delta x+\tfrac{1}{2}\,\Delta x^{\mathrm T}B\,\Delta x,$$

where $\nabla f$ is the gradient, and $B$ an approximation to the Hessian matrix.[4] The gradient of this approximation (with respect to $\Delta x$) is

$$\nabla f(x_k+\Delta x)\approx \nabla f(x_k)+B\,\Delta x,$$

and setting this gradient to zero (which is the goal of optimization) provides the Newton step:

$$\Delta x=-B^{-1}\nabla f(x_k).$$

The Hessian approximation $B$ is chosen to satisfy

$$\nabla f(x_k+\Delta x)=\nabla f(x_k)+B\,\Delta x,$$

which is called the secant equation (the Taylor series of the gradient itself). In more than one dimension $B$ is underdetermined. In one dimension, solving for $B$ and applying the Newton step with the updated value is equivalent to the secant method. The various quasi-Newton methods differ in their choice of the solution to the secant equation (in one dimension, all the variants are equivalent). Most methods (but with exceptions, such as Broyden's method) seek a symmetric solution ($B^{\mathrm T}=B$); furthermore, the variants listed below can be motivated by finding an update $B_{k+1}$ that is as close as possible to $B_k$ in some norm; that is, $B_{k+1}=\operatorname{argmin}_B \|B-B_k\|_V$, where $V$ is some positive-definite matrix that defines the norm. An approximate initial value $B_0=\beta I$ is often sufficient to achieve rapid convergence, although there is no general strategy to choose $\beta$.[5] Note that $B_0$ should be positive-definite. The unknown $x_k$ is updated by applying the Newton step calculated using the current approximate Hessian matrix $B_k$:

  • $\Delta x_k=-\alpha_k B_k^{-1}\nabla f(x_k)$, with $\alpha_k$ chosen to satisfy the Wolfe conditions;
  • $x_{k+1}=x_k+\Delta x_k$;
  • the gradient is computed at the new point, $\nabla f(x_{k+1})$, and

$$y_k=\nabla f(x_{k+1})-\nabla f(x_k)$$

is used to update the approximate Hessian $B_{k+1}$, or directly its inverse $H_{k+1}=B_{k+1}^{-1}$ using the Sherman–Morrison formula.

The most popular update formulas are:

In each entry, the first formula updates the Hessian approximation $B_{k+1}$ and the second updates its inverse $H_{k+1}=B_{k+1}^{-1}$.

BFGS:
$B_{k+1}=B_k+\dfrac{y_k y_k^{\mathrm T}}{y_k^{\mathrm T}\,\Delta x_k}-\dfrac{B_k\,\Delta x_k(B_k\,\Delta x_k)^{\mathrm T}}{\Delta x_k^{\mathrm T}B_k\,\Delta x_k}$
$H_{k+1}=\left(I-\dfrac{\Delta x_k y_k^{\mathrm T}}{y_k^{\mathrm T}\,\Delta x_k}\right)H_k\left(I-\dfrac{y_k\,\Delta x_k^{\mathrm T}}{y_k^{\mathrm T}\,\Delta x_k}\right)+\dfrac{\Delta x_k\,\Delta x_k^{\mathrm T}}{y_k^{\mathrm T}\,\Delta x_k}$

Broyden:
$B_{k+1}=B_k+\dfrac{y_k-B_k\,\Delta x_k}{\Delta x_k^{\mathrm T}\,\Delta x_k}\,\Delta x_k^{\mathrm T}$
$H_{k+1}=H_k+\dfrac{(\Delta x_k-H_k y_k)\,\Delta x_k^{\mathrm T}H_k}{\Delta x_k^{\mathrm T}H_k\,y_k}$

Broyden family:
$B_{k+1}=(1-\varphi_k)B_{k+1}^{\text{BFGS}}+\varphi_k B_{k+1}^{\text{DFP}},\quad \varphi\in[0,1]$

DFP:
$B_{k+1}=\left(I-\dfrac{y_k\,\Delta x_k^{\mathrm T}}{y_k^{\mathrm T}\,\Delta x_k}\right)B_k\left(I-\dfrac{\Delta x_k y_k^{\mathrm T}}{y_k^{\mathrm T}\,\Delta x_k}\right)+\dfrac{y_k y_k^{\mathrm T}}{y_k^{\mathrm T}\,\Delta x_k}$
$H_{k+1}=H_k+\dfrac{\Delta x_k\,\Delta x_k^{\mathrm T}}{\Delta x_k^{\mathrm T}\,y_k}-\dfrac{H_k y_k y_k^{\mathrm T}H_k}{y_k^{\mathrm T}H_k y_k}$

SR1:
$B_{k+1}=B_k+\dfrac{(y_k-B_k\,\Delta x_k)(y_k-B_k\,\Delta x_k)^{\mathrm T}}{(y_k-B_k\,\Delta x_k)^{\mathrm T}\,\Delta x_k}$
$H_{k+1}=H_k+\dfrac{(\Delta x_k-H_k y_k)(\Delta x_k-H_k y_k)^{\mathrm T}}{(\Delta x_k-H_k y_k)^{\mathrm T}y_k}$

Other methods are Pearson's method, McCormick's method, the Powell symmetric Broyden (PSB) method and Greenstadt's method.[2] These recursive low-rank matrix updates can also be represented as an initial matrix plus a low-rank correction. This is the compact quasi-Newton representation, which is particularly effective for constrained and/or large problems.
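
To make the update loop concrete, the sketch below implements the BFGS inverse-Hessian update from the table together with a simple backtracking line search; the curvature safeguard, the Rosenbrock test function, and all parameter values are illustrative choices, not a prescribed implementation.

```python
# Quasi-Newton minimization sketch using the BFGS update of the inverse Hessian
# approximation H_k (the H_{k+1} formula from the table) and a basic Armijo
# backtracking line search. A teaching sketch, not production code.
import numpy as np

def bfgs_sketch(f, grad, x0, max_iter=500, tol=1e-8):
    x = np.asarray(x0, dtype=float)
    n = len(x)
    H = np.eye(n)                                  # H_0 = I (i.e. beta = 1)
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        p = -H @ g                                 # quasi-Newton direction -H_k grad f(x_k)
        alpha = 1.0
        while f(x + alpha * p) > f(x) + 1e-4 * alpha * (g @ p):
            alpha *= 0.5                           # backtrack until sufficient decrease
        s = alpha * p                              # s_k = Delta x_k
        x_new = x + s
        g_new = grad(x_new)
        y = g_new - g                              # y_k = grad f(x_{k+1}) - grad f(x_k)
        if y @ s > 1e-12:                          # curvature safeguard: keep H positive-definite
            rho = 1.0 / (y @ s)
            I = np.eye(n)
            # BFGS inverse update: (I - rho s y^T) H_k (I - rho y s^T) + rho s s^T
            H = (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) \
                + rho * np.outer(s, s)
        x, g = x_new, g_new
    return x

# Illustrative use on the Rosenbrock function:
f = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
df = lambda x: np.array([-2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
                         200 * (x[1] - x[0]**2)])
print(bfgs_sketch(f, df, [-1.2, 1.0]))             # approaches [1, 1]
```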

Relationship to matrix inversion


When $f$ is a convex quadratic function with positive-definite Hessian $B$, one would expect the matrices $H_k$ generated by a quasi-Newton method to converge to the inverse Hessian $H=B^{-1}$. This is indeed the case for the class of quasi-Newton methods based on least-change updates.[6]
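
A small numerical check of this statement uses the BFGS update with exact line searches on a randomly generated convex quadratic; the data are illustrative, and exact line searches are the classical special case in which the approximations reach the inverse Hessian after $n$ steps.

```python
# On a convex quadratic f(x) = 0.5 x^T B x - c^T x, BFGS with exact line searches
# terminates in at most n steps, and the final inverse-Hessian approximation H_n
# equals B^{-1} up to rounding error.
import numpy as np

rng = np.random.default_rng(0)
n = 5
M = rng.standard_normal((n, n))
B = M @ M.T + n * np.eye(n)              # random positive-definite Hessian
c = rng.standard_normal(n)

x = np.zeros(n)
H = np.eye(n)
g = B @ x - c                            # gradient of the quadratic
for _ in range(n):
    p = -H @ g
    alpha = -(g @ p) / (p @ B @ p)       # exact line search along p for a quadratic
    s = alpha * p
    x = x + s
    g_new = B @ x - c
    y = g_new - g
    rho = 1.0 / (y @ s)
    I = np.eye(n)
    H = (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) + rho * np.outer(s, s)
    g = g_new

print(np.allclose(H, np.linalg.inv(B)))  # True: H_n matches the inverse Hessian
print(np.allclose(B @ x, c))             # True: x_n is the exact minimizer
```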

Notable implementations


Implementations of quasi-Newton methods are available in many programming languages.

Notable open source implementations include:

  • GNU Octave uses a form of BFGS in its fsolve function, with trust region extensions.
  • GNU Scientific Library implements the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm.
  • ALGLIB implements (L)BFGS in C++ and C#
  • R's optim general-purpose optimizer routine uses the BFGS method by passing method="BFGS".[7]
  • In the SciPy extension to Python, the scipy.optimize.minimize function includes, among other methods, a BFGS implementation; scipy.optimize also provides the older fmin_bfgs routine (see the usage sketch after this list).[8]
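
As a usage illustration of the SciPy implementation listed above (the rosen and rosen_der test helpers ship with scipy.optimize):

```python
# Minimizing the Rosenbrock test function with SciPy's BFGS implementation.
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der

x0 = np.array([-1.2, 1.0])
result = minimize(rosen, x0, method="BFGS", jac=rosen_der)
print(result.x)    # close to [1, 1], the minimizer of the Rosenbrock function
print(result.nit)  # number of quasi-Newton iterations performed
```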

Notable proprietary implementations include:

  • Mathematica includes quasi-Newton solvers.[9]
  • The NAG Library contains several routines[10] for minimizing or maximizing a function[11] which use quasi-Newton algorithms.
  • In MATLAB's Optimization Toolbox, the fminunc function uses (among other methods) the BFGS quasi-Newton method.[12] Many of the constrained methods of the Optimization Toolbox use BFGS and the variant L-BFGS.[13]

References

  1. ^ Broyden, C. G. (1972). "Quasi-Newton Methods". In Murray, W. (ed.). Numerical Methods for Unconstrained Optimization. London: Academic Press. pp. 87–106. ISBN 0-12-512250-0.
  2. ^ a b c Haelterman, Rob (2009). "Analytical study of the Least Squares Quasi-Newton method for interaction problems". PhD Thesis, Ghent University. Retrieved 2014-08-14.
  3. ^ Rob Haelterman; Dirk Van Eester; Daan Verleyen (2015). "Accelerating the solution of a physics model inside a tokamak using the (Inverse) Column Updating Method". Journal of Computational and Applied Mathematics. 279: 133–144. doi:10.1016/j.cam.2014.11.005.
  4. ^ "Introduction to Taylor's theorem for multivariable functions - Math Insight". mathinsight.org. Retrieved November 11, 2021.
  5. ^ Nocedal, Jorge; Wright, Stephen J. (2006). Numerical Optimization. New York: Springer. p. 142. ISBN 0-387-98793-2.
  6. ^ Robert Mansel Gower; Peter Richtarik (2015). "Randomized Quasi-Newton Updates are Linearly Convergent Matrix Inversion Algorithms". arXiv:1602.01768 [math.NA].
  7. ^ "optim function - RDocumentation". www.rdocumentation.org. Retrieved 2022-02-21.
  8. ^ "Scipy.optimize.minimize — SciPy v1.7.1 Manual".
  9. ^ "Unconstrained Optimization: Methods for Local Minimization—Wolfram Language Documentation". reference.wolfram.com. Retrieved 2022-02-21.
  10. ^ The Numerical Algorithms Group. "Keyword Index: Quasi-Newton". NAG Library Manual, Mark 23. Retrieved 2012-02-09.
  11. ^ The Numerical Algorithms Group. "E04 – Minimizing or Maximizing a Function" (PDF). NAG Library Manual, Mark 23. Retrieved 2012-02-09.
  12. ^ "Find minimum of unconstrained multivariable function - MATLAB fminunc". Archived from the original on 2012-01-12. Retrieved 2012-03-07.
  13. ^ "Constrained Nonlinear Optimization Algorithms - MATLAB & Simulink". www.mathworks.com. Retrieved 2022-02-21.
