Nonlinear conjugate gradient method

From Wikipedia, the free encyclopedia

In numerical optimization, the nonlinear conjugate gradient method generalizes the conjugate gradient method to nonlinear optimization. For a quadratic function

\( f(x) = \|Ax - b\|^{2}, \)

the minimum of \( f \) is obtained when the gradient is 0:

\( \nabla_{x} f = 2A^{T}(Ax - b) = 0. \)

Whereas linear conjugate gradient seeks a solution to the linear equation \( A^{T}Ax = A^{T}b \), the nonlinear conjugate gradient method is generally used to find the local minimum of a nonlinear function using its gradient \( \nabla_{x} f \) alone. It works when the function is approximately quadratic near the minimum, which is the case when the function is twice differentiable at the minimum and the second derivative is non-singular there.

Given a function \( f(x) \) of \( N \) variables to minimize, its gradient \( \nabla_{x} f \) indicates the direction of maximum increase. One simply starts in the opposite (steepest descent) direction:

\( \Delta x_{0} = -\nabla_{x} f(x_{0}) \)

with an adjustable step length \( \alpha \) and performs a line search in this direction until it reaches the minimum of \( f \):

\( \alpha_{0} := \arg\min_{\alpha} f(x_{0} + \alpha \Delta x_{0}), \)
\( x_{1} = x_{0} + \alpha_{0} \Delta x_{0} \)

After this first iteration in the steepest direction \( \Delta x_{0} \), the following steps constitute one iteration of moving along a subsequent conjugate direction \( s_{n} \), where \( s_{0} = \Delta x_{0} \):

  1. Calculate the steepest direction: \( \Delta x_{n} = -\nabla_{x} f(x_{n}) \),
  2. Compute \( \beta_{n} \) according to one of the formulas below,
  3. Update the conjugate direction: \( s_{n} = \Delta x_{n} + \beta_{n} s_{n-1} \),
  4. Perform a line search: optimize \( \alpha_{n} = \arg\min_{\alpha} f(x_{n} + \alpha s_{n}) \),
  5. Update the position: \( x_{n+1} = x_{n} + \alpha_{n} s_{n} \).

With a pure quadratic function the minimum is reached within N iterations (excepting roundoff error), but a non-quadratic function will make slower progress. Subsequent search directions lose conjugacy, requiring the search direction to be reset to the steepest descent direction at least every N iterations, or sooner if progress stops. However, resetting every iteration turns the method into steepest descent. The algorithm stops when it finds the minimum, determined when no progress is made after a direction reset (i.e. in the steepest descent direction), or when some tolerance criterion is reached.
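The N-iteration convergence on a quadratic can be checked numerically. The sketch below is an illustrative construction, not from the article: it runs the iteration above on f(x) = ‖Ax − b‖² with a hypothetical well-conditioned 4×4 matrix, using the closed-form exact line search that a quadratic admits and the Fletcher–Reeves β (all four formulas coincide here).

```python
import numpy as np

# Hypothetical test problem: f(x) = ||Ax - b||^2 with a nonsingular 4x4 A,
# so conjugate gradient should reach the minimizer in N = 4 iterations.
rng = np.random.default_rng(0)
N = 4
A = rng.standard_normal((N, N)) + N * np.eye(N)   # diagonally shifted: well-conditioned
b = rng.standard_normal(N)

def grad(x):
    return 2.0 * A.T @ (A @ x - b)                # gradient of ||Ax - b||^2

x = np.zeros(N)
s = -grad(x)                                      # first direction: steepest descent
for n in range(N):
    g = grad(x)
    As = A @ s
    alpha = -(g @ s) / (2.0 * (As @ As))          # exact minimizer of f(x + alpha*s)
    x = x + alpha * s
    g_new = grad(x)
    beta = (g_new @ g_new) / (g @ g)              # Fletcher-Reeves beta
    s = -g_new + beta * s

x_star = np.linalg.solve(A, b)                    # exact minimizer for comparison
print(np.allclose(x, x_star))                     # agrees up to roundoff
```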

Within a linear approximation, the parameters \( \alpha \) and \( \beta \) are the same as in the linear conjugate gradient method but have been obtained with line searches. The conjugate gradient method can follow narrow (ill-conditioned) valleys, where the steepest descent method slows down and follows a criss-cross pattern.

Four of the best known formulas for \( \beta_{n} \) are named after their developers:

  • Fletcher–Reeves:[1]
\( \beta_{n}^{FR} = \frac{\Delta x_{n}^{T} \Delta x_{n}}{\Delta x_{n-1}^{T} \Delta x_{n-1}}. \)
  • Polak–Ribière:[2]
\( \beta_{n}^{PR} = \frac{\Delta x_{n}^{T} (\Delta x_{n} - \Delta x_{n-1})}{\Delta x_{n-1}^{T} \Delta x_{n-1}}. \)
  • Hestenes–Stiefel:[3]
\( \beta_{n}^{HS} = \frac{\Delta x_{n}^{T} (\Delta x_{n} - \Delta x_{n-1})}{-s_{n-1}^{T} (\Delta x_{n} - \Delta x_{n-1})}. \)
  • Dai–Yuan:[4]
\( \beta_{n}^{DY} = \frac{\Delta x_{n}^{T} \Delta x_{n}}{-s_{n-1}^{T} (\Delta x_{n} - \Delta x_{n-1})}. \)

These formulas are equivalent for a quadratic function, but for nonlinear optimization the preferred formula is a matter of heuristics or taste. A popular choice is \( \beta = \max\{0, \beta^{PR}\} \), which provides a direction reset automatically.[5]
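The full iteration with the β = max{0, β^PR} choice can be sketched as follows. This is a minimal illustration, not a production implementation: the Rosenbrock test function, the Brent-based line search, and the extra descent-direction safeguard are assumptions of this sketch rather than prescriptions from the article.

```python
import numpy as np
from scipy.optimize import minimize_scalar  # 1-D solver used for the line search

def rosenbrock(x):
    # Classic nonlinear test function with minimum at (1, 1).
    return (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2

def rosenbrock_grad(x):
    return np.array([
        -2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
        200 * (x[1] - x[0]**2),
    ])

def nonlinear_cg(f, grad, x0, tol=1e-8, max_iter=2000):
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    s = -g                                            # start with steepest descent
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        # Line search: minimize f along s with Brent's method.
        alpha = minimize_scalar(lambda a: f(x + a * s)).x
        x = x + alpha * s
        g_new = grad(x)
        # Polak-Ribiere with automatic reset: beta = max(0, beta_PR).
        beta = max(0.0, g_new @ (g_new - g) / (g @ g))
        s = -g_new + beta * s
        if g_new @ s >= 0:                            # safeguard: restart if not descent
            s = -g_new
        g = g_new
    return x

x_min = nonlinear_cg(rosenbrock, rosenbrock_grad, [-1.2, 1.0])
print(x_min)                                          # close to the minimizer (1, 1)
```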

Algorithms based on Newton's method potentially converge much faster. There, both step direction and length are computed from the gradient as the solution of a linear system of equations, with the coefficient matrix being the exact Hessian matrix (for Newton's method proper) or an estimate thereof (in the quasi-Newton methods, where the observed change in the gradient during the iterations is used to update the Hessian estimate). For high-dimensional problems, the exact computation of the Hessian is usually prohibitively expensive, and even its storage can be problematic, requiring \( O(N^{2}) \) memory (but see the limited-memory L-BFGS quasi-Newton method).

The conjugate gradient method can also be derived using optimal control theory.[6] In this accelerated optimization theory, the conjugate gradient method falls out as a nonlinear optimal feedback controller,

\( u = k(x, {\dot {x}}) := -\gamma_{a} \nabla_{x} f(x) - \gamma_{b} {\dot {x}} \)

for the double integrator system,

\( {\ddot {x}} = u. \)

The quantities \( \gamma_{a} > 0 \) and \( \gamma_{b} > 0 \) are variable feedback gains.[6]
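The controlled double integrator above can be simulated directly. The sketch below is illustrative only: it integrates ẍ = u with the feedback u = −γ_a ∇f(x) − γ_b ẋ by explicit Euler steps on a simple quadratic; the objective, gain values, and step size are arbitrary choices of this example, not values prescribed by the theory.

```python
import numpy as np

def grad_f(x):
    # Gradient of the example objective f(x) = (x1 - 3)^2 + 2*(x2 + 1)^2,
    # whose unique minimizer is (3, -1).
    return np.array([2 * (x[0] - 3), 4 * (x[1] + 1)])

gamma_a, gamma_b = 1.0, 2.0        # positive feedback gains (arbitrary here)
dt, steps = 0.01, 5000             # explicit Euler discretization
x = np.zeros(2)                    # position
v = np.zeros(2)                    # velocity x-dot
for _ in range(steps):
    u = -gamma_a * grad_f(x) - gamma_b * v   # optimal feedback controller
    x = x + dt * v                 # integrate x-dot = v
    v = v + dt * u                 # integrate v-dot = u
print(x)                           # trajectory settles near the minimizer (3, -1)
```

With positive gains the closed-loop dynamics are a damped oscillation around the minimizer, so the state converges rather than overshooting indefinitely.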

References

  1. Fletcher, R.; Reeves, C. M. (1964). "Function minimization by conjugate gradients". The Computer Journal. 7 (2): 149–154. doi:10.1093/comjnl/7.2.149.
  2. Polak, E.; Ribière, G. (1969). "Note sur la convergence de méthodes de directions conjuguées". Revue Française d'Automatique, Informatique, Recherche Opérationnelle. 3 (1): 35–43.
  3. Hestenes, M. R.; Stiefel, E. (1952). "Methods of Conjugate Gradients for Solving Linear Systems". Journal of Research of the National Bureau of Standards. 49 (6): 409–436. doi:10.6028/jres.049.044.
  4. Dai, Y.-H.; Yuan, Y. (1999). "A nonlinear conjugate gradient method with a strong global convergence property". SIAM Journal on Optimization. 10 (1): 177–182. doi:10.1137/S1052623497318992.
  5. Shewchuk, J. R. (August 1994). "An Introduction to the Conjugate Gradient Method Without the Agonizing Pain" (PDF).
  6. Ross, I. M. (2019). "An Optimal Control Theory for Accelerated Optimization". arXiv:1902.09004 [math.OC].
Retrieved from "https://en.wikipedia.org/w/index.php?title=Nonlinear_conjugate_gradient_method&oldid=1287623328"