Frank–Wolfe algorithm

From Wikipedia, the free encyclopedia

The Frank–Wolfe algorithm is an iterative first-order optimization algorithm for constrained convex optimization. Also known as the conditional gradient method,[1] the reduced gradient algorithm and the convex combination algorithm, the method was originally proposed by Marguerite Frank and Philip Wolfe in 1956.[2] In each iteration, the Frank–Wolfe algorithm considers a linear approximation of the objective function, and moves towards a minimizer of this linear function (taken over the same domain).

Problem statement


Suppose $\mathcal{D}$ is a compact convex set in a vector space and $f \colon \mathcal{D} \to \mathbb{R}$ is a convex, differentiable real-valued function. The Frank–Wolfe algorithm solves the optimization problem

Minimize $f(\mathbf{x})$
subject to $\mathbf{x} \in \mathcal{D}$.
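
For concreteness, here is a small instance of this template (an illustrative example, not taken from the article): least-squares minimization over the probability simplex, where $f$ is convex and differentiable and $\mathcal{D}$ is compact and convex.

```latex
\min_{\mathbf{x} \in \mathbb{R}^{n}} \; f(\mathbf{x}) = \tfrac{1}{2}\,\lVert A\mathbf{x} - \mathbf{b} \rVert_{2}^{2}
\qquad \text{subject to} \qquad
\mathbf{x} \in \mathcal{D} = \Bigl\{ \mathbf{x} : x_{i} \ge 0,\ \textstyle\sum_{i=1}^{n} x_{i} = 1 \Bigr\}
```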

Algorithm

[Figure: A step of the Frank–Wolfe algorithm]
Initialization: Let $k \leftarrow 0$, and let $\mathbf{x}_0$ be any point in $\mathcal{D}$.
Step 1. Direction-finding subproblem: Find $\mathbf{s}_k$ solving
Minimize $\mathbf{s}^{T} \nabla f(\mathbf{x}_k)$
Subject to $\mathbf{s} \in \mathcal{D}$
(Interpretation: Minimize the linear approximation of the problem given by the first-order Taylor approximation of $f$ around $\mathbf{x}_k$, constrained to stay within $\mathcal{D}$.)
Step 2. Step size determination: Set $\alpha \leftarrow \frac{2}{k+2}$, or alternatively find $\alpha$ that minimizes $f(\mathbf{x}_k + \alpha(\mathbf{s}_k - \mathbf{x}_k))$ subject to $0 \leq \alpha \leq 1$.
Step 3. Update: Let $\mathbf{x}_{k+1} \leftarrow \mathbf{x}_k + \alpha(\mathbf{s}_k - \mathbf{x}_k)$, let $k \leftarrow k+1$ and go to Step 1.
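
The following is a minimal sketch of these three steps in Python, assuming the least-squares-over-the-simplex instance sketched in the problem statement; the names frank_wolfe, simplex_lmo, A and b are illustrative rather than part of the original description. For the probability simplex the direction-finding subproblem has a closed-form solution: the vertex whose gradient coordinate is smallest.

```python
import numpy as np

def frank_wolfe(grad, lmo, x0, num_iters=100):
    """Frank-Wolfe iteration with the default step size alpha = 2/(k+2).

    grad -- callable returning the gradient of f at x
    lmo  -- linear minimization oracle: returns argmin_{s in D} s^T g
    x0   -- feasible starting point in D
    """
    x = x0
    for k in range(num_iters):
        g = grad(x)
        s = lmo(g)                   # Step 1: direction-finding subproblem
        alpha = 2.0 / (k + 2.0)      # Step 2: open-loop step size rule
        x = x + alpha * (s - x)      # Step 3: update; stays in D as a convex combination
    return x

# Illustrative instance: minimize 0.5*||A x - b||^2 over the probability simplex.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)

def grad(x):
    return A.T @ (A @ x - b)

def simplex_lmo(g):
    # The linear subproblem over the simplex is solved by the vertex
    # with the smallest gradient coordinate.
    s = np.zeros_like(g)
    s[np.argmin(g)] = 1.0
    return s

x0 = np.full(5, 1.0 / 5)             # the uniform distribution is feasible
x_hat = frank_wolfe(grad, simplex_lmo, x0, num_iters=500)
```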

Properties


While competing methods such as gradient descent for constrained optimization require a projection step back to the feasible set in each iteration, the Frank–Wolfe algorithm only needs the solution of a linear optimization problem over the same set in each iteration, and automatically stays in the feasible set.

The convergence of the Frank–Wolfe algorithm is sublinear in general: the error in the objective function to the optimum is $O(1/k)$ after $k$ iterations, so long as the gradient is Lipschitz continuous with respect to some norm. The same convergence rate can also be shown if the sub-problems are only solved approximately.[3]

The iterates of the algorithm can always be represented as a sparse convex combination of the extreme points of the feasible set, which has contributed to the popularity of the algorithm for sparse greedy optimization in machine learning and signal processing problems,[4] as well as, for example, the optimization of minimum-cost flows in transportation networks.[5]

If the feasible set is given by a set of linear constraints, then the subproblem to be solved in each iteration becomes a linear program.
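
As a hedged sketch of that case, the direction-finding subproblem for a bounded polyhedron $\mathcal{D} = \{\mathbf{s} : G\mathbf{s} \leq \mathbf{h}\}$ can be handed to any LP solver; here SciPy's linprog is used, and the names lp_lmo, G and h as well as the box-shaped feasible set are illustrative assumptions, not from the article.

```python
import numpy as np
from scipy.optimize import linprog

def lp_lmo(grad_xk, G, h):
    """Direction-finding subproblem when D = {s : G s <= h} is a bounded polytope:
    minimize s^T grad_xk subject to G s <= h, solved as a linear program."""
    res = linprog(c=grad_xk, A_ub=G, b_ub=h, bounds=(None, None), method="highs")
    return res.x

# Illustrative feasible set: the box -1 <= s_i <= 1 written as linear inequalities.
n = 3
G = np.vstack([np.eye(n), -np.eye(n)])
h = np.ones(2 * n)
s_k = lp_lmo(np.array([0.5, -2.0, 1.0]), G, h)   # -> approximately [-1., 1., -1.]
```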

While the worst-case convergence rate of $O(1/k)$ cannot be improved in general, faster convergence can be obtained for special problem classes, such as some strongly convex problems.[6]

Lower bounds on the solution value, and primal-dual analysis


Since $f$ is convex, for any two points $\mathbf{x}, \mathbf{y} \in \mathcal{D}$ we have:

$f(\mathbf{y}) \geq f(\mathbf{x}) + (\mathbf{y} - \mathbf{x})^{T} \nabla f(\mathbf{x})$

This also holds for the (unknown) optimal solution $\mathbf{x}^{*}$. That is, $f(\mathbf{x}^{*}) \geq f(\mathbf{x}) + (\mathbf{x}^{*} - \mathbf{x})^{T} \nabla f(\mathbf{x})$. The best lower bound with respect to a given point $\mathbf{x}$ is given by

$$\begin{aligned} f(\mathbf{x}^{*}) &\geq f(\mathbf{x}) + (\mathbf{x}^{*} - \mathbf{x})^{T} \nabla f(\mathbf{x}) \\ &\geq \min_{\mathbf{y} \in \mathcal{D}} \left\{ f(\mathbf{x}) + (\mathbf{y} - \mathbf{x})^{T} \nabla f(\mathbf{x}) \right\} \\ &= f(\mathbf{x}) - \mathbf{x}^{T} \nabla f(\mathbf{x}) + \min_{\mathbf{y} \in \mathcal{D}} \mathbf{y}^{T} \nabla f(\mathbf{x}) \end{aligned}$$

The latter optimization problem is solved in every iteration of the Frank–Wolfe algorithm, therefore the solution $\mathbf{s}_k$ of the direction-finding subproblem of the $k$-th iteration can be used to determine increasing lower bounds $l_k$ during each iteration by setting $l_0 = -\infty$ and

$l_k := \max\bigl(l_{k-1},\; f(\mathbf{x}_k) + (\mathbf{s}_k - \mathbf{x}_k)^{T} \nabla f(\mathbf{x}_k)\bigr)$

Such lower bounds on the unknown optimal value are important in practice because they can be used as a stopping criterion, and give an efficient certificate of the approximation quality in every iteration, since always $l_k \leq f(\mathbf{x}^{*}) \leq f(\mathbf{x}_k)$.

It has been shown that this corresponding duality gap, that is, the difference between $f(\mathbf{x}_k)$ and the lower bound $l_k$, decreases with the same convergence rate, i.e., $f(\mathbf{x}_k) - l_k = O(1/k)$.
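
As a sketch of how these bounds are used in practice (assuming objective, gradient and oracle callables like those in the earlier sketch; frank_wolfe_with_gap is an illustrative name), the recursion for $l_k$ and the gap-based stopping test fold directly into the iteration loop:

```python
import numpy as np

def frank_wolfe_with_gap(f, grad, lmo, x0, tol=1e-6, max_iters=1000):
    """Frank-Wolfe that tracks the lower bound l_k and stops once the
    duality gap f(x_k) - l_k falls below tol."""
    x = x0
    lower_bound = -np.inf
    for k in range(max_iters):
        g = grad(x)
        s = lmo(g)
        # Lower bound from the linearization at x_k, kept monotone by the max.
        lower_bound = max(lower_bound, f(x) + (s - x) @ g)
        gap = f(x) - lower_bound      # certificate: f(x*) lies in [l_k, f(x_k)]
        if gap <= tol:
            break
        alpha = 2.0 / (k + 2.0)
        x = x + alpha * (s - x)
    return x, lower_bound
```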

Notes

  1. Levitin, E. S.; Polyak, B. T. (1966). "Constrained minimization methods". USSR Computational Mathematics and Mathematical Physics. 6 (5): 1. doi:10.1016/0041-5553(66)90114-5.
  2. Frank, M.; Wolfe, P. (1956). "An algorithm for quadratic programming". Naval Research Logistics Quarterly. 3 (1–2): 95–110. doi:10.1002/nav.3800030109.
  3. Dunn, J. C.; Harshbarger, S. (1978). "Conditional gradient algorithms with open loop step size rules". Journal of Mathematical Analysis and Applications. 62 (2): 432. doi:10.1016/0022-247X(78)90137-3.
  4. Clarkson, K. L. (2010). "Coresets, sparse greedy approximation, and the Frank-Wolfe algorithm". ACM Transactions on Algorithms. 6 (4): 1–30. CiteSeerX 10.1.1.145.9299. doi:10.1145/1824777.1824783.
  5. Fukushima, M. (1984). "A modified Frank-Wolfe algorithm for solving the traffic assignment problem". Transportation Research Part B: Methodological. 18 (2): 169–177. doi:10.1016/0191-2615(84)90029-8.
  6. Bertsekas, Dimitri (1999). Nonlinear Programming. Athena Scientific. p. 215. ISBN 978-1-886529-00-7.
