Frank–Wolfe algorithm

From Wikipedia, the free encyclopedia

The Frank–Wolfe algorithm is an iterative first-order optimization algorithm for constrained convex optimization. Also known as the conditional gradient method,[1] the reduced gradient algorithm and the convex combination algorithm, the method was originally proposed by Marguerite Frank and Philip Wolfe in 1956.[2] In each iteration, the Frank–Wolfe algorithm considers a linear approximation of the objective function, and moves towards a minimizer of this linear function (taken over the same domain).

Problem statement


Suppose $\mathcal{D}$ is a compact convex set in a vector space and $f \colon \mathcal{D} \to \mathbb{R}$ is a convex, differentiable real-valued function. The Frank–Wolfe algorithm solves the optimization problem

Minimize $f(\mathbf{x})$
subject to $\mathbf{x} \in \mathcal{D}$.
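
For concreteness, here is a small instance of this template (an illustrative example, not taken from the article): least-squares minimization over the probability simplex, where $f$ is convex and differentiable and $\mathcal{D}$ is compact and convex.

```latex
\min_{\mathbf{x} \in \mathbb{R}^{n}} \; f(\mathbf{x}) = \tfrac{1}{2}\,\lVert A\mathbf{x} - \mathbf{b} \rVert_{2}^{2}
\qquad \text{subject to} \qquad
\mathbf{x} \in \mathcal{D} = \Bigl\{ \mathbf{x} : x_{i} \ge 0,\ \textstyle\sum_{i=1}^{n} x_{i} = 1 \Bigr\}
```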

Algorithm

[Figure: A step of the Frank–Wolfe algorithm]
Initialization: Let $k \leftarrow 0$, and let $\mathbf{x}_0$ be any point in $\mathcal{D}$.
Step 1. Direction-finding subproblem: Find $\mathbf{s}_k$ solving
Minimize $\mathbf{s}^{T} \nabla f(\mathbf{x}_k)$
Subject to $\mathbf{s} \in \mathcal{D}$
(Interpretation: Minimize the linear approximation of the problem given by the first-order Taylor approximation of $f$ around $\mathbf{x}_k$, constrained to stay within $\mathcal{D}$.)
Step 2. Step size determination: Set $\alpha \leftarrow \frac{2}{k+2}$, or alternatively find $\alpha$ that minimizes $f(\mathbf{x}_k + \alpha(\mathbf{s}_k - \mathbf{x}_k))$ subject to $0 \leq \alpha \leq 1$.
Step 3. Update: Let $\mathbf{x}_{k+1} \leftarrow \mathbf{x}_k + \alpha(\mathbf{s}_k - \mathbf{x}_k)$, let $k \leftarrow k+1$ and go to Step 1.
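
The following is a minimal sketch of these three steps in Python, assuming the least-squares-over-the-simplex instance sketched in the problem statement; the names frank_wolfe, simplex_lmo, A and b are illustrative rather than part of the original description. For the probability simplex the direction-finding subproblem has a closed-form solution: the vertex whose gradient coordinate is smallest.

```python
import numpy as np

def frank_wolfe(grad, lmo, x0, num_iters=100):
    """Frank-Wolfe iteration with the default step size alpha = 2/(k+2).

    grad -- callable returning the gradient of f at x
    lmo  -- linear minimization oracle: returns argmin_{s in D} s^T g
    x0   -- feasible starting point in D
    """
    x = x0
    for k in range(num_iters):
        g = grad(x)
        s = lmo(g)                   # Step 1: direction-finding subproblem
        alpha = 2.0 / (k + 2.0)      # Step 2: open-loop step size rule
        x = x + alpha * (s - x)      # Step 3: update; stays in D as a convex combination
    return x

# Illustrative instance: minimize 0.5*||A x - b||^2 over the probability simplex.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 5))
b = rng.standard_normal(20)

def grad(x):
    return A.T @ (A @ x - b)

def simplex_lmo(g):
    # The linear subproblem over the simplex is solved by the vertex
    # with the smallest gradient coordinate.
    s = np.zeros_like(g)
    s[np.argmin(g)] = 1.0
    return s

x0 = np.full(5, 1.0 / 5)             # the uniform distribution is feasible
x_hat = frank_wolfe(grad, simplex_lmo, x0, num_iters=500)
```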

Properties


While competing methods such as gradient descent for constrained optimization require a projection step back to the feasible set in each iteration, the Frank–Wolfe algorithm only needs the solution of a linear optimization problem over the same set in each iteration, and automatically stays in the feasible set.

The convergence of the Frank–Wolfe algorithm is sublinear in general: the error in the objective function to the optimum is $O(1/k)$ after $k$ iterations, so long as the gradient is Lipschitz continuous with respect to some norm. The same convergence rate can also be shown if the sub-problems are only solved approximately.[3]

The iterates of the algorithm can always be represented as a sparse convex combination of the extreme points of the feasible set, which has contributed to the popularity of the algorithm for sparse greedy optimization in machine learning and signal processing problems,[4] as well as, for example, the optimization of minimum-cost flows in transportation networks.[5]

If the feasible set is given by a set of linear constraints, then the subproblem to be solved in each iteration becomes a linear program.
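
As a hedged sketch of that case, the direction-finding subproblem for a bounded polyhedron $\mathcal{D} = \{\mathbf{s} : G\mathbf{s} \leq \mathbf{h}\}$ can be handed to any LP solver; here SciPy's linprog is used, and the names lp_lmo, G and h as well as the box-shaped feasible set are illustrative assumptions, not from the article.

```python
import numpy as np
from scipy.optimize import linprog

def lp_lmo(grad_xk, G, h):
    """Direction-finding subproblem when D = {s : G s <= h} is a bounded polytope:
    minimize s^T grad_xk subject to G s <= h, solved as a linear program."""
    res = linprog(c=grad_xk, A_ub=G, b_ub=h, bounds=(None, None), method="highs")
    return res.x

# Illustrative feasible set: the box -1 <= s_i <= 1 written as linear inequalities.
n = 3
G = np.vstack([np.eye(n), -np.eye(n)])
h = np.ones(2 * n)
s_k = lp_lmo(np.array([0.5, -2.0, 1.0]), G, h)   # -> approximately [-1., 1., -1.]
```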

While the worst-case convergence rate of $O(1/k)$ cannot be improved in general, faster convergence can be obtained for special problem classes, such as some strongly convex problems.[6]

Lower bounds on the solution value, and primal-dual analysis


Since $f$ is convex, for any two points $\mathbf{x}, \mathbf{y} \in \mathcal{D}$ we have:

$f(\mathbf{y}) \geq f(\mathbf{x}) + (\mathbf{y} - \mathbf{x})^{T} \nabla f(\mathbf{x})$

This also holds for the (unknown) optimal solution $\mathbf{x}^{*}$. That is, $f(\mathbf{x}^{*}) \geq f(\mathbf{x}) + (\mathbf{x}^{*} - \mathbf{x})^{T} \nabla f(\mathbf{x})$. The best lower bound with respect to a given point $\mathbf{x}$ is given by

$$\begin{aligned} f(\mathbf{x}^{*}) &\geq f(\mathbf{x}) + (\mathbf{x}^{*} - \mathbf{x})^{T} \nabla f(\mathbf{x}) \\ &\geq \min_{\mathbf{y} \in \mathcal{D}} \left\{ f(\mathbf{x}) + (\mathbf{y} - \mathbf{x})^{T} \nabla f(\mathbf{x}) \right\} \\ &= f(\mathbf{x}) - \mathbf{x}^{T} \nabla f(\mathbf{x}) + \min_{\mathbf{y} \in \mathcal{D}} \mathbf{y}^{T} \nabla f(\mathbf{x}) \end{aligned}$$

The latter optimization problem is solved in every iteration of the Frank–Wolfe algorithm, therefore the solution $\mathbf{s}_k$ of the direction-finding subproblem of the $k$-th iteration can be used to determine increasing lower bounds $l_k$ during each iteration by setting $l_0 = -\infty$ and

$l_k := \max\bigl(l_{k-1},\; f(\mathbf{x}_k) + (\mathbf{s}_k - \mathbf{x}_k)^{T} \nabla f(\mathbf{x}_k)\bigr)$

Such lower bounds on the unknown optimal value are important in practice because they can be used as a stopping criterion, and give an efficient certificate of the approximation quality in every iteration, since always $l_k \leq f(\mathbf{x}^{*}) \leq f(\mathbf{x}_k)$.

It has been shown that this corresponding duality gap, that is, the difference between $f(\mathbf{x}_k)$ and the lower bound $l_k$, decreases with the same convergence rate, i.e., $f(\mathbf{x}_k) - l_k = O(1/k)$.
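
As a sketch of how these bounds are used in practice (assuming objective, gradient and oracle callables like those in the earlier sketch; frank_wolfe_with_gap is an illustrative name), the recursion for $l_k$ and the gap-based stopping test fold directly into the iteration loop:

```python
import numpy as np

def frank_wolfe_with_gap(f, grad, lmo, x0, tol=1e-6, max_iters=1000):
    """Frank-Wolfe that tracks the lower bound l_k and stops once the
    duality gap f(x_k) - l_k falls below tol."""
    x = x0
    lower_bound = -np.inf
    for k in range(max_iters):
        g = grad(x)
        s = lmo(g)
        # Lower bound from the linearization at x_k, kept monotone by the max.
        lower_bound = max(lower_bound, f(x) + (s - x) @ g)
        gap = f(x) - lower_bound      # certificate: f(x*) lies in [l_k, f(x_k)]
        if gap <= tol:
            break
        alpha = 2.0 / (k + 2.0)
        x = x + alpha * (s - x)
    return x, lower_bound
```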

Notes

  1. Levitin, E. S.; Polyak, B. T. (1966). "Constrained minimization methods". USSR Computational Mathematics and Mathematical Physics. 6 (5): 1. doi:10.1016/0041-5553(66)90114-5.
  2. Frank, M.; Wolfe, P. (1956). "An algorithm for quadratic programming". Naval Research Logistics Quarterly. 3 (1–2): 95–110. doi:10.1002/nav.3800030109.
  3. Dunn, J. C.; Harshbarger, S. (1978). "Conditional gradient algorithms with open loop step size rules". Journal of Mathematical Analysis and Applications. 62 (2): 432. doi:10.1016/0022-247X(78)90137-3.
  4. Clarkson, K. L. (2010). "Coresets, sparse greedy approximation, and the Frank-Wolfe algorithm". ACM Transactions on Algorithms. 6 (4): 1–30. CiteSeerX 10.1.1.145.9299. doi:10.1145/1824777.1824783.
  5. Fukushima, M. (1984). "A modified Frank-Wolfe algorithm for solving the traffic assignment problem". Transportation Research Part B: Methodological. 18 (2): 169–177. doi:10.1016/0191-2615(84)90029-8.
  6. Bertsekas, Dimitri (1999). Nonlinear Programming. Athena Scientific. p. 215. ISBN 978-1-886529-00-7.
