Learning Convex Optimization Control Policies

A. Agrawal, S. Barratt, S. Boyd, and B. Stellato (authors listed in alphabetical order)

Proceedings of Machine Learning Research, 120:361–373, 2020.

Many control policies used in various applications determine the input or action by solving a convex optimization problem that depends on the current state and some parameters. Common examples of such convex optimization control policies (COCPs) include the linear quadratic regulator (LQR), convex model predictive control (MPC), and convex control-Lyapunov or approximate dynamic programming (ADP) policies. These types of control policies are tuned by varying the parameters in the optimization problem, such as the LQR weights, to obtain good performance, judged by application-specific metrics. Tuning is often done by hand, or by simple methods such as a crude grid search. In this paper we propose a method to automate this process, by adjusting the parameters using an approximate gradient of the performance metric with respect to the parameters. Our method relies on recently developed methods that can efficiently evaluate the derivative of the solution of a convex optimization problem with respect to its parameters. We illustrate our method on several examples.
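To make the idea concrete, here is a minimal sketch (not the paper's own code) of tuning a COCP by differentiating through the solution of its convex problem, using the cvxpylayers library, which comes from an overlapping group of authors. The ADP-style policy, dynamics, stage cost, horizon, and hyperparameters below are illustrative assumptions, not values from the paper.

```python
# Hedged sketch: tune a convex optimization control policy (COCP) by
# gradient descent, differentiating through the policy's QP solution.
# Assumes cvxpylayers and PyTorch; all problem data here is made up.
import cvxpy as cp
import numpy as np
import torch
from cvxpylayers.torch import CvxpyLayer

n, m, T = 4, 2, 20                            # state dim, input dim, horizon (assumed)
np.random.seed(0)
A = np.random.randn(n, n)
A /= np.max(np.abs(np.linalg.eigvals(A)))     # scale dynamics to be marginally stable
B = np.random.randn(n, m)

# ADP-style policy: u(x) = argmin_u ||u||^2 + ||P_sqrt (A x + B u)||^2,
# subject to ||u||_inf <= 1. P_sqrt parametrizes a quadratic value function
# V(z) = ||P_sqrt z||^2 and is the parameter we tune.
x_cp = cp.Parameter(n)
P_sqrt_cp = cp.Parameter((n, n))
u_cp = cp.Variable(m)
x_next_cp = cp.Variable(n)                    # auxiliary variable keeps the problem DPP-compliant
objective = cp.sum_squares(u_cp) + cp.sum_squares(P_sqrt_cp @ x_next_cp)
constraints = [x_next_cp == A @ x_cp + B @ u_cp, cp.norm(u_cp, "inf") <= 1.0]
policy = CvxpyLayer(cp.Problem(cp.Minimize(objective), constraints),
                    parameters=[x_cp, P_sqrt_cp], variables=[u_cp])

# Tune P_sqrt with an approximate (stochastic) gradient of the average
# rollout cost, obtained by backpropagating through simulated trajectories.
A_t, B_t = torch.tensor(A), torch.tensor(B)   # float64 torch copies of the dynamics
P_sqrt = torch.eye(n, dtype=torch.float64, requires_grad=True)
opt = torch.optim.Adam([P_sqrt], lr=0.05)

for it in range(30):
    x = torch.ones(n, dtype=torch.float64)    # fixed initial state (assumed)
    cost = torch.tensor(0.0, dtype=torch.float64)
    for t in range(T):
        u, = policy(x, P_sqrt)                # solve the QP; solution is differentiable
        cost = cost + x @ x + u @ u           # quadratic stage cost (assumed metric)
        x = A_t @ x + B_t @ u + 0.1 * torch.randn(n, dtype=torch.float64)
    cost = cost / T
    opt.zero_grad()
    cost.backward()                           # gradient flows through the QP solution
    opt.step()
    if it % 10 == 0:
        print(f"iter {it:3d}  avg rollout cost {cost.item():.3f}")
```

The same pattern applies to the other COCPs mentioned above: an MPC or control-Lyapunov policy would simply use a different convex problem inside the layer, with its tunable quantities declared as parameters.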