Memory and Communication Efficient Distributed Stochastic Optimization with Minibatch Prox
Jialei Wang, Weiran Wang, Nathan Srebro. Proceedings of the 2017 Conference on Learning Theory, PMLR 65:1882-1919, 2017.
Abstract
We present and analyze statistically optimal, communication and memory efficient distributed stochastic optimization algorithms with near-linear speedups (up to $\log$-factors). This improves over prior work, which includes methods with near-linear speedups but polynomial communication requirements (accelerated minibatch SGD) and communication efficient methods which do not exhibit any runtime speedups over a naive single-machine approach. We first analyze a distributed SVRG variant as a distributed stochastic optimization method and show that it can achieve near-linear speedups with logarithmic rounds of communication, at the cost of high memory requirements. We then present a novel method, MB-DSVRG, which trades off memory for communication: it still allows for optimization with communication that scales only logarithmically with the desired accuracy, while also being memory efficient. MB-DSVRG is based on a minibatch prox procedure, which solves a non-linearized subproblem on a minibatch at each iteration. We provide a novel analysis for this procedure, showing that it achieves the statistically optimal rate regardless of minibatch size and smoothness, thus significantly improving on prior work.
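For intuition, here is a minimal sketch of the minibatch prox update described above, in notation chosen for illustration rather than taken from the paper (the minibatch $B_t$, its size $b$, the per-example loss $f(\cdot; z)$, and the prox parameter $\gamma$ are our own placeholders): at each iteration the next iterate is obtained by solving the regularized empirical subproblem on the current minibatch,

$$w_{t+1} \;=\; \arg\min_{w} \;\frac{1}{b} \sum_{z \in B_t} f(w; z) \;+\; \frac{1}{2\gamma}\,\|w - w_t\|^2,$$

i.e. the minibatch loss is used directly (non-linearized), rather than taking a gradient step on its linearization as standard minibatch SGD would.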
Cite this Paper
BibTeX
@InProceedings{pmlr-v65-wang17a,
  title     = {Memory and Communication Efficient Distributed Stochastic Optimization with Minibatch Prox},
  author    = {Wang, Jialei and Wang, Weiran and Srebro, Nathan},
  booktitle = {Proceedings of the 2017 Conference on Learning Theory},
  pages     = {1882--1919},
  year      = {2017},
  editor    = {Kale, Satyen and Shamir, Ohad},
  volume    = {65},
  series    = {Proceedings of Machine Learning Research},
  month     = {07--10 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v65/wang17a/wang17a.pdf},
  url       = {https://proceedings.mlr.press/v65/wang17a.html},
  abstract  = {We present and analyze statistically optimal, communication and memory efficient distributed stochastic optimization algorithms with near-linear speedups (up to $\log$-factors). This improves over prior work which includes methods with near-linear speedups but polynomial communication requirements (accelerated minibatch SGD) and communication efficient methods which do not exhibit any runtime speedups over a naive single-machine approach. We first analyze a distributed SVRG variant as a distributed stochastic optimization method and show that it can achieve near-linear speedups with logarithmic rounds of communication, at the cost of high memory requirements. We then present a novel method, MB-DSVRG, which trades off memory for communication and still allows for optimization with communication which scales only logarithmically with the desired accuracy while also being memory efficient. MB-DSVRG is based on a minibatch prox procedure, solving a non-linearized subproblem on a minibatch at each iteration. We provide a novel analysis for this procedure which achieves the statistical optimal rate regardless of minibatch size and smoothness, and thus significantly improving on prior work.}
}
Endnote
%0 Conference Paper
%T Memory and Communication Efficient Distributed Stochastic Optimization with Minibatch Prox
%A Jialei Wang
%A Weiran Wang
%A Nathan Srebro
%B Proceedings of the 2017 Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2017
%E Satyen Kale
%E Ohad Shamir
%F pmlr-v65-wang17a
%I PMLR
%P 1882--1919
%U https://proceedings.mlr.press/v65/wang17a.html
%V 65
%X We present and analyze statistically optimal, communication and memory efficient distributed stochastic optimization algorithms with near-linear speedups (up to $\log$-factors). This improves over prior work which includes methods with near-linear speedups but polynomial communication requirements (accelerated minibatch SGD) and communication efficient methods which do not exhibit any runtime speedups over a naive single-machine approach. We first analyze a distributed SVRG variant as a distributed stochastic optimization method and show that it can achieve near-linear speedups with logarithmic rounds of communication, at the cost of high memory requirements. We then present a novel method, MB-DSVRG, which trades off memory for communication and still allows for optimization with communication which scales only logarithmically with the desired accuracy while also being memory efficient. MB-DSVRG is based on a minibatch prox procedure, solving a non-linearized subproblem on a minibatch at each iteration. We provide a novel analysis for this procedure which achieves the statistical optimal rate regardless of minibatch size and smoothness, and thus significantly improving on prior work.
APA
Wang, J., Wang, W. & Srebro, N. (2017). Memory and Communication Efficient Distributed Stochastic Optimization with Minibatch Prox. Proceedings of the 2017 Conference on Learning Theory, in Proceedings of Machine Learning Research 65:1882-1919. Available from https://proceedings.mlr.press/v65/wang17a.html.