Memory and Communication Efficient Distributed Stochastic Optimization with Minibatch Prox
Jialei Wang, Weiran Wang, Nathan Srebro. Proceedings of the 2017 Conference on Learning Theory, PMLR 65:1882-1919, 2017.
Abstract
We present and analyze statistically optimal, communication and memory efficient distributed stochastic optimization algorithms with near-linear speedups (up to $\log$-factors). This improves over prior work, which includes methods with near-linear speedups but polynomial communication requirements (accelerated minibatch SGD) and communication efficient methods which do not exhibit any runtime speedups over a naive single-machine approach. We first analyze a distributed SVRG variant as a distributed stochastic optimization method and show that it can achieve near-linear speedups with logarithmic rounds of communication, at the cost of high memory requirements. We then present a novel method, MB-DSVRG, which trades off memory for communication: it still allows for optimization with communication that scales only logarithmically with the desired accuracy, while also being memory efficient. MB-DSVRG is based on a minibatch prox procedure, which solves a non-linearized subproblem on a minibatch at each iteration. We provide a novel analysis for this procedure, showing that it achieves the statistically optimal rate regardless of minibatch size and smoothness, thus significantly improving on prior work.
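For intuition, here is a minimal sketch of the minibatch prox update described above, in notation chosen for illustration rather than taken from the paper (the minibatch $B_t$, its size $b$, the per-example loss $f(\cdot; z)$, and the prox parameter $\gamma$ are our own placeholders): at each iteration the next iterate is obtained by solving the regularized empirical subproblem on the current minibatch,

$$w_{t+1} \;=\; \arg\min_{w} \;\frac{1}{b} \sum_{z \in B_t} f(w; z) \;+\; \frac{1}{2\gamma}\,\|w - w_t\|^2,$$

i.e. the minibatch loss is used directly (non-linearized), rather than taking a gradient step on its linearization as standard minibatch SGD would.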
Cite this Paper
BibTeX
@InProceedings{pmlr-v65-wang17a,
  title     = {Memory and Communication Efficient Distributed Stochastic Optimization with Minibatch Prox},
  author    = {Wang, Jialei and Wang, Weiran and Srebro, Nathan},
  booktitle = {Proceedings of the 2017 Conference on Learning Theory},
  pages     = {1882--1919},
  year      = {2017},
  editor    = {Kale, Satyen and Shamir, Ohad},
  volume    = {65},
  series    = {Proceedings of Machine Learning Research},
  month     = {07--10 Jul},
  publisher = {PMLR},
  pdf       = {http://proceedings.mlr.press/v65/wang17a/wang17a.pdf},
  url       = {https://proceedings.mlr.press/v65/wang17a.html},
  abstract  = {We present and analyze statistically optimal, communication and memory efficient distributed stochastic optimization algorithms with near-linear speedups (up to $\log$-factors). This improves over prior work which includes methods with near-linear speedups but polynomial communication requirements (accelerated minibatch SGD) and communication efficient methods which do not exhibit any runtime speedups over a naive single-machine approach. We first analyze a distributed SVRG variant as a distributed stochastic optimization method and show that it can achieve near-linear speedups with logarithmic rounds of communication, at the cost of high memory requirements. We then present a novel method, MB-DSVRG, which trades off memory for communication and still allows for optimization with communication which scales only logarithmically with the desired accuracy while also being memory efficient. MB-DSVRG is based on a minibatch prox procedure, solving a non-linearized subproblem on a minibatch at each iteration. We provide a novel analysis for this procedure which achieves the statistical optimal rate regardless of minibatch size and smoothness, and thus significantly improving on prior work.}
}
Endnote
%0 Conference Paper
%T Memory and Communication Efficient Distributed Stochastic Optimization with Minibatch Prox
%A Jialei Wang
%A Weiran Wang
%A Nathan Srebro
%B Proceedings of the 2017 Conference on Learning Theory
%C Proceedings of Machine Learning Research
%D 2017
%E Satyen Kale
%E Ohad Shamir
%F pmlr-v65-wang17a
%I PMLR
%P 1882--1919
%U https://proceedings.mlr.press/v65/wang17a.html
%V 65
%X We present and analyze statistically optimal, communication and memory efficient distributed stochastic optimization algorithms with near-linear speedups (up to $\log$-factors). This improves over prior work which includes methods with near-linear speedups but polynomial communication requirements (accelerated minibatch SGD) and communication efficient methods which do not exhibit any runtime speedups over a naive single-machine approach. We first analyze a distributed SVRG variant as a distributed stochastic optimization method and show that it can achieve near-linear speedups with logarithmic rounds of communication, at the cost of high memory requirements. We then present a novel method, MB-DSVRG, which trades off memory for communication and still allows for optimization with communication which scales only logarithmically with the desired accuracy while also being memory efficient. MB-DSVRG is based on a minibatch prox procedure, solving a non-linearized subproblem on a minibatch at each iteration. We provide a novel analysis for this procedure which achieves the statistical optimal rate regardless of minibatch size and smoothness, and thus significantly improving on prior work.
APA
Wang, J., Wang, W. & Srebro, N. (2017). Memory and Communication Efficient Distributed Stochastic Optimization with Minibatch Prox. Proceedings of the 2017 Conference on Learning Theory, in Proceedings of Machine Learning Research 65:1882-1919. Available from https://proceedings.mlr.press/v65/wang17a.html.