# Overlap_Local_SGD
Code to reproduce the experiments reported in this paper:
Jianyu Wang, Hao Liang, Gauri Joshi, "Overlap Local-SGD: An Algorithmic Approach to Hide Communication Delays in Distributed SGD," ICASSP 2020. (arXiv:2002.09539)
This repo contains the implementations of the following algorithms:
- Local SGD (Stich, ICLR 2018; Yu et al., AAAI 2019; Wang and Joshi, 2018)
- Overlap-Local-SGD (proposed in this paper)
- Elastic Averaging SGD (Zhang et al., NeurIPS 2015)
- CoCoD-SGD (Shen et al., IJCAI 2019)
- Blockwise Model-update Filtering (BMUF) (Chen and Huo, ICASSP 2016), also equivalent to SlowMo-Local SGD.
Please cite this paper if you use this code for your research/projects.
The code runs on Python 3.5 with PyTorch 1.0.0 and torchvision 0.2.1. The non-blocking communication is implemented using the Python threading package.
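To give a sense of how a background thread can hide communication behind computation, here is a minimal sketch; the helper `average_in_background` and its signature are illustrative assumptions, not the actual implementation in this repo.

```python
import threading
import torch.distributed as dist


def average_in_background(params, world_size):
    """Hypothetical helper: average a snapshot of the parameters in a
    background thread so the main thread can keep computing."""
    # Copy the parameters so local updates can continue while the copy is averaged.
    snapshot = [p.detach().clone() for p in params]

    def _all_reduce():
        for t in snapshot:
            dist.all_reduce(t, op=dist.ReduceOp.SUM)  # blocks only this thread
            t.div_(world_size)

    worker = threading.Thread(target=_all_reduce)
    worker.start()
    # The caller keeps running local SGD steps, later calls worker.join()
    # and mixes `snapshot` back into the live model.
    return worker, snapshot
```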
We implement all the above-mentioned algorithms as subclasses of `torch.optim.Optimizer`. A typical usage is shown below:
```python
import distoptim

# Before training: define the optimizer.
# One can use: 1) LocalSGD (including BMUF); 2) OverlapLocalSGD;
#              3) EASGD; 4) CoCoDSGD
# tau is the number of local updates / communication period
optimizer = distoptim.SELECTED_OPTIMIZER(tau)

...  # define model, criterion, logging, etc.

# Start training
for batch_id, (data, label) in enumerate(data_loader):
    # same as serial training
    output = model(data)             # forward
    loss = criterion(output, label)
    loss.backward()                  # backward
    optimizer.step()                 # gradient step
    optimizer.zero_grad()

    # additional line to average local models across workers;
    # communication happens after every tau iterations
    # (the optimizer has its own iteration counter inside)
    optimizer.average()
```
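As a rough mental model (an assumption, not the code in `distoptim`), the periodic averaging triggered by `optimizer.average()` in plain Local SGD amounts to something like the sketch below; Overlap-Local-SGD additionally overlaps this communication with the subsequent local updates.

```python
import torch.distributed as dist

# Conceptual sketch only: every tau local steps, all workers average
# their model parameters. Function name and arguments are hypothetical.
def average_every_tau(model, tau, local_step, world_size):
    if local_step % tau != 0:
        return  # keep training on the purely local model
    for p in model.parameters():
        dist.all_reduce(p.data, op=dist.ReduceOp.SUM)
        p.data.div_(world_size)
```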
In addition, one needs to initialize the process group as described in the PyTorch distributed documentation. In our private cluster, each machine has one GPU.
```python
# backend = gloo or nccl
# rank: 0, 1, 2, 3, ...
# size: number of workers
# h0 is the host name of worker 0; you need to change it
torch.distributed.init_process_group(backend=args.backend,
                                     init_method='tcp://h0:22000',
                                     rank=args.rank,
                                     world_size=args.size)
```
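The snippet above assumes `args.backend`, `args.rank`, and `args.size` have already been parsed; one plausible way to define them (the flag names are assumptions and may differ from the repo's scripts) is:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--backend', type=str, default='nccl',
                    help='communication backend: gloo or nccl')
parser.add_argument('--rank', type=int, required=True,
                    help='index of this worker: 0, 1, 2, ...')
parser.add_argument('--size', type=int, required=True,
                    help='total number of workers')
args = parser.parse_args()
```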
```
@article{wang2020overlap,
  title={Overlap Local-{SGD}: An Algorithmic Approach to Hide Communication Delays in Distributed {SGD}},
  author={Wang, Jianyu and Liang, Hao and Joshi, Gauri},
  journal={arXiv preprint arXiv:2002.09539},
  year={2020}
}
```