Distributed#

Distributed training is a model training paradigm that involves spreading the training workload across multiple worker nodes, therefore significantly improving training speed and model accuracy. While distributed training can be used for any type of ML model training, it is most beneficial for large models and compute-demanding tasks such as deep learning.

There are several ways to perform distributed training in PyTorch, each with its own advantages for certain use cases, including DistributedDataParallel (DDP), Fully Sharded Data Parallel (FSDP2), Tensor Parallel (TP), and the RPC-based distributed training framework.

Read more about these options in Distributed Overview.
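Most of the approaches below build on a collective-communication process group. As a minimal, single-process sketch (gloo backend on CPU, world_size=1 so no extra workers need to be launched; the address and port are arbitrary local values, not prescribed by PyTorch):

```python
import os
import torch.distributed as dist

# The rendezvous address/port are arbitrary local values for this sketch.
os.environ.setdefault("MASTER_ADDR", "localhost")
os.environ.setdefault("MASTER_PORT", "29500")

# gloo runs on CPU; world_size=1 keeps the example runnable in one process.
dist.init_process_group(backend="gloo", rank=0, world_size=1)
rank, world_size = dist.get_rank(), dist.get_world_size()
dist.destroy_process_group()
```

In a real job, a launcher such as torchrun starts one process per worker and supplies the rank and world size for you.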

Learn DDP#

DDP Intro Video Tutorials

A step-by-step video series on how to get started with DistributedDataParallel and advance to more complex topics

Code | Video

https://pytorch.org/tutorials/beginner/ddp_series_intro.html?utm_source=distr_landing&utm_medium=ddp_series_intro
Getting Started with Distributed Data Parallel

This tutorial provides a short and gentle intro to PyTorch Distributed Data Parallel.

Code

https://pytorch.org/tutorials/intermediate/ddp_tutorial.html?utm_source=distr_landing&utm_medium=intermediate_ddp_tutorial
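For a flavor of the API the tutorial covers, here is a minimal single-process sketch (gloo backend, world_size=1, and a toy linear model chosen only for illustration, not a real multi-worker setup):

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

os.environ.setdefault("MASTER_ADDR", "localhost")
os.environ.setdefault("MASTER_PORT", "29501")
dist.init_process_group("gloo", rank=0, world_size=1)

model = nn.Linear(4, 2)   # toy model for illustration
ddp_model = DDP(model)    # DDP all-reduces gradients across ranks

opt = torch.optim.SGD(ddp_model.parameters(), lr=0.1)
out = ddp_model(torch.randn(8, 4))
loss = out.sum()
loss.backward()           # backward triggers gradient synchronization
opt.step()

dist.destroy_process_group()
```

With more than one rank, each process would run this same script and DDP would average gradients across them during `backward()`.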
Distributed Training with Uneven Inputs Using the Join Context Manager

This tutorial describes the Join context manager and demonstrates its use with Distributed Data Parallel.

Code

https://pytorch.org/tutorials/advanced/generic_join.html?utm_source=distr_landing&utm_medium=generic_join
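As a hedged sketch of the basic shape (single process and a toy model, so the "uneven inputs" aspect is trivial here): `Join` wraps the training loop so that ranks which exhaust their data early keep shadowing the collectives of busier ranks instead of hanging them.

```python
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed.algorithms.join import Join
from torch.nn.parallel import DistributedDataParallel as DDP

os.environ.setdefault("MASTER_ADDR", "localhost")
os.environ.setdefault("MASTER_PORT", "29502")
dist.init_process_group("gloo", rank=0, world_size=1)

model = DDP(nn.Linear(4, 2))
# This rank's share of batches; in a real job, ranks may have unequal counts.
inputs = [torch.randn(2, 4) for _ in range(3)]

# Join lets ranks that finish early keep participating in collectives.
with Join([model]):
    for x in inputs:
        model(x).sum().backward()

dist.destroy_process_group()
```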

Learn FSDP2#

Getting Started with FSDP2

This tutorial demonstrates how you can perform distributed training with FSDP2 on a transformer model.

Code

https://pytorch.org/tutorials/intermediate/FSDP_tutorial.html?utm_source=distr_landing&utm_medium=FSDP_getting_started

Learn Tensor Parallel (TP)#

Large Scale Transformer model training with Tensor Parallel (TP)

This tutorial demonstrates how to train a large Transformer-like model across hundreds to thousands of GPUs using Tensor Parallel and Fully Sharded Data Parallel.

Code

https://pytorch.org/tutorials/intermediate/TP_tutorial.html

Learn DeviceMesh#

Getting Started with DeviceMesh

In this tutorial you will learn about DeviceMesh and how it can help with distributed training.

Code

https://pytorch.org/tutorials/recipes/distributed_device_mesh.html?highlight=devicemesh

Learn RPC#

Getting Started with Distributed RPC Framework

This tutorial demonstrates how to get started with RPC-based distributed training.

Code

https://pytorch.org/tutorials/intermediate/rpc_tutorial.html?utm_source=distr_landing&utm_medium=rpc_getting_started
Implementing a Parameter Server Using Distributed RPC Framework

This tutorial walks you through a simple example of implementing a parameter server using PyTorch’s Distributed RPC framework.

Code

https://pytorch.org/tutorials/intermediate/rpc_param_server_tutorial.html?utm_source=distr_landing&utm_medium=rpc_param_server_tutorial
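The pattern at the heart of it, sketched with a made-up `ParamServer` class and a single self-referential worker: the server object lives on one worker, trainers hold an RRef to it, and they read and update parameters through RPC on that RRef.

```python
import os
import torch
import torch.distributed.rpc as rpc

os.environ.setdefault("MASTER_ADDR", "localhost")
os.environ.setdefault("MASTER_PORT", "29506")

class ParamServer:  # hypothetical illustration, not the tutorial's class
    """Holds a parameter; trainers fetch and update it via an RRef."""
    def __init__(self):
        self.weight = torch.zeros(3)

    def get_weight(self):
        return self.weight

    def apply_grad(self, grad):
        self.weight -= 0.1 * grad  # toy SGD step

rpc.init_rpc("worker0", rank=0, world_size=1)
# Construct the server on a (here: the same) worker; get an RRef to it.
ps_rref = rpc.remote("worker0", ParamServer)
ps_rref.rpc_sync().apply_grad(torch.ones(3))
w = ps_rref.rpc_sync().get_weight()
rpc.shutdown()
```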
Implementing Batch RPC Processing Using Asynchronous Executions

In this tutorial you will build batch-processing RPC applications with the @rpc.functions.async_execution decorator.

Code

https://pytorch.org/tutorials/intermediate/rpc_async_execution.html?utm_source=distr_landing&utm_medium=rpc_async_execution
Combining Distributed Data Parallel with Distributed RPC Framework

In this tutorial you will learn how to combine distributed data parallelism with distributed model parallelism.

Code

https://pytorch.org/tutorials/advanced/rpc_ddp_tutorial.html?utm_source=distr_landing&utm_medium=rpc_plus_ddp

Learn Monarch#

Interactive Distributed Applications with Monarch

Learn how to use Monarch’s actor framework for interactive distributed applications.

Code

https://docs.pytorch.org/tutorials/intermediate/monarch_distributed_tutorial.html

Custom Extensions#

Customize Process Group Backends Using Cpp Extensions

In this tutorial you will learn to implement a custom ProcessGroup backend and plug it into the PyTorch distributed package using cpp extensions.

Code

https://pytorch.org/tutorials/intermediate/process_group_cpp_extension_tutorial.html?utm_source=distr_landing&utm_medium=custom_extensions_cpp