Distributed#
Distributed training is a model training paradigm that involves spreading the training workload across multiple worker nodes, therefore significantly improving the speed of training and model accuracy. While distributed training can be used for any type of ML model training, it is most beneficial for large models and compute-demanding tasks such as deep learning.
There are a few ways you can perform distributed training in PyTorch, with each method having its advantages in certain use cases:
Read more about these options in Distributed Overview.
Learn DDP#
A step-by-step video series on how to get started with DistributedDataParallel and advance to more complex topics
Code Video
This tutorial provides a short and gentle intro to the PyTorch Distributed Data Parallel.
Code
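As a taste of what the tutorial covers, here is a minimal sketch of DistributedDataParallel, not taken from the tutorial itself: it runs as a single-process, world-size-1 "cluster" on CPU with the gloo backend so it can execute standalone. Real training launches one process per GPU (e.g. with torchrun) and passes the appropriate rank and world size.

```python
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Single-process world on CPU for illustration; torchrun would set these.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

# DDP replicates the module on every rank and all-reduces gradients
# across ranks during backward().
model = DDP(torch.nn.Linear(10, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

inputs, targets = torch.randn(8, 10), torch.randn(8, 1)
loss = torch.nn.functional.mse_loss(model(inputs), targets)
loss.backward()   # gradients are synchronized across ranks here
optimizer.step()

dist.destroy_process_group()
```

With more than one rank, every replica ends each step with identical, averaged gradients, so the model stays in sync without any extra code.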
This tutorial describes the Join context manager and demonstrates its use with Distributed Data Parallel.
Code
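For orientation, a minimal sketch (not from the tutorial) of wrapping a DDP training loop in Join. Again a single-process, world-size-1 setup on CPU so the example is self-contained; the point of Join only becomes visible when ranks have uneven numbers of batches.

```python
import os

import torch
import torch.distributed as dist
from torch.distributed.algorithms.join import Join
from torch.nn.parallel import DistributedDataParallel as DDP

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29501")
dist.init_process_group("gloo", rank=0, world_size=1)

model = DDP(torch.nn.Linear(4, 4))
# Join shadows the collective communications of ranks that exhaust
# their inputs early, so ranks with uneven batch counts can finish
# the loop without hanging the others.
with Join([model]):
    for _ in range(3):  # this rank's (possibly uneven) number of batches
        out = model(torch.randn(2, 4))
        out.sum().backward()

dist.destroy_process_group()
```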
Learn FSDP2#
This tutorial demonstrates how you can perform distributed training with FSDP2 on a transformer model.
Code
Learn Tensor Parallel (TP)#
This tutorial demonstrates how to train a large Transformer-like model across hundreds to thousands of GPUs using Tensor Parallel and Fully Sharded Data Parallel.
Code
Learn DeviceMesh#
In this tutorial you will learn about DeviceMesh and how it can help with distributed training.
Code
Learn RPC#
This tutorial demonstrates how to get started with RPC-based distributed training.
Code
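The core primitive the tutorial introduces looks roughly like this minimal sketch (not from the tutorial): initialize an RPC agent, then invoke a function on a named worker. With world size 1 the worker calls itself, which keeps the example runnable in one process.

```python
import os

import torch
import torch.distributed.rpc as rpc

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29504")

# Single-process RPC group: "worker0" sends an RPC to itself.
rpc.init_rpc("worker0", rank=0, world_size=1)

# rpc_sync blocks until the remote call returns its result.
result = rpc.rpc_sync(
    "worker0", torch.add, args=(torch.tensor(2), torch.tensor(3))
)

rpc.shutdown()
```

In a real setup each worker runs in its own process (often on its own machine) and the target worker name selects where the function executes.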
This tutorial walks you through a simple example of implementing a parameter server using PyTorch’s Distributed RPC framework.
Code
In this tutorial you will build batch-processing RPC applications with the @rpc.functions.async_execution decorator.
Code
In this tutorial you will learn how to combine distributed data parallelism with distributed model parallelism.
Code
Learn Monarch#
Learn how to use Monarch’s actor framework
Code
Custom Extensions#
In this tutorial you will learn to implement a custom ProcessGroup backend and plug it into the PyTorch distributed package using C++ extensions.
Code