- Notifications
You must be signed in to change notification settings - Fork277
A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.
License
deepseek-ai/DualPipe
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
DualPipe is an innovative bidirectional pipeline parallelism algorithm introduced in theDeepSeek-V3 Technical Report. It achieves full overlap of forward and backward computation-communication phases, also reducing pipeline bubbles. For detailed information on computation-communication overlap, please refer to theprofile data.
Example DualPipe scheduling for 8 PP ranks and 20 micro-batches in two directions.The micro-batches in the reverse direction are symmetric to those in the forward direction, sowe omit their batch ID for illustration simplicity. Two cells enclosed by a shared black borderhave mutually overlapped computation and communication
DualPipeV is a concise V-shape schedule derived from DualPipe using a "cut-in-half" procedure, introduced by Sea AI Lab as "Cut-in-half" in theirblog post. Thanks to them for this efficient schedule!
Example DualPipeV scheduling for 4 PP ranks (8 PP stages) and 10 micro-batches.
Method | Bubble | Parameter Per Device | Activation Per Device | #Devices |
---|---|---|---|---|
1F1B | (PP-1)(𝐹+𝐵) | 1× | PP | PP |
ZB1P | (PP-1)(𝐹+𝐵-2𝑊) | 1× | PP | PP |
DualPipe | (PP/2-1)(𝐹&𝐵+𝐵-3𝑊) | 2× | PP+1 | PP |
DualPipeV | (PP/2-1)(𝐹&𝐵+𝐵-3𝑊) | 2× | PP+1 | PP/2 |
PP denotes the number of pp stages (even).𝐹 denotes the execution time of a forward chunk, 𝐵 denotes the execution time of afull backward chunk, 𝑊 denotes the execution time of a "backward for weights" chunk, and 𝐹&𝐵denotes the execution time of two mutually overlapped forward and backward chunks.
The usage is shown in the following example:
python examples/example_dualpipe.pypython examples/example_dualpipev.py
Note: For real-world applications, you will need to implement a customoverlapped_forward_backward
method tailored to your specific module.
- PyTorch 2.0 and above
DualPipe was created and developed by Jiashi Li and Chengqi Deng and Wenfeng Liang.
@misc{deepseekai2025deepseekv3technicalreport,title={DeepSeek-V3 Technical Report},author={DeepSeek-AI},year={2025},eprint={2412.19437},archivePrefix={arXiv},primaryClass={cs.CL},url={https://arxiv.org/abs/2412.19437}, }