A bidirectional pipeline parallelism algorithm for computation-communication overlap in V3/R1 training.

deepseek-ai/DualPipe

DualPipe is an innovative bidirectional pipeline parallelism algorithm introduced in the DeepSeek-V3 Technical Report. It fully overlaps the computation and communication phases of the forward and backward passes, and it also reduces pipeline bubbles. For detailed information on computation-communication overlap, please refer to the profile data.

Schedules

dualpipe

Example DualPipe scheduling for 8 PP ranks and 20 micro-batches in two directions. The micro-batches in the reverse direction are symmetric to those in the forward direction, so we omit their batch IDs for simplicity of illustration. Two cells enclosed by a shared black border have mutually overlapped computation and communication.
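The layout behind this figure can be sketched in a few lines. This is a minimal illustration, not the repository's code: the function name `dualpipe_layout` and the dictionary keys are hypothetical, and it assumes that micro-batches are split evenly between the two directions and that rank r runs stage r for the forward direction and stage PP-1-r for the reverse direction.

```python
def dualpipe_layout(num_ranks: int, num_micro_batches: int):
    """Hypothetical helper: describe a bidirectional layout like the one in the figure.

    Assumptions (not taken from the library): micro-batches are split evenly
    between the two directions, and rank r holds stage r for the forward
    direction plus stage (num_ranks - 1 - r) for the reverse direction.
    """
    if num_ranks % 2 != 0:
        raise ValueError("the number of PP ranks is expected to be even")
    if num_micro_batches % 2 != 0:
        raise ValueError("micro-batches must split evenly across two directions")

    per_direction = num_micro_batches // 2
    placement = {
        rank: {
            "forward_dir_stage": rank,
            "reverse_dir_stage": num_ranks - 1 - rank,
        }
        for rank in range(num_ranks)
    }
    return per_direction, placement


per_direction, placement = dualpipe_layout(num_ranks=8, num_micro_batches=20)
print(per_direction)   # micro-batches injected from each end of the pipeline
print(placement[0])    # stages held by the first rank
print(placement[7])    # stages held by the last rank
```

Under these assumptions every rank holds two stages, which matches the doubled parameter footprint per device that a bidirectional schedule implies.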

DualPipeV

DualPipeV is a concise V-shape schedule derived from DualPipe by a "cut-in-half" procedure, which Sea AI Lab introduced in their blog post. Thanks to them for this efficient schedule!

Schedules

dualpipev

Example DualPipeV scheduling for 4 PP ranks (8 PP stages) and 10 micro-batches.
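The relation between PP ranks and PP stages in this example (4 ranks, 8 stages) can be sketched as follows. The helper name `dualpipev_stage_placement` is hypothetical, and the mapping assumes the usual V-shape layout in which each rank holds one stage on the way "down" the V and the mirrored stage on the way back "up".

```python
def dualpipev_stage_placement(num_ranks: int) -> dict[int, tuple[int, int]]:
    """Hypothetical helper: map each PP rank to the two stages it holds.

    Assumes a V-shape layout over 2 * num_ranks stages: rank r holds
    stage r and its mirror, stage (2 * num_ranks - 1 - r).
    """
    if num_ranks < 1:
        raise ValueError("num_ranks must be positive")
    num_stages = 2 * num_ranks
    return {rank: (rank, num_stages - 1 - rank) for rank in range(num_ranks)}


placement = dualpipev_stage_placement(4)
for rank, stages in placement.items():
    print(f"rank {rank}: stages {stages}")
```

With this layout the last rank holds the two middle stages, so the pipeline turns around on a single device and only half as many devices are needed for the same number of stages.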

Pipeline Bubbles and Memory Usage Comparison (based on the same number of PP stages)

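| Method    | Bubble               | Parameter Per Device | Activation Per Device | #Devices |
|-----------|----------------------|----------------------|-----------------------|----------|
| 1F1B      | (PP-1)(𝐹+𝐵)          | 1×                   | PP                    | PP       |
| ZB1P      | (PP-1)(𝐹+𝐵-2𝑊)       | 1×                   | PP                    | PP       |
| DualPipe  | (PP/2-1)(𝐹&𝐵+𝐵-3𝑊)   | 2×                   | PP+1                  | PP       |
| DualPipeV | (PP/2-1)(𝐹&𝐵+𝐵-3𝑊)   | 2×                   | PP+1                  | PP/2     |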

PP denotes the number of PP stages (must be even). 𝐹 denotes the execution time of a forward chunk, 𝐵 denotes the execution time of a full backward chunk, 𝑊 denotes the execution time of a "backward for weights" chunk, and 𝐹&𝐵 denotes the execution time of two mutually overlapped forward and backward chunks.
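To get a feel for the bubble formulas, they can be evaluated directly. The sketch below only transcribes the formulas from the comparison above; the class and function names are hypothetical, and the chunk times are made-up illustrative values, not measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkTimes:
    f: float   # F: execution time of a forward chunk
    b: float   # B: execution time of a full backward chunk
    w: float   # W: execution time of a "backward for weights" chunk
    fb: float  # F&B: execution time of two overlapped forward and backward chunks


def _check_pp(pp: int) -> None:
    if pp < 2 or pp % 2 != 0:
        raise ValueError("PP must be an even number of stages")


def bubble_1f1b(pp: int, t: ChunkTimes) -> float:
    _check_pp(pp)
    return (pp - 1) * (t.f + t.b)


def bubble_zb1p(pp: int, t: ChunkTimes) -> float:
    _check_pp(pp)
    return (pp - 1) * (t.f + t.b - 2 * t.w)


def bubble_dualpipe(pp: int, t: ChunkTimes) -> float:
    # The same formula applies to DualPipe and DualPipeV.
    _check_pp(pp)
    return (pp // 2 - 1) * (t.fb + t.b - 3 * t.w)


# Illustrative values only (assumption): F = 1, B = 2, W = 1, and F&B = F + B,
# i.e. no time is saved by overlapping the two chunks.
example = ChunkTimes(f=1.0, b=2.0, w=1.0, fb=3.0)
for name, fn in [("1F1B", bubble_1f1b), ("ZB1P", bubble_zb1p), ("DualPipe", bubble_dualpipe)]:
    print(f"{name}: {fn(8, example)}")
```

With these illustrative values and PP = 8, the bubble is 21.0 for 1F1B, 7.0 for ZB1P, and 6.0 for DualPipe and DualPipeV. Real ratios depend on the measured chunk times of the model being trained.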

Quick Start

The usage is shown in the following example:

    python examples/example_dualpipe.py
    python examples/example_dualpipev.py

Note: For real-world applications, you will need to implement a custom overlapped_forward_backward method tailored to your specific module.

Requirements

  • PyTorch 2.0 and above

Developers

DualPipe was created and developed by Jiashi Li, Chengqi Deng, and Wenfeng Liang.

Citation

@misc{deepseekai2025deepseekv3technicalreport,
      title={DeepSeek-V3 Technical Report},
      author={DeepSeek-AI},
      year={2025},
      eprint={2412.19437},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2412.19437},
}
