Must sp_size be equal to the total number of GPUs in UlyssesSPAttentionHF? #7671

Unanswered
NiuMa-1234 asked this question in Q&A

I found that sequence_parallel_size in the provided Ulysses example (test_ulysses_sp_hf.py) is equal to the world_size (total GPUs). If sequence_parallel_size is less than the world_size, training fails during the backward pass, as shown below:

    self.loss_scaler.backward(loss.float(), retain_graph=retain_graph)
  File "/opt/conda/lib/python3.10/site-packages/deepspeed/runtime/fp16/loss_scaler.py", line 65, in backward
    scaled_loss.backward(retain_graph=retain_graph)
  File "/opt/conda/lib/python3.10/site-packages/torch/_tensor.py", line 522, in backward
    torch.autograd.backward(
  File "/opt/conda/lib/python3.10/site-packages/torch/autograd/__init__.py", line 266, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/opt/conda/lib/python3.10/site-packages/torch/autograd/function.py", line 289, in apply
    return user_fn(self, *args)
  File "/opt/conda/lib/python3.10/site-packages/torch/distributed/nn/functional.py", line 343, in backward
    gx = torch.empty_like(grad_outputs[rank])
IndexError: tuple index out of range

The error is likely caused as follows: when torch._AllGather runs its backward pass, grad_outputs has only sp_world_size items, but torch.distributed.get_rank() makes each GPU select its own gradient from grad_outputs by its global rank. Any rank at or beyond sp_world_size therefore indexes past the end of the tuple and raises this error.
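To make the suspected mismatch concrete, here is a minimal sketch in plain Python that does not import torch or DeepSpeed. The helper names (pick_own_grad, simulate) are hypothetical and only mimic the indexing in the failing line gx = torch.empty_like(grad_outputs[rank]). The local-rank formula assumes the sequence-parallel groups are contiguous blocks of ranks such as [0, 1] and [2, 3], which may not match the real group layout.

```python
def pick_own_grad(grad_outputs, rank):
    """Mimic the failing line: index the gradient tuple with a rank."""
    return grad_outputs[rank]


def simulate(world_size, sp_size, use_group_rank=False):
    """Return, per global rank, whether the gradient lookup succeeds."""
    results = {}
    for global_rank in range(world_size):
        # The all-gather ran inside one SP group, so there is one
        # gradient per group member: sp_size items, not world_size.
        grad_outputs = tuple(f"grad_{i}" for i in range(sp_size))
        # Assumption: contiguous SP groups, so local rank = global % sp_size.
        rank = global_rank % sp_size if use_group_rank else global_rank
        try:
            pick_own_grad(grad_outputs, rank)
            results[global_rank] = "ok"
        except IndexError as exc:
            results[global_rank] = f"IndexError: {exc}"
    return results


if __name__ == "__main__":
    # sp_size == world_size: every global rank is a valid index.
    print(simulate(world_size=4, sp_size=4))
    # sp_size < world_size: global ranks 2 and 3 index past the tuple.
    print(simulate(world_size=4, sp_size=2))
    # Indexing with the group-local rank avoids the error in this model.
    print(simulate(world_size=4, sp_size=2, use_group_rank=True))
```

In this toy model, indexing with the group-local rank avoids the error. Whether the installed torch version looks up the rank relative to the process group in this backward is something I have not verified.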

So is it required that sequence_parallel_size be equal to the world_size?


Replies: 0 comments
