Error in TP: RuntimeError: get_group_info: no group info associated with the group name #228

Open
@zy-ning


When I run ENABLE_INTRA_NODE_COMM=1 torchrun --standalone --nproc_per_node=2 generate.py --compile --checkpoint_path checkpoints/$MODEL_REPO/model.pth, it fails with the error: RuntimeError: get_group_info: no group info associated with the group name.

Detailed error information:

W0609 20:18:37.249000 1431440 torch/distributed/run.py:766] *****************************************
W0609 20:18:37.249000 1431440 torch/distributed/run.py:766] Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
W0609 20:18:37.249000 1431440 torch/distributed/run.py:766] *****************************************
Using device=cuda
Loading model ...
Applying tensor parallel to model ...
Time to load model: 10.60 seconds
/root/serve/gpt-fast/tp.py:139: FutureWarning: The combination of ranks + tag as process group identifier has been deprecated. Please switch to using ProcessGroup, DeviceMesh, or group name instead.
  attn.register_forward_hook(lambda _module, _input, output: funcol.all_reduce(
[rank0]: Traceback (most recent call last):
[rank0]:   File "/root/serve/gpt-fast/generate.py", line 480, in <module>
[rank0]:     main(
[rank0]:   File "/root/serve/gpt-fast/generate.py", line 401, in main
[rank0]:     y, metrics = generate(
[rank0]:   File "/root/serve/gpt-fast/.venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
[rank0]:     return func(*args, **kwargs)
[rank0]:   File "/root/serve/gpt-fast/generate.py", line 194, in generate
[rank0]:     next_token = prefill(model, prompt.view(batch_size, -1), input_pos, **sampling_kwargs).clone()
[rank0]:   File "/root/serve/gpt-fast/generate.py", line 71, in prefill
[rank0]:     logits = model(mask, x, input_pos)
[rank0]:   File "/root/serve/gpt-fast/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/root/serve/gpt-fast/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/root/serve/gpt-fast/model.py", line 156, in forward
[rank0]:     x = layer(x, input_pos, freqs_cis, mask)
[rank0]:   File "/root/serve/gpt-fast/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/root/serve/gpt-fast/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1762, in _call_impl
[rank0]:     return forward_call(*args, **kwargs)
[rank0]:   File "/root/serve/gpt-fast/model.py", line 175, in forward
[rank0]:     h = x + self.attention(self.attention_norm(x), freqs_cis, mask, input_pos)
[rank0]:   File "/root/serve/gpt-fast/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1751, in _wrapped_call_impl
[rank0]:     return self._call_impl(*args, **kwargs)
[rank0]:   File "/root/serve/gpt-fast/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1857, in _call_impl
[rank0]:     return inner()
[rank0]:   File "/root/serve/gpt-fast/.venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1818, in inner
[rank0]:     hook_result = hook(self, args, result)
[rank0]:   File "/root/serve/gpt-fast/tp.py", line 139, in <lambda>
[rank0]:     attn.register_forward_hook(lambda _module, _input, output: funcol.all_reduce(
[rank0]:   File "/root/serve/gpt-fast/.venv/lib/python3.10/site-packages/torch/distributed/_functional_collectives.py", line 176, in all_reduce
[rank0]:     tensor = torch.ops._c10d_functional.all_reduce(self, reduceOp.lower(), group_name)
[rank0]:   File "/root/serve/gpt-fast/.venv/lib/python3.10/site-packages/torch/_ops.py", line 1158, in __call__
[rank0]:     return self._op(*args, **(kwargs or {}))
[rank0]: RuntimeError: get_group_info: no group info associated with the group name

Environment: uv venv
torch version: 2.7.1+cu126
