Multiple GPU Windows System #19866

Unanswered
ptrem asked this question in DDP / multi-GPU / multi-node
May 13, 2024 · 1 comment · 2 replies

Hi,
I have a Workstation with two RTX A6000 GPUs and a Windows System 🙈
and I would like to use both GPUs with Lightning-AI.
It's possible to use just one of the GPUs, but I get the following error if I try to use both.

The training works fine for:

```python
trainer = pl.Trainer(
    accelerator="gpu",
    devices=1,
    max_epochs=epochs,
    logger=wandb_logger,
)
```

But I can't use, for example, devices=2, devices=[0, 1], or devices=-1.
In every case I get the following error:
RuntimeError: Distributed package doesn't have NCCL built in
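
For completeness, a minimal sketch of the failing multi-GPU configuration, assuming the same pl, epochs, and wandb_logger as in the working example above:

```python
# Requesting more than one device leads to the NCCL error above
# once training starts on Windows (the same happens with
# devices=[0, 1] or devices=-1):
trainer = pl.Trainer(
    accelerator="gpu",
    devices=2,
    max_epochs=epochs,
    logger=wandb_logger,
)
```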

Does anyone have an idea how to run Lightning on multiple GPUs on a Windows system?
Thanks


Replies: 1 comment 2 replies


It's a bit late, but I believe the NCCL backend for DDP training is not supported on Windows.
This is how you should configure your trainer on a Windows system, but I experienced a significant slowdown before, so do compare the speed with NCCL on a Linux machine.

```python
from lightning.pytorch.strategies import DDPStrategy

ddp = DDPStrategy(process_group_backend="gloo")
trainer = Trainer(strategy=ddp, accelerator="gpu", devices=2)
```
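
As a quick sanity check, something like this should show whether the local PyTorch build ships with NCCL and Gloo support (a minimal sketch using torch.distributed's backend queries):

```python
import torch.distributed as dist

# Check which process-group backends this PyTorch build was compiled with.
# Windows wheels generally ship without NCCL, which is why "gloo" is needed here.
print("NCCL available:", dist.is_nccl_available())
print("Gloo available:", dist.is_gloo_available())
```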
2 replies
@ptrem

Thanks for your reply. I will try it out.

@davidgill97

> I tried it, but Trainer doesn't have a parameter where you can set strategy==ddp.

I'm not sure if I understood that correctly. According to the documentation, Trainer indeed has a parameter strategy.
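
For reference, a minimal sketch (assuming Lightning 2.x, matching the lightning.pytorch import above) of the two ways the documented strategy parameter can be passed:

```python
from lightning.pytorch import Trainer
from lightning.pytorch.strategies import DDPStrategy

# strategy accepts a registered strategy name ...
trainer = Trainer(strategy="ddp", accelerator="gpu", devices=2)

# ... or a Strategy instance, which is what lets you force the
# Gloo backend instead of NCCL on Windows:
trainer = Trainer(
    strategy=DDPStrategy(process_group_backend="gloo"),
    accelerator="gpu",
    devices=2,
)
```

On Windows the second form is the relevant one, since passing the DDPStrategy instance is what selects Gloo rather than the unavailable NCCL backend.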
