Use CDI for GPU injection for AMD devices for --gpus #52048
shiv-tyagi wants to merge 1 commit into moby:master
daemon/devices_amd_linux.go Outdated
// Try to detect AMD GPU vendor via CDI cache if cdiCache is available
if cdiCache != nil {
	vendor, err := discoverGPUVendorFromCDI(cdiCache)
One thing about this approach: it only checks whether the cache includes AMD CDI devices at the point where the daemon is reloaded. In contrast to the other projects where we have added this functionality, the cache here is started with AutoRefresh enabled, meaning that the CDI spec directories are watched for changes to ensure that specs for new devices are detected.
With that in mind, the drivers that one wants to register would have to determine the vendor from the cache for every --gpus request, not only once at startup.
Yes, makes sense.
I have updated the logic to always discover the vendor from the CDI registry on the fly, since the registry is auto-refreshed. I verified this by deleting the CDI files while the daemon was running and confirming that vendor discovery then fails.
Thanks for the suggestion.
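For readers following the thread, here is a minimal sketch of what per-request vendor discovery could look like. It is not the code from this PR: `deviceLister`, `staticLister`, `vendorOf`, and `discoverGPUVendor` are hypothetical names, and the interface is only assumed to resemble a CDI cache that can list qualified device names of the form `vendor/class=name`. The point is that the lookup runs on every call, so it reflects whatever the auto-refreshed cache currently holds.

```go
package main

import (
	"errors"
	"strings"
)

// deviceLister is a hypothetical stand-in for the CDI cache. It only needs
// to return qualified device names such as "amd.com/gpu=0".
type deviceLister interface {
	ListDevices() []string
}

// staticLister is a fixed device list used purely for illustration.
type staticLister []string

func (s staticLister) ListDevices() []string { return s }

var errNoGPUVendor = errors.New("no known GPU vendor found among CDI devices")

// vendorOf extracts the vendor part of a "vendor/class=name" string.
// It returns false when the string does not have that shape.
func vendorOf(qualified string) (string, bool) {
	kind, _, ok := strings.Cut(qualified, "=")
	if !ok {
		return "", false
	}
	vendor, class, ok := strings.Cut(kind, "/")
	if !ok || vendor == "" || class == "" {
		return "", false
	}
	return vendor, true
}

// discoverGPUVendor is meant to be called for each --gpus request rather
// than once at startup: it asks the cache for its current devices and
// returns the first vendor that appears in the known list.
func discoverGPUVendor(cache deviceLister, known []string) (string, error) {
	for _, dev := range cache.ListDevices() {
		vendor, ok := vendorOf(dev)
		if !ok {
			continue // skip malformed entries
		}
		for _, k := range known {
			if vendor == k {
				return vendor, nil
			}
		}
	}
	return "", errNoGPUVendor
}
```

With this shape, removing the spec files while the daemon is running makes the next call return an error, which matches the behaviour described in the reply above.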
Signed-off-by: Shiv Tyagi <Shiv.Tyagi@amd.com>
f0f8245 to faba468
Closes #49824
This PR enhances the --gpus option for AMD GPUs by using CDI (Container Device Interface) specs for device injection when available. It falls back to the existing vendor runtime-based injection if AMD CDI specs are not detected on the machine.
Related PR: containerd/containerd#12839 (similar implementation for containerd/ctr)
- What I did
Added support for CDI-based GPU device injection through the --gpus option for AMD devices.
- How I did it
Created a composite device driver, similar to NVIDIA's, which discovers during registration whether AMD's CDI specs are present on the system and registers itself with the appropriate updaters to handle the device request.
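The choice between CDI injection and the runtime fallback can be sketched as below. This is an illustration under stated assumptions, not the driver from this PR: `planGPURequest`, `gpuPlan`, and the mode constants are hypothetical, and the qualified name `amd.com/gpu=<id>` is assumed as the CDI kind for AMD GPUs. The fallback branch sets `AMD_VISIBLE_DEVICES`, the variable the verification steps check for.

```go
package main

import "strings"

// injectionMode names the two paths described above (hypothetical type).
type injectionMode string

const (
	injectViaCDI     injectionMode = "cdi"
	injectViaRuntime injectionMode = "runtime"
)

// gpuPlan describes what would be applied to the container (hypothetical).
type gpuPlan struct {
	Mode       injectionMode
	CDIDevices []string // qualified CDI device names to inject
	Env        []string // environment entries for the runtime fallback
}

// hasVendorDevices reports whether any qualified device name belongs to
// the given vendor, for example "amd.com".
func hasVendorDevices(devices []string, vendor string) bool {
	for _, d := range devices {
		if strings.HasPrefix(d, vendor+"/") {
			return true
		}
	}
	return false
}

// planGPURequest picks CDI injection when AMD CDI devices are present in
// the cached device list and otherwise falls back to the vendor runtime.
// The "amd.com/gpu" kind is an assumption made for this sketch.
func planGPURequest(cachedDevices []string, requested string) gpuPlan {
	if hasVendorDevices(cachedDevices, "amd.com") {
		return gpuPlan{
			Mode:       injectViaCDI,
			CDIDevices: []string{"amd.com/gpu=" + requested},
		}
	}
	return gpuPlan{
		Mode: injectViaRuntime,
		Env:  []string{"AMD_VISIBLE_DEVICES=" + requested},
	}
}
```

For a request such as --gpus all, `requested` would be "all" in this sketch; how device counts or IDs map to CDI names in the actual driver is not shown here.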
- How to verify it
1. Build with make binary.
2. Start a dockerd instance via ./bundles/binary/dockerd.
3. Run amd-ctk cdi generate to install the CDI specs on the host.
4. Run docker run --rm --gpus all rocm/rocm-terminal rocm-smi.
5. [...] dockerd process.
6. Run docker run --rm --runtime="amd" --gpus all rocm/rocm-terminal rocm-smi.
7. [...] AMD_VISIBLE_DEVICES set when CDI specs are not there, to verify that the fallback is working correctly.

I have also added unit tests for the vendor discovery function.
- Human readable description for the release notes
- A picture of a cute animal (not mandatory but encouraged)