
fix: derive CUDA_CORE_BUILD_MAJOR from headers instead of bindings version #1395


Open
Andy-Jost wants to merge 4 commits into NVIDIA:main from Andy-Jost:build-major-from-headers

Conversation

@Andy-Jost (Contributor) commented Dec 17, 2025 (edited):

Summary

  • Fixes build failures when cuda-bindings reports major version 13 but CUDA headers are version 12, causing missing enum errors for CU_MEM_LOCATION_TYPE_NONE and CU_MEM_ALLOCATION_TYPE_MANAGED
  • The new _get_cuda_core_build_major_version() function prioritizes: env var override → CUDA headers → nvidia-smi → cuda-bindings fallback (a minimal sketch follows this list)
  • Adds unit tests for the version detection logic
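
A minimal sketch of that priority chain, assuming CUDA_HOME or CUDA_PATH points at a toolkit install; the structure and parsing details here are illustrative, not the exact code in this PR:

```python
# Illustrative sketch only; the real implementation lives in cuda-core's
# build hooks. CUDA_VERSION in cuda.h encodes 1000 * major + 10 * minor.
import os
import re
import subprocess
from importlib.metadata import version


def _get_cuda_core_build_major_version():
    # 1. Explicit CUDA_CORE_BUILD_MAJOR env var (CI override)
    override = os.environ.get("CUDA_CORE_BUILD_MAJOR")
    if override:
        return int(override)

    # 2. CUDA_VERSION from cuda.h (matches the compile target)
    for var in ("CUDA_HOME", "CUDA_PATH"):
        root = os.environ.get(var)
        if not root:
            continue
        try:
            with open(os.path.join(root, "include", "cuda.h")) as f:
                match = re.search(r"#define\s+CUDA_VERSION\s+(\d+)", f.read())
        except OSError:
            continue
        if match:
            return int(match.group(1)) // 1000  # e.g. 12090 -> 12

    # 3. nvidia-smi driver-reported CUDA version (fallback); the default
    # banner includes a "CUDA Version: X.Y" field we can parse.
    try:
        banner = subprocess.run(
            ["nvidia-smi"], capture_output=True, text=True, check=True
        ).stdout
        match = re.search(r"CUDA Version:\s*(\d+)", banner)
        if match:
            return int(match.group(1))
    except (OSError, subprocess.CalledProcessError):
        pass

    # 4. cuda-bindings major version (last resort)
    return int(version("cuda-bindings").split(".")[0])
```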

Test plan

  • Unit tests pass: pytest tests/test_build_hooks.py -v --noconftest
  • CI tests pass
  • Manual verification: build succeeds with mismatched cuda-bindings 13.x and CUDA 12 headers

Andy-Jost added the bug (Something isn't working), P0 (High priority - Must do!), and cuda.core (Everything related to the cuda.core module) labels Dec 17, 2025
Andy-Jost self-assigned this Dec 17, 2025
@copy-pr-bot (Contributor) commented:

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@Andy-Jost (Contributor, Author) commented:

/ok to test 0957f91


Andy-Jost added the enhancement (Any code-related improvements) and P1 (Medium priority - Should do) labels and removed the bug (Something isn't working) and P0 (High priority - Must do!) labels Dec 17, 2025
Andy-Jost added this to the cuda.core beta 11 milestone Dec 17, 2025
@github-actions commented:

…rsion

Fixes build failures when cuda-bindings reports major version 13 but CUDA headers are version 12, causing missing enum errors for CU_MEM_LOCATION_TYPE_NONE and CU_MEM_ALLOCATION_TYPE_MANAGED.

The new _get_cuda_core_build_major_version() function prioritizes:

  1. Explicit CUDA_CORE_BUILD_MAJOR env var (CI override)
  2. CUDA_VERSION from cuda.h headers (matches compile target)
  3. nvidia-smi driver-reported version (fallback)
  4. cuda-bindings major version (last resort)

Adds tests for the version detection logic in test_build_hooks.py.
@Andy-Jost (Contributor, Author) commented:

/ok to test ff5644a


@kkraus14 (Collaborator) commented:

  • Fixes build failures when cuda-bindings reports major version 13 but CUDA headers are version 12, causing missing enum errors for CU_MEM_LOCATION_TYPE_NONE and CU_MEM_ALLOCATION_TYPE_MANAGED

Is this not a broken environment? cuda-bindings would presumably end up calling into v12.x DSOs, which have a different ABI than v13.x? What situation are we looking to support here?

@Andy-Jost (Contributor, Author) commented:

  • Fixes build failures when cuda-bindings reports major version 13 but CUDA headers are version 12, causing missing enum errors for CU_MEM_LOCATION_TYPE_NONE and CU_MEM_ALLOCATION_TYPE_MANAGED

Is this not a broken environment? cuda-bindings would presumably end up calling into v12.x DSOs, which have a different ABI than v13.x? What situation are we looking to support here?

When creating an environment with conda create -n test cuda-version=12 and then running pip install cuda-bindings, I end up with cuda-bindings 13.x:

% conda list cuda
# packages in environment at /home/scratch.ajost_sw/miniforge3/envs/test:
#
# Name                    Version                   Build  Channel
cuda-bindings             13.1.1                   pypi_0    pypi
cuda-version              12.9                 h4f385c5_3    conda-forge

(As an aside, if I specify both packages up front with conda create -n test cuda-version=12 cuda-bindings, I get cuda-bindings 12.x instead. I wouldn't have expected a difference between installing it during or after environment creation, but that's what happens.)

This setup shouldn't inherently be a problem. Users generally expect that newer releases (like cuda-bindings 13.x) work with older CUDA toolkits due to backward compatibility guarantees. In practice, cuda-bindings should detect and adapt to the underlying CUDA 12 APIs.

Anecdotally, this configuration has worked fine for me for months with no runtime instability, though it may not be explicitly supported. However, a recent change broke this workflow, requiring either cuda-bindings 12.x or setting CUDA_CORE_BUILD_MAJOR=12 manually when building cuda-core.

Because cuda-core discovers cuda.h relative to CUDA_HOME or CUDA_PATH, it doesn't make sense to tie CUDA_CORE_BUILD_MAJOR to the cuda-bindings version. It's more consistent to derive it from the version indicated by the headers.

So the case we want to support is:

  • The user has an older CUDA toolkit (e.g. 12.x).
  • The user installs the latest cuda-bindings and expects it to work due to backward compatibility.

The proposed fix ensures cuda-core builds correctly in this situation by decoupling its build version logic from the installed cuda-bindings.

@kkraus14 (Collaborator) commented:

When creating an environment with conda create -n test cuda-version=12 and then running pip install cuda-bindings, I end up with cuda-bindings 13.x:

% conda list cuda
# packages in environment at /home/scratch.ajost_sw/miniforge3/envs/test:
#
# Name                    Version                   Build  Channel
cuda-bindings             13.1.1                   pypi_0    pypi
cuda-version              12.9                 h4f385c5_3    conda-forge

(As an aside, if I specify both packages up front with conda create -n test cuda-version=12 cuda-bindings, I get cuda-bindings 12.x instead. I wouldn't have expected a difference between installing it during or after environment creation, but that's what happens.)

Unfortunately, the Python packaging ecosystem is a mess, but this is expected. Conda packages and pip packages are two entirely separate things that aren't necessarily equivalent or compatible with each other. In our case, conda packages can be used for packaging non-Python code, i.e. the CUDA Toolkit native libraries. The cuda-version conda package has a constraint on the __cuda virtual conda package, which detects the version of the toolkit that is compatible with the driver running on the system. Pip unfortunately doesn't have these capabilities (we are trying to change that with https://wheelnext.dev/), so there's no way to control the version of cuda-bindings resolved from a pip install command based on the driver version.

This setup shouldn't inherently be a problem. Users generally expect that newer releases (like cuda-bindings 13.x) work with older CUDA toolkits due to backward compatibility guarantees. In practice, cuda-bindings should detect and adapt to the underlying CUDA 12 APIs.

How do we handle API breaking changes across major versions like 12.x and 13.x? The underlying CTK libraries only guarantee their API and ABI stability within a major version. If any API has a signature change from 12.x --> 13.x, which flavor of the API should we have for Python? Should we dynamically adjust our Python API at runtime based on the detected driver version available on the system? What if someone wants to specifically target the 12.x API and run on a 13.x+ driver? There are a lot of open questions here; the supported path for now is that the cuda-bindings package version follows the API and ABI of the same major version of the CTK.

Anecdotally, this configuration has worked fine for me for months with no runtime instability, though it may not be explicitly supported. However, a recent change broke this workflow, requiring either cuda-bindings 12.x or setting CUDA_CORE_BUILD_MAJOR=12 manually when building cuda-core.

Because cuda-core discovers cuda.h relative to CUDA_HOME or CUDA_PATH, it doesn't make sense to tie CUDA_CORE_BUILD_MAJOR to the cuda-bindings version. It's more consistent to derive it from the version indicated by the headers.

The problem with this is that cuda-core uses the cuda-bindings Cython implementation within it. I.e., in your environment as described above, I imagine this would cause an issue: https://github.com/NVIDIA/cuda-python/blob/main/cuda_core/cuda/core/experimental/_device.pyx#L1097-L1100, since it's trying to use an externed cuDeviceGetUuid_v2 API from cuda.h, which exists in CUDA 12.9 but doesn't exist as of CUDA 13.0 in either cuda.h or in cydriver.pxd.
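
To make the failure mode concrete, here is a hedged, Python-level illustration (not code from cuda-core; the real mismatch would surface when compiling the Cython extern) of probing whether the 12.x-era symbol is exposed by the installed bindings:

```python
# Hedged illustration only: cuda-core externs this symbol at the Cython level
# via cydriver.pxd, so in practice the mismatch surfaces at build time.
from cuda.bindings import driver

# Per the discussion above, cuDeviceGetUuid_v2 is part of the CUDA 12.x API
# surface but is gone as of CUDA 13.0, so 13.x bindings presumably omit it.
if hasattr(driver, "cuDeviceGetUuid_v2"):
    print("12.x-style driver API surface is available")
else:
    print("13.x bindings: cuDeviceGetUuid_v2 is absent; a cuda-core build "
          "targeting the 12.x headers would fail to find it")
```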


So the case we want to support is:

  • The user has an older CUDA toolkit (e.g. 12.x).
  • The user installs the latest cuda-bindings and expects it to work due to backward compatibility.

The proposed fix ensures cuda-core builds correctly in this situation by decoupling its build version logic from the installed cuda-bindings.

cuda-core only uses cuda.h indirectly via the cuda-bindings Cython APIs, which extern APIs from cuda.h and other CUDA headers. But again, as described above, we currently need to match the cuda-bindings and cuda.h (and other CUDA headers) major versions in order to match the APIs.

The backward compatibility guarantees that CUDA makes and we follow are the following:

  • For the driver library, API backward and forward compatibility within a major version
  • For the driver library, ABI backward compatibility forever and forward compatibility within a major version
    • We don't currently support ABI backward compatibility across major versions in cuda.bindings driver modules, but hope to in the future
  • For toolkit libraries, API backward and forward compatibility within a major version
  • For the toolkit libraries, ABI backward and forward compatibility within a major version

@Andy-Jost (Contributor, Author) commented Dec 17, 2025 (edited):

@kkraus14 Thanks for the additional details. In my view, deriving CUDA_CORE_BUILD_MAJOR from the headers that cuda-core actually compiles against is a strict improvement, since it allows previously failing environments to build without weakening the official guidance about matching major versions.

I'd like to suggest the following:

  1. We commit this change, because it turns a hard build failure into a successful build that likely produces a working configuration in an environment users can realistically end up in.
  2. As a follow-on change, we add an import-time check that flags unsupported version combinations and issues an appropriate warning (a sketch follows below).

WDYT?

Edit: For (2), please see #1412.
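
A hedged sketch of what the import-time check in (2) might look like; the function name and warning text are illustrative, and #1412 contains the actual proposal:

```python
# Illustrative sketch only. Compares the installed cuda-bindings major
# version against the CUDA major version reported by the driver.
import warnings
from importlib.metadata import version

from cuda.bindings import driver


def _warn_on_version_mismatch():
    bindings_major = int(version("cuda-bindings").split(".")[0])
    err, driver_version = driver.cuDriverGetVersion()
    if err != driver.CUresult.CUDA_SUCCESS:
        return  # cannot determine the driver version; skip the check
    driver_major = driver_version // 1000  # e.g. 12090 -> 12
    if bindings_major != driver_major:
        warnings.warn(
            f"cuda-bindings {bindings_major}.x with a CUDA {driver_major}.x "
            "driver may be an unsupported combination",
            RuntimeWarning,
        )
```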


Reviewers

Awaiting requested review from @leofang, @rparolin, and @rwgk.

At least 1 approving review is required to merge this pull request.

Assignees

@Andy-Jost

Labels

cuda.core (Everything related to the cuda.core module), enhancement (Any code-related improvements), P1 (Medium priority - Should do)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

@Andy-Jost, @kkraus14
