fix: derive CUDA_CORE_BUILD_MAJOR from headers instead of bindings version #1395
Conversation
Andy-Jost commented Dec 17, 2025
/ok to test 0957f91

fix: derive CUDA_CORE_BUILD_MAJOR from headers instead of bindings version

Fixes build failures when cuda-bindings reports major version 13 but CUDA headers are version 12, causing missing enum errors for CU_MEM_LOCATION_TYPE_NONE and CU_MEM_ALLOCATION_TYPE_MANAGED.

The new _get_cuda_core_build_major_version() function prioritizes:
1. Explicit CUDA_CORE_BUILD_MAJOR env var (CI override)
2. CUDA_VERSION from cuda.h headers (matches compile target)
3. nvidia-smi driver-reported version (fallback)
4. cuda-bindings major version (last resort)

Adds tests for the version detection logic in test_build_hooks.py.
Andy-Jost force-pushed 0957f91 to ff5644a

Andy-Jost commented Dec 17, 2025
/ok to test ff5644a
kkraus14 commented Dec 17, 2025
Is this not a broken environment?
Andy-Jost commented Dec 17, 2025
When creating an environment with […]:

```
% conda list cuda
# packages in environment at /home/scratch.ajost_sw/miniforge3/envs/test:
#
# Name          Version  Build       Channel
cuda-bindings   13.1.1   pypi_0      pypi
cuda-version    12.9     h4f385c5_3  conda-forge
```

(As an aside, if I specify both packages up front with […].)

This setup shouldn't inherently be a problem. Users generally expect that newer releases (like […]) remain backward compatible. Anecdotally, this configuration has worked fine for me for months with no runtime instability, though it may not be explicitly supported. However, a recent change broke this workflow, requiring either […]. Because […], the case we want to support is:
The proposed fix ensures […].
kkraus14 commented Dec 17, 2025
Unfortunately, the Python packaging ecosystem is a mess, but this is expected. Conda packages and pip packages are two entirely separate things that aren't necessarily equivalent or compatible with each other. In our case, conda packages can be used for packaging non-Python code, i.e. for the CUDA Toolkit native libraries. The […]
How do we handle API breaking changes across major versions like 12.x and 13.x? The underlying CTK libraries only guarantee their API and ABI stability within a major version. If any API has a signature change from 12.x to 13.x, which flavor of the API should we have for Python? Should we dynamically adjust our Python API at runtime based on the detected driver version available on the system? What if someone wants to specifically target the 12.x API and run on a 13.x+ driver? There's a lot of open questions here, where the supported path for now is that the […]
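As an aside on the runtime-detection question raised above: the driver's supported CUDA version can be queried directly via the real `cuDriverGetVersion` driver API. A hedged sketch using ctypes (the helper names here are invented, and a system without the NVIDIA driver returns None):

```python
import ctypes


def split_cuda_version(v):
    # The driver packs versions as 1000*major + 10*minor, e.g. 12040 -> (12, 4)
    return v // 1000, (v % 1000) // 10


def driver_cuda_version():
    """Return (major, minor) reported by the driver, or None if unavailable."""
    try:
        libcuda = ctypes.CDLL("libcuda.so.1")
    except OSError:
        return None
    version = ctypes.c_int(0)
    # cuDriverGetVersion returns CUDA_SUCCESS (0) on success
    if libcuda.cuDriverGetVersion(ctypes.byref(version)) != 0:
        return None
    return split_cuda_version(version.value)
```

This illustrates why "run the 12.x API on a 13.x+ driver" is detectable at runtime, but says nothing about which API flavor the Python layer should then expose.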
The problem with this is that […]
The backward compatibility guarantees that CUDA makes and we follow are the following:
Andy-Jost commented Dec 17, 2025 (edited)
@kkraus14 Thanks for the additional details. In my view, deriving […]. I'd like to suggest the following:
WDYT?

Edit: For (2) please see #1412
Summary
- cuda-bindings reports major version 13 but CUDA headers are version 12, causing missing enum errors for CU_MEM_LOCATION_TYPE_NONE and CU_MEM_ALLOCATION_TYPE_MANAGED
- The new _get_cuda_core_build_major_version() function prioritizes: env var override → CUDA headers → nvidia-smi → cuda-bindings fallback

Test plan
pytest tests/test_build_hooks.py -v --noconftest
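The priority logic lends itself to small unit tests. A hypothetical sketch of the kind of checks test_build_hooks.py might contain; the `pick_build_major` stand-in below is invented for illustration and is not the repo's actual API:

```python
def pick_build_major(env_override, header_major, driver_major, bindings_major):
    """Stand-in for the priority chain: env var > headers > nvidia-smi > bindings."""
    for candidate in (env_override, header_major, driver_major, bindings_major):
        if candidate is not None:
            return candidate
    raise RuntimeError("could not determine CUDA major version")


def test_env_var_override_wins():
    assert pick_build_major(12, 13, 13, 13) == 12


def test_headers_beat_driver_and_bindings():
    # The scenario from this PR: bindings report 13, headers say 12
    assert pick_build_major(None, 12, 13, 13) == 12


def test_bindings_are_the_last_resort():
    assert pick_build_major(None, None, None, 13) == 13
```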