
[cuDNN][SDPA][Convolution] Expose cuDNN runtime version in CUDA hooks #167111

Closed

eqy wants to merge 2 commits into pytorch:main from eqy:cudnnruntimeversion

eqy (Collaborator) commented Nov 5, 2025 (edited)

cuDNN dispatching heuristics rely on version checks, but currently only the compile-time version is exposed. If we want to allow users to resolve #166643 on their end by updating their cuDNN version locally, we need to check the runtime version rather than the compile-time version.

cc @csarofeen @ptrblck @xwang233 @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @jerryzh168 @aditew01
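To illustrate the distinction the PR description draws, here is a minimal C++ sketch, not the PR's actual code. The names `kCompiledCudnnVersion`, `kMinFixedCudnnVersion`, `use_fast_conv_path`, `gate_on_compiled_version`, and `gate_on_runtime_version` are hypothetical, and the version numbers are placeholders rather than the real thresholds for #166643. In a real build the compile-time value would come from the `CUDNN_VERSION` macro and the runtime value from cuDNN's `cudnnGetVersion()`.

```cpp
// Hypothetical stand-in for the CUDNN_VERSION macro: the version of the cuDNN
// headers the binary was compiled against. It is fixed at build time.
constexpr long kCompiledCudnnVersion = 91002;  // placeholder value

// Hypothetical minimum version that contains a fix (placeholder value).
constexpr long kMinFixedCudnnVersion = 91500;

// Hypothetical dispatch gate: take the fast path only when the given cuDNN
// version is new enough.
bool use_fast_conv_path(long cudnn_version) {
  return cudnn_version >= kMinFixedCudnnVersion;
}

// Gating on the compile-time constant ignores whatever library is loaded.
bool gate_on_compiled_version() {
  return use_fast_conv_path(kCompiledCudnnVersion);
}

// Gating on a runtime query (cudnnGetVersion() in a real build; passed in
// here as a plain argument) reflects a locally upgraded library.
bool gate_on_runtime_version(long loaded_version) {
  return use_fast_conv_path(loaded_version);
}
```

With these placeholder values, `gate_on_compiled_version()` stays false no matter which library the user installs, while `gate_on_runtime_version(91500)` becomes true once the newer library is loaded.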

eqy added 2 commits

November 5, 2025 19:00

check in

fa39341

lint

caa7a77

eqy requested a review from syed-ahmed as a code owner

November 5, 2025 19:06

eqy added the module: cudnn (Related to torch.backends.cudnn, and CuDNN support) label

Nov 5, 2025

eqy requested a review from Aidyn-A as a code owner

November 5, 2025 19:06

eqy added the module: convolution (Problems related to convolutions (THNN, THCUNN, CuDNN)), open source, release notes: cudnn, and module: sdpa (All things related to torch.nn.functional.scaled_dot_product_attention) labels

Nov 5, 2025

pytorch-bot (bot) commented Nov 5, 2025 (edited)

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/167111

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit caa7a77 with merge base 5c63946:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

trunk / linux-jammy-cuda12.8-py3.10-gcc11 / test (default, 5, 5, lf.linux.g6.4xlarge.experimental.nvidia.gpu) (gh) (similar failure)
test_decomp

This comment was automatically generated by Dr. CI and updates every 15 minutes.

pytorch-bot (bot) added the module: cpu (CPU specific problem (e.g., perf, algorithm)) label

Nov 5, 2025

Skylion007 reviewed

Nov 5, 2025


aten/src/ATen/cuda/detail/CUDAHooks.h

		long versionCUDART() const override;
		long versionCuDNN() const override;
		long versionRuntimeCuDNN() const override;
		long versionCuDNNFrontend() const override;

Skylion007 (Collaborator) Nov 5, 2025 (edited)

Why does the runtime cuDNN frontend version matter? It cannot be changed, right? Isn't it a compile-time include header?

eqy (Collaborator, Author) Nov 5, 2025

I sidecar'd this change in as we'll need it in the near future for SDPA issues that require a cuDNN frontend version to be available for gating. In theory sdp_utils.cpp could access this, but I'm not sure I want to include that directly.

Skylion007 (Collaborator) Nov 5, 2025

Can the runtime version be different for the cuDNN frontend, or should it be constexpr?

Skylion007 approved these changes

Nov 5, 2025


Skylion007 reviewed

Nov 5, 2025


aten/src/ATen/Context.h

		static bool hasCuDNN() {
		  return detail::getCUDAHooks().hasCuDNN();
		}
		static long versionCuDNN() {

Skylion007 (Collaborator) Nov 5, 2025 (edited)

Is this really compile-time? If so, why no constexpr? It would enable if constexpr logic that would simplify critical code paths in cuDNN dispatch.

eqy (Collaborator, Author) Nov 5, 2025

yes see

pytorch/aten/src/ATen/cuda/detail/CUDAHooks.cpp

Line 348 in 6c5db82

return CUDNN_VERSION;

other uses of CUDNN_VERSION in the file are macros, etc.

Skylion007 (Collaborator) Nov 5, 2025

Yeah, if they are macros they should be propagated with constexpr then. :)

Skylion007 (Collaborator) Nov 5, 2025

Yeah, CUDNN_FRONTEND has its equivalent function as constexpr

eqy mentioned this pull request

Nov 5, 2025

4x performance regression for 3D convs with AMP on torch 2.9.0 #166122

Open

eqy (Collaborator, Author) commented Nov 5, 2025

@Skylion007 are we building with C++20 only? Not sure if virtual functions (as these are CUDAHooks) can be constexpr

eqy (Collaborator, Author) commented Nov 6, 2025

@pytorchmergebot merge

pytorch-bot (bot) added the ciflow/trunk (Trigger trunk jobs on your pull request) label

Nov 6, 2025

pytorchmergebot added the merging label

Nov 6, 2025

pytorchmergebot (Collaborator) commented Nov 6, 2025

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

Skylion007 (Collaborator) commented Nov 6, 2025

> @Skylion007 are we building with C++20 only? not sure if virtual functions (as these are CUDAHooks) can be constexpr

Ah, wasn't aware of that limitation. Not yet, no. :(
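The limitation discussed here is that a virtual function can only be declared constexpr from C++20 onward. A minimal sketch of one possible workaround under C++17 follows; it is not what the PR does. All names (`SKETCH_CUDNN_VERSION`, `sketch_compiled_cudnn_version`, `SketchHooksInterface`, `SketchCUDAHooks`, `query_through_interface`) are hypothetical stand-ins, and the version value is a placeholder.

```cpp
// Hypothetical stand-in for the CUDNN_VERSION macro (placeholder value).
#define SKETCH_CUDNN_VERSION 91002L

// Non-virtual constexpr helper: usable in static_assert and `if constexpr`
// under C++17, because it does not go through virtual dispatch.
constexpr long sketch_compiled_cudnn_version() {
  return SKETCH_CUDNN_VERSION;
}

// Hypothetical hooks interface. Before C++20 a virtual function cannot be
// declared constexpr, so the hook stays an ordinary virtual function.
struct SketchHooksInterface {
  virtual ~SketchHooksInterface() = default;
  virtual long versionCuDNN() const = 0;
};

// Hypothetical implementation that forwards to the constexpr helper.
struct SketchCUDAHooks : SketchHooksInterface {
  long versionCuDNN() const override {
    return sketch_compiled_cudnn_version();
  }
};

// Compile-time use of the helper: this is checked during compilation.
static_assert(sketch_compiled_cudnn_version() == SKETCH_CUDNN_VERSION,
              "helper must mirror the macro");

// Runtime use through the interface: resolved by virtual dispatch.
long query_through_interface(const SketchHooksInterface& hooks) {
  return hooks.versionCuDNN();
}
```

Code inside the translation unit that sees the macro could branch on the constexpr helper, while callers that only hold the interface still get the value at run time. Even under C++20, a call made through a hooks object obtained at run time would generally not be a constant expression.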

pytorchmergebot (Collaborator) commented Nov 6, 2025

Merge failed

Reason: 1 job has failed: trunk / linux-jammy-cuda12.8-py3.10-gcc11 / test (default, 5, 5, lf.linux.g6.4xlarge.experimental.nvidia.gpu)

Details for Dev Infra team

Raised by workflow job

pytorchmergebot removed the merging label

Nov 6, 2025

eqy (Collaborator, Author) commented Nov 7, 2025

@pytorchmergebot merge

pytorchmergebot added the merging label

Nov 7, 2025

pytorchmergebot (Collaborator) commented Nov 7, 2025

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

pytorchmergebot added the Merged label

Nov 7, 2025

pytorchmergebot closed this in e678450

Nov 7, 2025

pytorchmergebot removed the merging label

Nov 7, 2025

eqy (Collaborator, Author) commented Nov 7, 2025

@pytorchbot cherry-pick --onto release/2.9 --fixes "cuDNN conv3d performance workaround" -c regression

pytorchbot pushed a commit that referenced this pull request

Nov 7, 2025

[cuDNN][SDPA][Convolution] Expose cuDNN runtime version in CUDA hooks (…

bfce8dd

…#167111)

cuDNN dispatching heuristics rely on version checks, but currently only the compile-time version is exposed. If we want to allow users to resolve #166643 on their end by updating their cuDNN version locally, we need to check the runtime version rather than the compile-time version.

Pull Request resolved: #167111
Approved by: https://github.com/Skylion007

(cherry picked from commit e678450)

pytorchbot (Collaborator) commented Nov 7, 2025

Cherry picking #167111

The cherry pick PR is at #167327 and it is linked with issue cuDNN conv3d performance workaround. The following tracker issues are updated:

[v2.9.1] Release Tracker #166758 (comment)

Details for Dev Infra team

Raised by workflow job

pytorchbot mentioned this pull request

Nov 7, 2025

[v2.9.1] Release Tracker #166758

Closed

atalman pushed a commit that referenced this pull request

Nov 7, 2025

[cuDNN][SDPA][Convolution] Expose cuDNN runtime version in CUDA hooks (…

5811a8d

…#167327)

[cuDNN][SDPA][Convolution] Expose cuDNN runtime version in CUDA hooks (#167111)

cuDNN dispatching heuristics rely on version checks, but currently only the compile-time version is exposed. If we want to allow users to resolve #166643 on their end by updating their cuDNN version locally, we need to check the runtime version rather than the compile-time version.

Pull Request resolved: #167111
Approved by: https://github.com/Skylion007

(cherry picked from commit e678450)

Co-authored-by: Eddie Yan <eddiey@nvidia.com>

jovan2009 referenced this pull request in comfyanonymous/ComfyUI

Nov 14, 2025

Pytorch is stupid. (#10398)

b4f30bd

jovan2009 mentioned this pull request

Nov 14, 2025

CUDNN version in nightly pytorch 2.10.0 builds #167242

Open

Silv3S pushed a commit to Silv3S/pytorch that referenced this pull request

Nov 18, 2025

[cuDNN][SDPA][Convolution] Expose cuDNN runtime version in CUDA hooks (…

dc3f467

…pytorch#167111)

cuDNN dispatching heuristics rely on version checks, but currently only the compile-time version is exposed. If we want to allow users to resolve pytorch#166643 on their end by updating their cuDNN version locally, we need to check the runtime version rather than the compile-time version.

Pull Request resolved: pytorch#167111
Approved by: https://github.com/Skylion007

jovan2009 mentioned this pull request

Nov 21, 2025

working around nvidia conv3d memory bug comfyanonymous/ComfyUI#10827

Closed

1 task