Initial support Blackwell GPU arch #26820
10.0 blackwell b100/b200
12.0 blackwell rtx50
asmorkalov commented Jan 22, 2025
cudawarped commented Jan 22, 2025
@johnnynunez Which version of the CUDA toolkit are you using to compile for Blackwell? The latest version I have access to (12.6 Update 3) does not support compute capability 10. I assume Nvidia will release a version 13.0 to coincide with the release of the first Blackwell cards. Isn't Blackwell compute capability 10.0? Where does 12.0 come from?
johnnynunez commented Jan 22, 2025
Hello,
johnnynunez commented Jan 22, 2025 • edited
Hello, Thor is also compute capability 10.1.
cudawarped commented Jan 22, 2025
@johnnynunez So you haven't tested that this works on Blackwell 🤯? I would wait until a version of the CUDA toolkit is released that supports compute capability 10.0, to be 100% sure your change doesn't break anything. I would also remove compute capability 12.0. Note: If you add 10.1 for Thor, you will also want to update this filter: opencv/cmake/OpenCVDetectCUDAUtils.cmake, line 274 in ea023b7
johnnynunez commented Jan 22, 2025
But why would you remove 12.0?
johnnynunez commented Jan 22, 2025
Well, the NDA was lifted today.
johnnynunez commented Jan 22, 2025
I added support to PyTorch, XLA, etc.
cudawarped commented Jan 22, 2025 • edited
Can you provide me with a source indicating that your 5090 will be compute capability 12.0?
PyTorch merged compute capability 10.0 and 12.0 without any build testing (I guess it's a Python-first library?). I can't see a reason not to wait, especially in OpenCV, where you can manually select the compute capability using combinations of CUDA_ARCH_BIN and CUDA_ARCH_PTX.
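As an illustration of the manual selection mentioned above, here is a minimal sketch that assembles the CMake arguments for an OpenCV CUDA build. `WITH_CUDA`, `CUDA_ARCH_BIN` and `CUDA_ARCH_PTX` are OpenCV CMake variables; the helper name `opencv_cuda_cmake_args`, the space-separated value format, and the capability values in the example are assumptions made for illustration, not something taken from this PR.

```python
import shlex
from typing import List, Sequence


def opencv_cuda_cmake_args(arch_bin: Sequence[str],
                           arch_ptx: Sequence[str] = ()) -> List[str]:
    """Build CMake -D arguments that pin the CUDA architectures by hand.

    Hypothetical helper: only the variable names WITH_CUDA, CUDA_ARCH_BIN
    and CUDA_ARCH_PTX come from OpenCV's build system.
    """
    # Reject anything that is not of the form "<major>.<minor>".
    for cc in list(arch_bin) + list(arch_ptx):
        major, _, minor = cc.partition(".")
        if not (major.isdigit() and minor.isdigit()):
            raise ValueError(f"not a compute capability: {cc!r}")

    args = ["-DWITH_CUDA=ON", "-DCUDA_ARCH_BIN=" + " ".join(arch_bin)]
    # When no PTX targets are given, the flag is omitted and the choice
    # is left to OpenCV's own defaults.
    if arch_ptx:
        args.append("-DCUDA_ARCH_PTX=" + " ".join(arch_ptx))
    return args


if __name__ == "__main__":
    # Example: binary code for 10.0 and 12.0, PTX for 12.0 only.
    args = opencv_cuda_cmake_args(["10.0", "12.0"], ["12.0"])
    print("cmake " + shlex.join(args) + " <path-to-opencv-source>")
```

Passing the list to a process launcher (rather than pasting the printed line) avoids shell-quoting problems with the space inside `CUDA_ARCH_BIN`.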
johnnynunez commented Jan 22, 2025 • edited
more references:
cudawarped commented Jan 22, 2025
@johnnynunez 🤯 Nvidia is going up 3 compute capabilities in a single generation. That's going to be really confusing, considering they previously did one compute version per generation. What is the output from
If it says
Do you think it's possible that the driver is outputting the wrong info? @asmorkalov Either way, I would suggest waiting until more info is available before merging this PR.
johnnynunez commented Jan 22, 2025
It's okay, Flash Attention v4 is coming for Blackwell also. They have 100 and 120.
johnnynunez commented Jan 22, 2025
I'll share it in the next few hours because I'm not at home. But we have the press-release driver 571.86 whl
asmorkalov commented Jan 22, 2025
@cudawarped Thanks a lot for the analysis. I'll review the hardware specs and get back to you soon.
johnnynunez commented Jan 22, 2025
johnnynunez commented Jan 23, 2025
The new drivers show CUDA 12.8 and the same 12.0 codename.
johnnynunez commented Jan 23, 2025 • edited
more references: 10.0 b100 b200
cudawarped commented Jan 23, 2025
@johnnynunez I still suggest we wait. I can't see any downside for OpenCV because the compute capability can be manually selected. I still wouldn't be surprised if the 12 is the version of the CUDA toolkit, shown because the returned compute capability is not valid: the CUDA toolkits used by PyTorch and used to compile nvidia-smi pre-date compute capabilities >= 9.0.
johnnynunez commented Jan 23, 2025
Yeah! Totally agree.
johnnynunez commented Jan 23, 2025
cudawarped commented Jan 24, 2025
@johnnynunez Compute capability 12 looks to be official. Consumer cards look to have fewer resident threads per SM.
cudawarped commented Jan 24, 2025
@asmorkalov Builds on both Windows 11 and Ubuntu 22.04 (WSL) with CUDA Toolkit 12.8 using the default architecture selection ( and
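The toolkit dependency discussed in this thread (12.6 Update 3 does not accept compute capability 10, while 12.8 builds) can be sketched as a small filter over a requested architecture list. This is only an illustration and not the logic in OpenCVDetectCUDAUtils.cmake: the names `BLACKWELL_CAPS`, `MIN_TOOLKIT_FOR_BLACKWELL` and `supported_archs` are hypothetical, the 12.8 threshold is an assumption based on the reports above, and capabilities outside the table are passed through untouched.

```python
from typing import Iterable, List, Tuple

# Compute capabilities named in this thread, with the product labels used there.
BLACKWELL_CAPS = {"10.0": "B100/B200", "10.1": "Thor", "12.0": "RTX 50"}

# Assumption: 12.8 is the first toolkit that accepts the capabilities above
# (the thread reports that 12.6 Update 3 rejects 10.0 and that 12.8 builds).
MIN_TOOLKIT_FOR_BLACKWELL = (12, 8)


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a version string such as '12.8' into a comparable tuple (12, 8)."""
    return tuple(int(part) for part in text.split("."))


def supported_archs(requested: Iterable[str], toolkit_version: str) -> List[str]:
    """Drop Blackwell capabilities when the toolkit is older than 12.8.

    Capabilities not listed in BLACKWELL_CAPS are kept as they are; this
    sketch does not model which older architectures a toolkit supports.
    """
    toolkit = parse_version(toolkit_version)
    kept = []
    for cc in requested:
        if cc in BLACKWELL_CAPS and toolkit < MIN_TOOLKIT_FOR_BLACKWELL:
            continue  # this toolkit cannot compile for the architecture
        kept.append(cc)
    return kept


if __name__ == "__main__":
    for version in ("12.6", "12.8"):
        print(version, supported_archs(["8.9", "10.0", "12.0"], version))
```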
johnnynunez commented Jan 24, 2025 • edited
Yeah, I compiled PyTorch, xformers, etc. and OpenCV with my RTX 5090 and it works. I just couldn't comment on anything because of the NDA, but it was lifted yesterday.
johnnynunez commented Jan 24, 2025
@asmorkalov feel free to merge! Thanks
4b2a33a into opencv:4.x
Initial support Blackwell GPU arch opencv#26820

10.0 blackwell b100/b200
12.0 blackwell rtx50

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [ ] There is a reference to the original bug report and related work
- [ ] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [ ] The feature is well documented and sample code can be built with the project CMake



