Initial support Blackwell GPU arch #26820


Merged: asmorkalov merged 2 commits into opencv:4.x from johnnynunez:patch-1 on Jan 25, 2025

Conversation

@johnnynunez (Contributor, Author) commented Jan 22, 2025 (edited)

10.0: Blackwell B100/B200
12.0: Blackwell RTX 50 series
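
A minimal sketch of how these compute capabilities map to nvcc code-generation flags (kernel.cu is a placeholder; sm_100 and sm_120 need a toolkit that knows them, i.e. CUDA 12.8 or newer):

    # CC 10.0 -> B100/B200, CC 12.0 -> RTX 50 series
    nvcc kernel.cu -c -o kernel.o \
        -gencode arch=compute_100,code=sm_100 \
        -gencode arch=compute_120,code=sm_120 \
        -gencode arch=compute_120,code=compute_120
    # the last -gencode also embeds PTX so newer GPUs can JIT-compile it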

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • [x] I agree to contribute to the project under the Apache 2 License.
  • [x] To the best of my knowledge, the proposed patch is not based on code under GPL or another license that is incompatible with OpenCV.
  • [x] The PR is proposed to the proper branch.
  • [ ] There is a reference to the original bug report and related work.
  • [ ] There are accuracy tests, performance tests and test data in the opencv_extra repository, if applicable.
    The patch to opencv_extra has the same branch name.
  • [ ] The feature is well documented and sample code can be built with the project CMake.

@johnnynunez changed the title from initial support blackwell to initial support blackwell codegen on Jan 22, 2025
@asmorkalov added this to the 4.12.0 milestone on Jan 22, 2025
@asmorkalov (Contributor)

cc @cudawarped

@cudawarped (Contributor)

@johnnynunez Which version of the CUDA toolkit are you using to compile for Blackwell? The latest version I have access to (12.6 Update 3) does not support compute capability 10. I assume Nvidia will release a version 13.0 to coincide with the release of the first Blackwell cards.

Isn't Blackwell compute capability 10.0? Where does 12.0 come from?

@johnnynunez (Contributor, Author)

> @johnnynunez Which version of the CUDA toolkit are you using to compile for Blackwell? The latest version I have access to (12.6 Update 3) does not support compute capability 10. I assume Nvidia will release a version 13.0 to coincide with the release of the first Blackwell cards.
>
> Isn't Blackwell compute capability 10.0? Where does 12.0 come from?

Hello,
B100 support is coming with CUDA 12.7.
RTX 50 is coming with CUDA 12.8.

@johnnynunez (Contributor, Author) commented Jan 22, 2025 (edited)

> @johnnynunez Which version of the CUDA toolkit are you using to compile for Blackwell? The latest version I have access to (12.6 Update 3) does not support compute capability 10. I assume Nvidia will release a version 13.0 to coincide with the release of the first Blackwell cards.
>
> Isn't Blackwell compute capability 10.0? Where does 12.0 come from?

Hello,
B100 support is coming with CUDA 12.7.
RTX 50 is coming with CUDA 12.8.

Also, Thor is compute capability 10.1.
I have an RTX 5090 card.

@cudawarped (Contributor)

@johnnynunez So you haven't tested that this works on Blackwell 🤯? I would wait until a version of the CUDA toolkit is released which supports compute capability 10.0, to be 100% sure your changes don't break anything. I would also remove compute capability 12.0.

Note: if you add 10.1 for Thor you also want to update this filter:

    ocv_filter_available_architecture(${nvcc_executable} __cuda_arch_bin
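
As a sketch of the kind of change meant here, assuming the macro takes the candidate architecture list as trailing arguments (paraphrased, not the verbatim OpenCV source):

    ocv_filter_available_architecture(${nvcc_executable} __cuda_arch_bin
        # ... existing candidate list ...
        10.0 10.1 12.0  # kept only if the probed nvcc actually accepts them
    )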

@johnnynunez (Contributor, Author)

> @johnnynunez So you haven't tested that this works on Blackwell 🤯? I would wait until a version of the CUDA toolkit is released which supports compute capability 10.0, to be 100% sure your changes don't break anything. I would also remove compute capability 12.0.
>
> Note: if you add 10.1 for Thor you also want to update this filter:
>
>     ocv_filter_available_architecture(${nvcc_executable} __cuda_arch_bin

But why would you remove 12.0?
My 5090's output is 12.0. I think that DIGITS will be 11.0.

@johnnynunez (Contributor, Author)

> @johnnynunez So you haven't tested that this works on Blackwell 🤯? I would wait until a version of the CUDA toolkit is released which supports compute capability 10.0, to be 100% sure your changes don't break anything. I would also remove compute capability 12.0.
>
> Note: if you add 10.1 for Thor you also want to update this filter:
>
>     ocv_filter_available_architecture(${nvcc_executable} __cuda_arch_bin
>
> But why would you remove 12.0? My 5090's output is 12.0. I think that DIGITS will be 11.0.

Well, the NDA was lifted today.

@johnnynunez (Contributor, Author)

> @johnnynunez So you haven't tested that this works on Blackwell 🤯? I would wait until a version of the CUDA toolkit is released which supports compute capability 10.0, to be 100% sure your changes don't break anything. I would also remove compute capability 12.0.
>
> Note: if you add 10.1 for Thor you also want to update this filter:
>
>     ocv_filter_available_architecture(${nvcc_executable} __cuda_arch_bin
>
> But why would you remove 12.0? My 5090's output is 12.0. I think that DIGITS will be 11.0.

I added support in PyTorch, XLA, etc.

@cudawarped (Contributor) commented Jan 22, 2025 (edited)

> But why would you remove 12.0?
> My 5090's output is 12.0. I think that DIGITS will be 11.0.

Can you provide me with a source to indicate that your 5090 will be compute capability 12.0?

> I added support in PyTorch, XLA, etc.

PyTorch merged compute capability 10.0 and 12.0 without any build testing (I guess it's a Python-first library)? Not sure I can see a reason for not waiting, especially in OpenCV, when you can manually select the compute capability using combinations of CUDA_ARCH_BIN and CUDA_ARCH_PTX.
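
A minimal sketch of such a manual selection at configure time (values and source path illustrative):

    cmake -DWITH_CUDA=ON \
          -DCUDA_ARCH_BIN=12.0 \
          -DCUDA_ARCH_PTX=12.0 \
          ../opencv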

@johnnynunez (Contributor, Author) commented Jan 22, 2025 (edited)

> But why would you remove 12.0?
> My 5090's output is 12.0. I think that DIGITS will be 11.0.
>
> Can you provide me with a source to indicate that your 5090 will be compute capability 12.0?
>
> I added support in PyTorch, XLA, etc.
>
> PyTorch merged compute capability 10.0 and 12.0 without any build testing (I guess it's a Python-first library)? Not sure I can see a reason for not waiting, especially in OpenCV, when you can manually select the compute capability using combinations of CUDA_ARCH_BIN and CUDA_ARCH_PTX.

[image]

More references:
pytorch/pytorch#145270 (I added it to PyTorch)
Dao-AILab/flash-attention#1436

@cudawarped (Contributor)

@johnnynunez 🤯 Nvidia is going up 3 compute capabilities in a single generation; that's going to be really confusing considering they previously did one compute version per generation.

What is the output from

    nvidia-smi --query-gpu=compute_cap --format=csv

If it says

    compute_cap
    12.0

do you think it's possible that the driver is outputting the wrong info?

@asmorkalov Either way, I would suggest it would be better to wait until more info is available before merging this PR.

@johnnynunez (Contributor, Author)

> nvidia-smi --query-gpu=compute_cap --format=csv

It's okay; Flash Attention v4 is coming for Blackwell as well, and they have 100 and 120.
I totally agree, haha, but well, Nvidia now has a lot of products...
Upcoming ARM laptops, desktop GPUs, DIGITS, Jetson, and data centers.

@johnnynunez (Contributor, Author)

> @johnnynunez 🤯 Nvidia is going up 3 compute capabilities in a single generation; that's going to be really confusing considering they previously did one compute version per generation.
>
> What is the output from
>
>     nvidia-smi --query-gpu=compute_cap --format=csv
>
> If it says
>
>     compute_cap
>     12.0
>
> do you think it's possible that the driver is outputting the wrong info?
>
> @asmorkalov Either way, I would suggest it would be better to wait until more info is available before merging this PR.

I'll share it in the coming hours because I'm not at home. But we have the press-release driver, 571.86 WHQL.

@asmorkalov (Contributor)

@cudawarped Thanks a lot for the analysis. I'll review the hardware specs and get back soon.


@asmorkalov self-requested a review on January 22, 2025, 09:56
@johnnynunez (Contributor, Author)

> But why would you remove 12.0?
> My 5090's output is 12.0. I think that DIGITS will be 11.0.
>
> Can you provide me with a source to indicate that your 5090 will be compute capability 12.0?
>
> I added support in PyTorch, XLA, etc.
>
> PyTorch merged compute capability 10.0 and 12.0 without any build testing (I guess it's a Python-first library)? Not sure I can see a reason for not waiting, especially in OpenCV, when you can manually select the compute capability using combinations of CUDA_ARCH_BIN and CUDA_ARCH_PTX.

[image]

@johnnynunez (Contributor, Author)

> But why would you remove 12.0?
> My 5090's output is 12.0. I think that DIGITS will be 11.0.
>
> Can you provide me with a source to indicate that your 5090 will be compute capability 12.0?
>
> I added support in PyTorch, XLA, etc.
>
> PyTorch merged compute capability 10.0 and 12.0 without any build testing (I guess it's a Python-first library)? Not sure I can see a reason for not waiting, especially in OpenCV, when you can manually select the compute capability using combinations of CUDA_ARCH_BIN and CUDA_ARCH_PTX.

[image]

The new drivers are showing CUDA 12.8 and the same 12.0 codename.

@johnnynunez (Contributor, Author) commented Jan 23, 2025 (edited)

More references:
NVIDIA/cccl#3493

10.0: B100/B200
10.0a: ARM laptops, DIGITS?
10.1: Thor
10.1a: ARM laptops, DIGITS?
12.0: RTX 50
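
The "a" entries are the architecture-specific variants; as a sketch, they are targeted in nvcc with the suffixed names (the 10.x/12.x suffixed targets assume CUDA 12.8 or newer):

    # arch-specific target: the resulting binary runs only on that exact architecture
    nvcc kernel.cu -c -gencode arch=compute_100a,code=sm_100a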

@cudawarped (Contributor)

@johnnynunez I still suggest we wait; I can't see any downside for OpenCV because the compute capability can be manually selected.

I still wouldn't be surprised if the 12 is the version of the CUDA toolkit, due to the returned compute capability not being valid, because the CUDA toolkit used by PyTorch and used to compile nvidia-smi pre-dates compute capabilities >= 9.0.

@johnnynunez (Contributor, Author)

> @johnnynunez I still suggest we wait; I can't see any downside for OpenCV because the compute capability can be manually selected.
>
> I still wouldn't be surprised if the 12 is the version of the CUDA toolkit, due to the returned compute capability not being valid, because the CUDA toolkit used by PyTorch and used to compile nvidia-smi pre-dates compute capabilities >= 9.0.

Yeah! Totally agree.

@johnnynunez (Contributor, Author)

CUDA 12.8 is out.

[image]


@cudawarped (Contributor)

@johnnynunez Compute capability 12 looks to be official. Consumer cards look to have fewer resident threads per SM.

https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html?highlight=compute%2520capability#features-and-technical-specifications


@cudawarped (Contributor)

@asmorkalov Builds on both Windows 11 and Ubuntu 22.04 (WSL) with CUDA Toolkit 12.8 using the default architecture selection (CUDA_ARCH_BIN and CUDA_ARCH_PTX not specified):

    --   NVIDIA CUDA:                   YES (ver 12.8, CUFFT CUBLAS NVCUVID NVCUVENC)
    --     NVIDIA GPU arch:             50 52 60 61 70 75 80 86 89 90 100 120
    --     NVIDIA PTX archs:            120
    --
    --   cuDNN:                         YES (ver 9.7.0)

and with -DCUDA_GENERATION=Blackwell:

    --   NVIDIA CUDA:                   YES (ver 12.8, CUFFT CUBLAS NVCUVID NVCUVENC)
    --     NVIDIA GPU arch:             100 120
    --     NVIDIA PTX archs:
    --
    --   cuDNN:                         YES (ver 9.7.0)
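
For reference, the -DCUDA_GENERATION=Blackwell output above would come from a configure line along these lines (a sketch; source path illustrative):

    cmake -DWITH_CUDA=ON -DCUDA_GENERATION=Blackwell ../opencv

CUDA_GENERATION limits code generation to a single named GPU family instead of the full default architecture list.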

@johnnynunez (Contributor, Author) commented Jan 24, 2025 (edited)

> @asmorkalov Builds on both Windows 11 and Ubuntu 22.04 (WSL) with CUDA Toolkit 12.8 using the default architecture selection (CUDA_ARCH_BIN and CUDA_ARCH_PTX not specified):
>
>     --   NVIDIA CUDA:                   YES (ver 12.8, CUFFT CUBLAS NVCUVID NVCUVENC)
>     --     NVIDIA GPU arch:             50 52 60 61 70 75 80 86 89 90 100 120
>     --     NVIDIA PTX archs:            120
>     --
>     --   cuDNN:                         YES (ver 9.7.0)
>
> and with -DCUDA_GENERATION=Blackwell:
>
>     --   NVIDIA CUDA:                   YES (ver 12.8, CUFFT CUBLAS NVCUVID NVCUVENC)
>     --     NVIDIA GPU arch:             100 120
>     --     NVIDIA PTX archs:
>     --
>     --   cuDNN:                         YES (ver 9.7.0)

Yeah, I compiled PyTorch, xformers, etc., and OpenCV with my RTX 5090 and it works. I just couldn't comment on anything because of the NDA, but it was lifted yesterday.


@johnnynunez (Contributor, Author)

@asmorkalov Feel free to merge! Thanks.

@asmorkalov self-assigned this on Jan 25, 2025
@asmorkalov changed the title from initial support blackwell codegen to Initial support Blackwell GPU arch on Jan 25, 2025
@asmorkalov merged commit 4b2a33a into opencv:4.x on Jan 25, 2025
29 of 31 checks passed
@asmorkalov mentioned this pull request on Feb 19, 2025
NanQin555 pushed a commit to NanQin555/opencv that referenced this pull request on Feb 24, 2025

Reviewers

@asmorkalov — awaiting requested review

Assignees

@asmorkalov

Labels

Projects

None yet

Milestone

4.12.0


4 participants

@johnnynunez, @asmorkalov, @cudawarped, @opencv-alalek
