Initial support Blackwell GPU arch #26820


Merged: asmorkalov merged 2 commits into opencv:4.x from johnnynunez:patch-1 on Jan 25, 2025

Conversation

@johnnynunez (Contributor, Author) commented Jan 22, 2025 (edited)

10.0: Blackwell B100/B200
12.0: Blackwell RTX 50 series
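
A minimal sketch of how these compute capabilities map to nvcc code-generation flags (kernel.cu is a placeholder; sm_100 and sm_120 need a toolkit that knows them, i.e. CUDA 12.8 or newer):

    # CC 10.0 -> B100/B200, CC 12.0 -> RTX 50 series
    nvcc kernel.cu -c -o kernel.o \
        -gencode arch=compute_100,code=sm_100 \
        -gencode arch=compute_120,code=sm_120 \
        -gencode arch=compute_120,code=compute_120
    # the last -gencode also embeds PTX so newer GPUs can JIT-compile it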

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • [x] I agree to contribute to the project under the Apache 2 License.
  • [x] To the best of my knowledge, the proposed patch is not based on code under GPL or another license that is incompatible with OpenCV.
  • [x] The PR is proposed to the proper branch.
  • [ ] There is a reference to the original bug report and related work.
  • [ ] There are accuracy tests, performance tests and test data in the opencv_extra repository, if applicable.
    The patch to opencv_extra has the same branch name.
  • [ ] The feature is well documented and sample code can be built with the project CMake.

@johnnynunez changed the title from initial support blackwell to initial support blackwell codegen on Jan 22, 2025
@asmorkalov added this to the 4.12.0 milestone on Jan 22, 2025
@asmorkalov (Contributor)

cc @cudawarped

@cudawarped (Contributor)

@johnnynunez Which version of the CUDA toolkit are you using to compile for Blackwell? The latest version I have access to (12.6 Update 3) does not support compute capability 10. I assume Nvidia will release a version 13.0 to coincide with the release of the first Blackwell cards.

Isn't Blackwell compute capability 10.0? Where does 12.0 come from?

@johnnynunez (Contributor, Author)

> @johnnynunez Which version of the CUDA toolkit are you using to compile for Blackwell? The latest version I have access to (12.6 Update 3) does not support compute capability 10. I assume Nvidia will release a version 13.0 to coincide with the release of the first Blackwell cards.
>
> Isn't Blackwell compute capability 10.0? Where does 12.0 come from?

Hello,
B100 support is coming with CUDA 12.7.
RTX 50 is coming with CUDA 12.8.

@johnnynunez (Contributor, Author) commented Jan 22, 2025 (edited)

> @johnnynunez Which version of the CUDA toolkit are you using to compile for Blackwell? The latest version I have access to (12.6 Update 3) does not support compute capability 10. I assume Nvidia will release a version 13.0 to coincide with the release of the first Blackwell cards.
>
> Isn't Blackwell compute capability 10.0? Where does 12.0 come from?

Hello,
B100 support is coming with CUDA 12.7.
RTX 50 is coming with CUDA 12.8.

Also, Thor is compute capability 10.1.
I have an RTX 5090 card.

@cudawarped (Contributor)

@johnnynunez So you haven't tested that this works on Blackwell 🤯? I would wait until a version of the CUDA toolkit is released which supports compute capability 10.0, to be 100% sure your changes don't break anything. I would also remove compute capability 12.0.

Note: if you add 10.1 for Thor you also want to update this filter:

    ocv_filter_available_architecture(${nvcc_executable} __cuda_arch_bin
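
As a sketch of the kind of change meant here, assuming the macro takes the candidate architecture list as trailing arguments (paraphrased, not the verbatim OpenCV source):

    ocv_filter_available_architecture(${nvcc_executable} __cuda_arch_bin
        # ... existing candidate list ...
        10.0 10.1 12.0  # kept only if the probed nvcc actually accepts them
    )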

@johnnynunez (Contributor, Author)

> @johnnynunez So you haven't tested that this works on Blackwell 🤯? I would wait until a version of the CUDA toolkit is released which supports compute capability 10.0, to be 100% sure your changes don't break anything. I would also remove compute capability 12.0.
>
> Note: if you add 10.1 for Thor you also want to update this filter:
>
>     ocv_filter_available_architecture(${nvcc_executable} __cuda_arch_bin

But why would you remove 12.0?
My 5090's output is 12.0. I think that DIGITS will be 11.0.

@johnnynunez (Contributor, Author)

> @johnnynunez So you haven't tested that this works on Blackwell 🤯? I would wait until a version of the CUDA toolkit is released which supports compute capability 10.0, to be 100% sure your changes don't break anything. I would also remove compute capability 12.0.
>
> Note: if you add 10.1 for Thor you also want to update this filter:
>
>     ocv_filter_available_architecture(${nvcc_executable} __cuda_arch_bin
>
> But why would you remove 12.0? My 5090's output is 12.0. I think that DIGITS will be 11.0.

Well, the NDA was lifted today.

@johnnynunez (Contributor, Author)

> @johnnynunez So you haven't tested that this works on Blackwell 🤯? I would wait until a version of the CUDA toolkit is released which supports compute capability 10.0, to be 100% sure your changes don't break anything. I would also remove compute capability 12.0.
>
> Note: if you add 10.1 for Thor you also want to update this filter:
>
>     ocv_filter_available_architecture(${nvcc_executable} __cuda_arch_bin
>
> But why would you remove 12.0? My 5090's output is 12.0. I think that DIGITS will be 11.0.

I added support in PyTorch, XLA, etc.

@cudawarped (Contributor) commented Jan 22, 2025 (edited)

> But why would you remove 12.0?
> My 5090's output is 12.0. I think that DIGITS will be 11.0.

Can you provide me with a source to indicate that your 5090 will be compute capability 12.0?

> I added support in PyTorch, XLA, etc.

PyTorch merged compute capability 10.0 and 12.0 without any build testing (I guess it's a Python-first library)? Not sure I can see a reason for not waiting, especially in OpenCV, when you can manually select the compute capability using combinations of CUDA_ARCH_BIN and CUDA_ARCH_PTX.
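
A minimal sketch of such a manual selection at configure time (values and source path illustrative):

    cmake -DWITH_CUDA=ON \
          -DCUDA_ARCH_BIN=12.0 \
          -DCUDA_ARCH_PTX=12.0 \
          ../opencv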

@johnnynunez (Contributor, Author) commented Jan 22, 2025 (edited)

> But why would you remove 12.0?
> My 5090's output is 12.0. I think that DIGITS will be 11.0.
>
> Can you provide me with a source to indicate that your 5090 will be compute capability 12.0?
>
> I added support in PyTorch, XLA, etc.
>
> PyTorch merged compute capability 10.0 and 12.0 without any build testing (I guess it's a Python-first library)? Not sure I can see a reason for not waiting, especially in OpenCV, when you can manually select the compute capability using combinations of CUDA_ARCH_BIN and CUDA_ARCH_PTX.

[image]

More references:
pytorch/pytorch#145270 (I added it to PyTorch)
Dao-AILab/flash-attention#1436

@cudawarped (Contributor)

@johnnynunez 🤯 Nvidia is going up 3 compute capabilities in a single generation; that's going to be really confusing considering they previously did one compute version per generation.

What is the output from

    nvidia-smi --query-gpu=compute_cap --format=csv

If it says

    compute_cap
    12.0

do you think it's possible that the driver is outputting the wrong info?

@asmorkalov Either way, I would suggest it would be better to wait until more info is available before merging this PR.

@johnnynunez (Contributor, Author)

> nvidia-smi --query-gpu=compute_cap --format=csv

It's okay; Flash Attention v4 is coming for Blackwell as well, and they have 100 and 120.
I totally agree, haha, but well, Nvidia now has a lot of products...
Upcoming ARM laptops, desktop GPUs, DIGITS, Jetson, and data centers.

@johnnynunez (Contributor, Author)

> @johnnynunez 🤯 Nvidia is going up 3 compute capabilities in a single generation; that's going to be really confusing considering they previously did one compute version per generation.
>
> What is the output from
>
>     nvidia-smi --query-gpu=compute_cap --format=csv
>
> If it says
>
>     compute_cap
>     12.0
>
> do you think it's possible that the driver is outputting the wrong info?
>
> @asmorkalov Either way, I would suggest it would be better to wait until more info is available before merging this PR.

I'll share it in the coming hours because I'm not at home. But we have the press-release driver, 571.86 WHQL.

@asmorkalov (Contributor)

@cudawarped Thanks a lot for the analysis. I'll review the hardware specs and get back soon.


@asmorkalov self-requested a review on January 22, 2025, 09:56
@johnnynunez (Contributor, Author)

> But why would you remove 12.0?
> My 5090's output is 12.0. I think that DIGITS will be 11.0.
>
> Can you provide me with a source to indicate that your 5090 will be compute capability 12.0?
>
> I added support in PyTorch, XLA, etc.
>
> PyTorch merged compute capability 10.0 and 12.0 without any build testing (I guess it's a Python-first library)? Not sure I can see a reason for not waiting, especially in OpenCV, when you can manually select the compute capability using combinations of CUDA_ARCH_BIN and CUDA_ARCH_PTX.

[image]

@johnnynunez (Contributor, Author)

> But why would you remove 12.0?
> My 5090's output is 12.0. I think that DIGITS will be 11.0.
>
> Can you provide me with a source to indicate that your 5090 will be compute capability 12.0?
>
> I added support in PyTorch, XLA, etc.
>
> PyTorch merged compute capability 10.0 and 12.0 without any build testing (I guess it's a Python-first library)? Not sure I can see a reason for not waiting, especially in OpenCV, when you can manually select the compute capability using combinations of CUDA_ARCH_BIN and CUDA_ARCH_PTX.

[image]

The new drivers are showing CUDA 12.8 and the same 12.0 codename.

@johnnynunez (Contributor, Author) commented Jan 23, 2025 (edited)

More references:
NVIDIA/cccl#3493

10.0: B100/B200
10.0a: ARM laptops, DIGITS?
10.1: Thor
10.1a: ARM laptops, DIGITS?
12.0: RTX 50
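
The "a" entries are the architecture-specific variants; as a sketch, they are targeted in nvcc with the suffixed names (the 10.x/12.x suffixed targets assume CUDA 12.8 or newer):

    # arch-specific target: the resulting binary runs only on that exact architecture
    nvcc kernel.cu -c -gencode arch=compute_100a,code=sm_100a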

@cudawarped (Contributor)

@johnnynunez I still suggest we wait; I can't see any downside for OpenCV because the compute capability can be manually selected.

I still wouldn't be surprised if the 12 is the version of the CUDA toolkit, due to the returned compute capability not being valid, because the CUDA toolkit used by PyTorch and used to compile nvidia-smi pre-dates compute capabilities >= 9.0.

@johnnynunez (Contributor, Author)

> @johnnynunez I still suggest we wait; I can't see any downside for OpenCV because the compute capability can be manually selected.
>
> I still wouldn't be surprised if the 12 is the version of the CUDA toolkit, due to the returned compute capability not being valid, because the CUDA toolkit used by PyTorch and used to compile nvidia-smi pre-dates compute capabilities >= 9.0.

Yeah! Totally agree.

@johnnynunez (Contributor, Author)

CUDA 12.8 is out.

[image]


@cudawarped (Contributor)

@johnnynunez Compute capability 12 looks to be official. Consumer cards look to have fewer resident threads per SM.

https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html?highlight=compute%2520capability#features-and-technical-specifications


@cudawarped (Contributor)

@asmorkalov Builds on both Windows 11 and Ubuntu 22.04 (WSL) with CUDA Toolkit 12.8 using the default architecture selection (CUDA_ARCH_BIN and CUDA_ARCH_PTX not specified):

    --   NVIDIA CUDA:                   YES (ver 12.8, CUFFT CUBLAS NVCUVID NVCUVENC)
    --     NVIDIA GPU arch:             50 52 60 61 70 75 80 86 89 90 100 120
    --     NVIDIA PTX archs:            120
    --
    --   cuDNN:                         YES (ver 9.7.0)

and with -DCUDA_GENERATION=Blackwell:

    --   NVIDIA CUDA:                   YES (ver 12.8, CUFFT CUBLAS NVCUVID NVCUVENC)
    --     NVIDIA GPU arch:             100 120
    --     NVIDIA PTX archs:
    --
    --   cuDNN:                         YES (ver 9.7.0)
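
For reference, the -DCUDA_GENERATION=Blackwell output above would come from a configure line along these lines (a sketch; source path illustrative):

    cmake -DWITH_CUDA=ON -DCUDA_GENERATION=Blackwell ../opencv

CUDA_GENERATION limits code generation to a single named GPU family instead of the full default architecture list.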

@johnnynunez (Contributor, Author) commented Jan 24, 2025 (edited)

> @asmorkalov Builds on both Windows 11 and Ubuntu 22.04 (WSL) with CUDA Toolkit 12.8 using the default architecture selection (CUDA_ARCH_BIN and CUDA_ARCH_PTX not specified):
>
>     --   NVIDIA CUDA:                   YES (ver 12.8, CUFFT CUBLAS NVCUVID NVCUVENC)
>     --     NVIDIA GPU arch:             50 52 60 61 70 75 80 86 89 90 100 120
>     --     NVIDIA PTX archs:            120
>     --
>     --   cuDNN:                         YES (ver 9.7.0)
>
> and with -DCUDA_GENERATION=Blackwell:
>
>     --   NVIDIA CUDA:                   YES (ver 12.8, CUFFT CUBLAS NVCUVID NVCUVENC)
>     --     NVIDIA GPU arch:             100 120
>     --     NVIDIA PTX archs:
>     --
>     --   cuDNN:                         YES (ver 9.7.0)

Yeah, I compiled PyTorch, xformers, etc., and OpenCV with my RTX 5090 and it works. I just couldn't comment on anything because of the NDA, but it was lifted yesterday.


@johnnynunez (Contributor, Author)

@asmorkalov Feel free to merge! Thanks.

@asmorkalov self-assigned this on Jan 25, 2025
@asmorkalov changed the title from initial support blackwell codegen to Initial support Blackwell GPU arch on Jan 25, 2025
@asmorkalov merged commit 4b2a33a into opencv:4.x on Jan 25, 2025
29 of 31 checks passed
@asmorkalov mentioned this pull request on Feb 19, 2025
NanQin555 pushed a commit to NanQin555/opencv that referenced this pull request on Feb 24, 2025

Reviewers

@asmorkalov — awaiting requested review

Assignees

@asmorkalov

Labels

Projects

None yet

Milestone

4.12.0


4 participants

@johnnynunez, @asmorkalov, @cudawarped, @opencv-alalek
