huggingface/text-embeddings-inference (Public)


chore: `Dockerfile-cuda` - Retain major CC when pruning static cuBLAS lib #635


Open

polarathene wants to merge 1 commit into huggingface:main from polarathene:patch-1


polarathene commented Jun 13, 2025

What does this PR do?

Pruning cuBLAS for CC 7.5 now also retains `sm_70` in addition to the `sm_75` target. See #610 (comment) for more information.

chore: `Dockerfile-cuda` - Pruning cuBLAS should retain major CC

1681b19

polarathene (Author) commented Jun 13, 2025 (edited)

NOTE: There is no known need to do this for TEI; however, Nvidia encourages retaining the major CC and any minors in between when using `nvprune` on cuBLAS.

Feel free to close the PR if you would rather hold off until there's a relevant bug report. My understanding is that it should only be an issue when using a cuBLAS kernel that defers to `sm_70` because a separate `sm_75` build would have been equivalent.

For example, in the current base image used to build, `sm_70` has 184 cubins vs `sm_75` containing only 8:

	$ cuobjdump --list-elf /usr/local/cuda/lib64/libcublas_static.a | grep -oE '\.sm_70.*\.' | wc -l
	184
	$ cuobjdump --list-elf /usr/local/cuda/lib64/libcublas_static.a | grep -oE '\.sm_75.*\.' | wc -l
	8
	# Individual cubins:
	$ cuobjdump --list-elf /usr/local/cuda/lib64/libcublas_static.a | grep -E '\.sm_75.*\.'
	ELF file    5: libcublas_static.5.sm_75.cubin
	ELF file   13: libcublas_static.13.sm_75.cubin
	ELF file   21: libcublas_static.21.sm_75.cubin
	ELF file   29: libcublas_static.29.sm_75.cubin
	ELF file   37: libcublas_static.37.sm_75.cubin
	ELF file   45: libcublas_static.45.sm_75.cubin
	ELF file   53: libcublas_static.53.sm_75.cubin
	ELF file   61: libcublas_static.61.sm_75.cubin
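To compare every architecture in one pass rather than running one `grep` per target, the listing can be tallied per `sm_XX`. This is only a rough sketch: `count_cubins_by_arch` is a hypothetical helper name, and it assumes each line of `cuobjdump --list-elf` output ends in `.sm_<digits>.cubin`, as in the `sm_75` entries listed here.

```shell
#!/bin/sh
# Hypothetical helper (not part of the PR): read `cuobjdump --list-elf`
# output on stdin and print one "<arch> <count>" line per architecture.
# Assumes entries look like "libcublas_static.5.sm_75.cubin".
count_cubins_by_arch() {
    grep -oE '\.sm_[0-9]+\.cubin' \
        | sed -E 's/^\.(sm_[0-9]+)\.cubin$/\1/' \
        | sort \
        | uniq -c \
        | awk '{ print $2, $1 }'
}

# Example usage (requires the CUDA toolkit to be installed):
#   cuobjdump --list-elf /usr/local/cuda/lib64/libcublas_static.a | count_cubins_by_arch
```

Running it before and after `nvprune` would show which architectures survive the pruning step.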

I'm not entirely sure why the minor CC versions in between (when present) would need to be retained.

The concern does not apply to the other two supported real archs handled via `nvprune`, as `sm_80` is already provided, while `sm_90` does not target anything newer (since it's the only arch for that CC major):

text-embeddings-inference/Dockerfile-cuda

Lines 57 to 60 in 53eae1b

	nvprune --generate-code code=sm_80 --generate-code code=sm_${CUDA_COMPUTE_CAP} /usr/local/cuda/lib64/libcublas_static.a -o /usr/local/cuda/lib64/libcublas_static.a; \
	elif [ ${CUDA_COMPUTE_CAP} -eq 90 ]; \
	then \
	nvprune --generate-code code=sm_90 /usr/local/cuda/lib64/libcublas_static.a -o /usr/local/cuda/lib64/libcublas_static.a; \
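As a rough sketch of the selection logic, the helper below prints the `--generate-code` flags for each supported compute capability, with CC 7.5 keeping `sm_70` alongside `sm_75` as this PR proposes. The function name `prune_flags` is hypothetical, and the `80 <= cap < 90` range test is an assumption, since the excerpt only shows the body of that branch and not its condition. It is not the actual diff.

```shell
#!/bin/sh
# Hypothetical helper (not the actual Dockerfile change): print the nvprune
# --generate-code flags to keep for a given CUDA compute capability.
prune_flags() {
    cap="$1"
    if [ "$cap" -eq 75 ]; then
        # Retain the major-CC baseline (sm_70) in addition to the sm_75 target.
        echo "--generate-code code=sm_70 --generate-code code=sm_75"
    elif [ "$cap" -ge 80 ] && [ "$cap" -lt 90 ]; then
        # Assumed range check; the flags mirror the excerpt from Dockerfile-cuda.
        echo "--generate-code code=sm_80 --generate-code code=sm_${cap}"
    elif [ "$cap" -eq 90 ]; then
        # sm_90 is the only arch for its CC major, so nothing else to retain.
        echo "--generate-code code=sm_90"
    else
        echo "unsupported compute capability: $cap" >&2
        return 1
    fi
}

# Example usage (requires nvprune from the CUDA toolkit):
#   lib=/usr/local/cuda/lib64/libcublas_static.a
#   nvprune $(prune_flags "$CUDA_COMPUTE_CAP") "$lib" -o "$lib"
```

The flags are left unquoted in the usage line on purpose so the shell splits them into separate arguments.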

polarathene mentioned this pull request on Jun 13, 2025: Update Docker images to latest Ubuntu version #610 (Open, 5 tasks)
