Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

GH-131521: fix clangcl build on Windows for zlib-ng#131526

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
zooba merged 7 commits intopython:mainfromchris-eibl:fix_clangcl_zlibng
Mar 24, 2025

Conversation

chris-eibl
Copy link
Member

@chris-eiblchris-eibl commentedMar 20, 2025
edited
Loading

I gave it a try. For me it compiles andtest_zlib is green.

I've only thrown in as many-m as needed.

But only for those files that really need them!

I started to set that for all files (the diff would have been much smaller), but

  • this does not seem right to me: the compiler would be eligible to generate such code for all*.c files given, and the resulting binary would then not run on all hosts
  • test_zlib failed withinvalid instruction if I compiled all*.c files with the "all the-m. This convinced me even more to be "selective" here: my ancient Haswell i5 4570 only supports up to AVX2 and it makes sense, that it would not happily accept all that instructions (up to AVX512).

<ClCompile Include="$(zlibNgDir)\arch\x86\chunkset_ssse3.c">
<AdditionalOptions Condition="$(PlatformToolset) == 'ClangCL'">%(AdditionalOptions) -mssse3</AdditionalOptions>
</ClCompile>
<ClCompile Include="$(zlibNgDir)\arch\x86\adler32_sse42.c">
Copy link
MemberAuthor

@chris-eiblchris-eiblMar 20, 2025
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

This is interesting: the filename suggestssse43 sse42, but-mssse3 deemed sufficient.
Ups: edited typo.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Probably worth matching the filename, so that if an updated zlib-ng uses SSE4.3 instructions we don't need to modify anything here.

The file relies on intrinsics, so it's not going to automatically generate newer instructions.

@@ -97,6 +97,7 @@
<PreprocessorDefinitions>%(PreprocessorDefinitions);ZLIB_COMPAT;WITH_GZFILEOP;NO_FSEEKO;HAVE_BUILTIN_ASSUME_ALIGNED;_CRT_SECURE_NO_DEPRECATE;_CRT_NONSTDC_NO_DEPRECATE;</PreprocessorDefinitions>
<PreprocessorDefinitions Condition="$(Platform) == 'Win32' or $(Platform) == 'x64'">%(PreprocessorDefinitions);X86_FEATURES;X86_HAVE_XSAVE_INTRIN;X86_SSE2;X86_SSSE3;X86_SSE42;X86_PCLMULQDQ_CRC;X86_AVX2;X86_AVX512;X86_AVX512VNNI;X86_VPCLMULQDQ_CRC</PreprocessorDefinitions>
<PreprocessorDefinitions Condition="$(Configuration) == 'Debug'">%(PreprocessorDefinitions);ZLIB_DEBUG</PreprocessorDefinitions>
<PreprocessorDefinitions Condition="$(PlatformToolset) == 'ClangCL'">%(PreprocessorDefinitions);HAVE_BUILTIN_CTZ</PreprocessorDefinitions>
<EnableEnhancedInstructionSet Condition="$(Platform) == 'Win32' or $(Platform) == 'x64'">AdvancedVectorExtensions2</EnableEnhancedInstructionSet>
Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I'd feel better we'd not enable that for all*.c files (at lest not onWin32). MSVC would be totally happy without it (and AFAIR defaults to SSE2 if not set). Maybe I'd then have to throw in more-m for clang-cl, but IMHO we're safer without that even in the MSVC case, so that the resulting binary can run on ancient CPUs, too.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Yeah, I suspect you're right on this. I'm not sure whether the intrinsics code is skipped or still generates fine without this, but it wouldn't surprise me if it's being ignored (there are__AVX*__ preprocessor checks that will behave differently without this setting).

Putting it just on the .c files fromarch\x86 should be enough. Those are already in a conditional ItemGroup, so the condition isn't needed each time (the ClangCL condition is, though).

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Yeah, I suspect you're right on this. I'm not sure whether the intrinsics code is skipped or still generates fine without this, but it wouldn't surprise me if it's being ignored (there are__AVX*__ preprocessor checks that will behave differently without this setting).

Well spotted! Although MSVC allows you to use intrinsics even if the respective/arch part isn't set, it won't set the macros__AVX__, etc. Please note, that it never sets macros like__SSE2__:

Doing a regex__[A-Z]+[0-9]+__ only reveals

#  if (defined(X86_SSE2) && defined(__SSE2__)) || defined(__x86_64__) || defined(_M_X64) || defined(X86_NOCHECK_SSE2)E:\cpython_clang_pgo2\externals\zlib-ng-2.2.4\arch\x86\x86_functions.h7637#  if defined(X86_SSSE3) && defined(__SSSE3__)E:\cpython_clang_pgo2\externals\zlib-ng-2.2.4\arch\x86\x86_functions.h9537#if defined(X86_PCLMULQDQ_CRC) && defined(__PCLMUL__)E:\cpython_clang_pgo2\externals\zlib-ng-2.2.4\arch\x86\x86_functions.h11043#  if defined(X86_AVX2) && defined(__AVX2__)E:\cpython_clang_pgo2\externals\zlib-ng-2.2.4\arch\x86\x86_functions.h12336#  if defined(X86_AVX512) && defined(__AVX512F__) && defined(__AVX512DQ__) && defined(__AVX512BW__) && defined(__AVX512VL__)E:\cpython_clang_pgo2\externals\zlib-ng-2.2.4\arch\x86\x86_functions.h14738#    if defined(X86_AVX512VNNI) && defined(__AVX512VNNI__)E:\cpython_clang_pgo2\externals\zlib-ng-2.2.4\arch\x86\x86_functions.h15944#    if defined(__PCLMUL__) && defined(__AVX512F__) && defined(__VPCLMULQDQ__)E:\cpython_clang_pgo2\externals\zlib-ng-2.2.4\arch\x86\x86_functions.h16617

which are within an#ifdef DISABLE_RUNTIME_CPU_DETECTION, which isn't set.

The other hits are inx86_intrins.h:

#ifdef __AVX2__E:\cpython_clang_pgo2\externals\zlib-ng-2.2.4\arch\x86\x86_intrins.h78#if (!defined(__clang__) && !defined(__NVCOMPILER) && defined(__GNUC__) && __GNUC__ < 10) \E:\cpython_clang_pgo2\externals\zlib-ng-2.2.4\arch\x86\x86_intrins.h1063#ifdef __AVX512F__E:\cpython_clang_pgo2\externals\zlib-ng-2.2.4\arch\x86\x86_intrins.h188#endif // __AVX512F__E:\cpython_clang_pgo2\externals\zlib-ng-2.2.4\arch\x86\x86_intrins.h2411#endif // __AVX2__E:\cpython_clang_pgo2\externals\zlib-ng-2.2.4\arch\x86\x86_intrins.h2711#ifdef __AVX512F__E:\cpython_clang_pgo2\externals\zlib-ng-2.2.4\arch\x86\x86_intrins.h318#if (!defined(__clang__) && !defined(__NVCOMPILER) && defined(__GNUC__) && __GNUC__ < 9)E:\cpython_clang_pgo2\externals\zlib-ng-2.2.4\arch\x86\x86_intrins.h3263#endif // __AVX512F__E:\cpython_clang_pgo2\externals\zlib-ng-2.2.4\arch\x86\x86_intrins.h6711#ifdef __AVX2__E:\cpython_clang_pgo2\externals\zlib-ng-2.2.4\arch\x86\x86_intrins.h748#endif // __AVX2__E:\cpython_clang_pgo2\externals\zlib-ng-2.2.4\arch\x86\x86_intrins.h7811#ifdef __AVX512F__E:\cpython_clang_pgo2\externals\zlib-ng-2.2.4\arch\x86\x86_intrins.h808#endif // __AVX512F__E:\cpython_clang_pgo2\externals\zlib-ng-2.2.4\arch\x86\x86_intrins.h8411#if defined(_MSC_VER) && defined(__AVX512F__) && !defined(_MM_K0_REG8)E:\cpython_clang_pgo2\externals\zlib-ng-2.2.4\arch\x86\x86_intrins.h8834

where intrinsic quirks of some compiler versions are taken care of. There are some for MSVC that even include__AVX512F__, which wouldn't be covered with the currentAdvancedVectorExtensions2. My latest commits should fix this.

Putting it just on the .c files fromarch\x86 should be enough. Those are already in a conditional ItemGroup, so the condition isn't needed each time (the ClangCL condition is, though).

Yeah, I've only set it to those files that really need them, rendering some clang specific stuff obsolete :)

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

which are within an#ifdef DISABLE_RUNTIME_CPU_DETECTION, which isn't set.

This makes me quite confident, that you're spot on#131438 (comment)

I believe it's detected and called dynamically based on availability, but it's all compiled at compile time.

I.e., we are in the "dynamic dispatch" code path
https://github.com/zlib-ng/zlib-ng/blob/860e4cff7917d93f54f5d7f0bc1d0e8b1a3cb988/functable.h#L12-L26

#ifdefDISABLE_RUNTIME_CPU_DETECTION#  include"arch_functions.h"/* When compiling with native instructions it is not necessary to use functable. * Instead we use native_ macro indicating the best available variant of arch-specific * functions for the current platform. */#  defineFUNCTABLE_INIT ((void)0)#  defineFUNCTABLE_CALL(name) native_ ## name#  defineFUNCTABLE_FPTR(name) &native_ ## name#elsestructfunctable_s {

wherefunctable_s will be populated at runtime according to the capabilities of the CPU in:
https://github.com/zlib-ng/zlib-ng/blob/860e4cff7917d93f54f5d7f0bc1d0e8b1a3cb988/functable.c#L45-L49

staticvoidinit_functable(void) {structfunctable_sft;structcpu_featurescf;cpu_check_features(&cf);

Using the/arch option only for the needed files will now guarantee that the binary can run even on ancient CPUs and will use optimized code paths according to the CPU capabilites.

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Using the/arch option only for the needed files will now guarantee that the binary can run even on ancient CPUs and will use optimized code paths according to the CPU capabilites.

This is similar to#130213 /#130447. Thus, I've checked using clang-cl 18.1.8: it is picky about just "seeing" intrinsics without the appropriate-m architecture - and it happily compiled :)

@chris-eibl
Copy link
MemberAuthor

Maybe someone can trigger Windows tailcall CI - it's the only one that really is of interest here, but is only triggered for ceval and the like

paths:
-'.github/workflows/tail-call.yml'
-'Python/bytecodes.c'
-'Python/ceval.c'
-'Python/ceval_macros.h'
-'Python/generated_cases.c.h'

@zooba
Copy link
Member

Maybe someone can trigger Windows tailcall CI

Just touch one of those files - a comment will do. And undo it before it gets merged (make the comment something like// DO NOT MERGE - just triggering CI)

so that the resulting binary can be executed on older CPUs, too.Also use AdvancedVectorExtensions512 where necessary.
some of the files tail-call.yml is watching were touchedon main, and thus I think it will fire :)
@chris-eibl
Copy link
MemberAuthor

I think this is a skip news?

@chris-eibl
Copy link
MemberAuthor

Windows tail-call CI is now green:https://github.com/python/cpython/actions/runs/14006628024/job/39221297660

Having a detailed look, ISTM that the tests are not run, though. This is not related to this PR, the last green build some days ago did not run them either:https://github.com/python/cpython/actions/runs/13954576580/job/39062383429#step:4:326

After building, no more output can be seen.

This is interesting, because

-name:Native Windows (debug)
if:runner.os == 'Windows' && matrix.architecture != 'ARM64'
shell:cmd
run:|
choco install llvm --allow-downgrade --no-progress --version ${{ matrix.llvm }}.1.5
set PlatformToolset=clangcl
set LLVMToolsVersion=${{ matrix.llvm }}.1.5
set LLVMInstallDir=C:\Program Files\LLVM
./PCbuild/build.bat --tail-call-interp -d -p ${{ matrix.architecture }}
./PCbuild/rt.bat -d -p ${{ matrix.architecture }} -q --multiprocess 0 --timeout 4500 --verbose2 --verbose3

clearly./PCbuild/rt.bat is invoked. Maybe the reason is, because it is not used with backslashes like in

The question then is, why./PCbuild/build.bat is working ...

@Fidget-Spinner: any idea?

@Fidget-Spinner
Copy link
Member

CI uses cmd instead if powershell, could that make a difference?

@chris-eibl
Copy link
MemberAuthor

Maybe?
IMHO, we shouldn't do that here in this PR but create a separate issue and PR?

@@ -90,6 +90,7 @@
<ItemDefinitionGroup>
<ClCompile>
<AdditionalOptions>%(AdditionalOptions) /utf-8 /w34242</AdditionalOptions>
<AdditionalOptions Condition="$(SupportPGO) and $(Configuration) == 'PGUpdate' and $(PlatformToolset) == 'ClangCL'">%(AdditionalOptions) -fno-profile-instr-use</AdditionalOptions>
Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Since zlib-ng is compiled into a static lib, it won't produce any profile data. It is the first target that is built (after the_freeze_module)

<ItemGroup>
<!-- Static libraries for use later in the build-->
<ProjectsInclude="zlib-ng.vcxproj"Condition="$(zlibNgDir) != '' and Exists('$(zlibNgDir)\zlib-ng.h.in')" />

But since there is no\$(TargetName)_*.profraw for it
<ItemGroup>
<_profrawFilesInclude="$(OutDir)instrumented\$(TargetName)_*.profraw" />
</ItemGroup>

MergeClangProfileData won't createprofdata.profdata:
<TargetName="MergeClangProfileData"BeforeTargets="PrepareForBuild"
Condition="'$(SupportPGO)' and $(Configuration) == 'PGUpdate'"
Inputs="@(_profrawFiles)"

IMHO, this is the best fix (instead of just settingSupportPGO=false for it):

  • it still will get compiled with-fprofile-instr-generate
  • it will be linked withpython314.dll and (hopefully?) happily contribute its part topython314.profraw
  • MSVC doesn't createzlib-ng!1.pgc andzlib-ng.pgd either. No idea whether it gets PGOed there, but the same idea applies? The difference is just, that MSVC does not produce a hard error if the profile data isn't there, whereas clang does (TRACKEDEXEC : error : Error in reading profile profdata.profdata: no such file or directory), and I still have found no way to make that a warning :(

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Since zlib-ng is compiled into a static lib, it won't produce any profile data.

Thinking about it again, instead of doing the-fno-profile-instr-use trick in thePGUpdate phase, it would be better, to just create the lib once instead of twice (like I've recently done for the_freeze_module).
There is no need to build the lib twice (same holds true for the MSVC case)?

Copy link
MemberAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I gave it a quick try:

  • introducing aStaticLibProjects likeFreezeProjects inpcbuild.proj is simple and straight forward.
  • Then I'd have to make theProjectReference Include="zlib-ng.vcxproj conditional (otherwise it would still be built in thePGUpdate phase) - ok, doable.
  • but thenzlib.h is no longer found, because it is generated into$(IntDir)zlib.h - likewise$(IntDir)zlib-ng.h. So stopping here :)

So turns out to be not so easy and I'd like to keep that for a follow-up PR (if at all).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I assumed the call counts for PGO are going straight into each project that references it, since a static lib is essentially just a bundle of .obj files and so the link-time optimisation happens in the referencing project, not the static lib project.

Which means yes, it's participating in PGO, and it doesn't require rebuilding. Though it's not going to hurt that badly, so I'd tend towards decoupled build configuration (it would be easy to fix by moving the header file regeneration into pythoncore.vcxproj, but that's bad coupling).

Possibly we could makeSupportsPGO choose the definition ofIntDir and then it would usually be a quick rebuild?

chris-eibl reacted with thumbs up emoji
Copy link
MemberAuthor

@chris-eiblchris-eiblMar 24, 2025
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Yeah, but let's do that in a follow-up PR, since it is not really related to clang-cl.

Technically, the/arch changes so that the resulting binary can run on legacy CPUs aren't either, but IMHO they are a nice fit here.

It would just put some rebase burden onto this PR, if we'd do it in a separate one.

@zooba
Copy link
Member

CI uses cmd instead if powershell, could that make a difference?

It probably requirescall PCbuild\build.bat ... to handle someone callingexit in the batch file. In Cmd, if the entire block is a batch file, thenexit will break out of the top-level one, not the nested one, unless you usecall.

@@ -1,3 +1,4 @@
// DO NOT MERGE - just triggering Windows tail-call CI)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Remember to remove

chris-eibl reacted with thumbs up emoji
@chris-eibl
Copy link
MemberAuthor

CI uses cmd instead if powershell, could that make a difference?

It probably requirescall PCbuild\build.bat ... to handle someone callingexit in the batch file. In Cmd, if the entire block is a batch file, thenexit will break out of the top-level one, not the nested one, unless you usecall.

I've created#131678 to continue the discussion there.

zooba reacted with thumbs up emoji

@zoobazooba merged commitd16f455 intopython:mainMar 24, 2025
38 checks passed
@chris-eiblchris-eibl deleted the fix_clangcl_zlibng branchMarch 24, 2025 19:29
diegorusso pushed a commit to diegorusso/cpython that referenced this pull requestApr 1, 2025
…honGH-131526)Do not enable AdvancedVectorExtensions2 for all *.c files, so that the resulting binary can be executed on older CPUs, too. Also enable AdvancedVectorExtensions512 where necessary, and add the ClangCL flags required to enable vector extensions.
seehwan pushed a commit to seehwan/cpython that referenced this pull requestApr 16, 2025
…honGH-131526)Do not enable AdvancedVectorExtensions2 for all *.c files, so that the resulting binary can be executed on older CPUs, too. Also enable AdvancedVectorExtensions512 where necessary, and add the ClangCL flags required to enable vector extensions.
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@zoobazoobazooba left review comments

@markshannonmarkshannonAwaiting requested review from markshannon

Assignees
No one assigned
Labels
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

3 participants
@chris-eibl@zooba@Fidget-Spinner

[8]ページ先頭

©2009-2025 Movatter.jp