Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork32.1k
[WIP] Various build system improvements#101093
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
base:main
Are you sure you want to change the base?
Uh oh!
There was an error while loading.Please reload this page.
Conversation
Due to the holiday weekend in Korea (Lunar new year), I will review relevant PRs by next weekend. |
cc3f6f5
to2e2ba00
Compareaaupov commentedJan 18, 2023
Sidenote: BOLT now supports DWARF5, so there's no need to force https://github.com/python/cpython/blob/main/configure.ac#L1950-1952 |
Nice news! Thanks for let us know :) @indygreg, Let's drop the flag cc@aaupov |
2e2ba00
tofe8e296
CompareNo rush on the reviews,@corona10: I'm also busy celebrating the Lunar New Year this weekend. Some specific guidance I could use help with is how to split up / fold commits and how to associate them with a GitHub Issue. To be honest I find the whole CPython contribution process of managing GitHub Issues in addition to a PR to be confusing. Is there a 1:1 relationship? 1:n?https://devguide.python.org/ is a bit ambiguous on this topic. I'm lazy/busy, so prefer as little overhead / separate GitHub Issues as possible. But I can go along with whatever. If someone tells me what to do I'll do it. Also, work that I'd like to do but haven't started working on yet is enabling some of The end state I'd like to achieve is the ability for CPython's build system to build ~all extensions statically with minimal effort. e.g. |
corona10 commentedJan 24, 2023 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
One more thing, I have also interested in supporting build all extensions statically with minimal effort for internal purposes. |
What wrong with just editing |
@indygreg, are you planning on following up this PR? |
I could probably find time to polish off the PGO/BOLT changes if still wanted. Let's treat the broader extension module pieces I referred to as out of scope. BTW, one of the things the stack of commits in this PR does is it adds BOLT optimization to libpython. Current |
It would be really helpful if you could split this up in separate issues for each isolated change, and for each issue write a few sentences aboutwhy the change is needed, what value it adds, etc. Applying BOLT to libpython makes a lot of sense to me (based on your analysis), but I have no experience when it comes to BOLT :) |
It would be really helpful if you could split this up in separate issues for each isolated change, and for each issue write a few sentences about why the change is needed, what value it adds, etc. Yup. The PR summary called out that I was awaiting instructions/feedback on whether/how to proceed. I wasn’t sure if people were even interested in the refactors.All the details/justification are there in the commit messages. I just need to put in the time to turn the commits into issues/PRs. Maybe I’ll start later today… |
That's great, but it is easier to interact using GitHub Issues as opposed to comments on single commits. |
ff74a68
toc63a03d
CompareFirst 2 PRs from this branch are up:
Need to wait on publishing the next one because it depends on#104491 and I reckon GitHub's already brittle interface forstacking PRs will fall apart with all the automation CPython is using. So I'm not even going to try to do it. |
Various rules were only ever invoked once and had minimal bodies.I don't see a benefit to the indirection.So this commit inlines rules to simplify the PGO logic.skip news
This commit overhauls the make-based build system's rules forbuilding optimized binaries. Along the way it fixes a myriad ofbugs and shortcomings with the prior approach.The old way of producing optimized binaries had various limitations:* `make [all]` would do work when PGO was enabled because the phony `profile-opt` rule was non-empty. This prevented no-op PGO builds from working at all. This meant workflows like `make; make install` either incurred extra work or failed due to race conditions.* Same thing for BOLT, as its `bolt-opt` rule was also non-empty and always ran during `make [all]`.* BOLT could not be run multiple times without a full rebuild because `llvm-bolt` can't instrument binaries that have already received BOLT optimizations.* It was difficult to run BOLT on its own because of how various make targets and their dependencies were structured.* I found the old way that configure and make communicated the default targets to be confusing and hard to understand.There are essentially 2 major changes going on in this commit:1. A rework of the high-level make targets for performing a build and how they are defined.2. A rework of all the make logic related to profile-based optimization (read: PGO and BOLT).Build Target Rework======================Before, we essentially had `build_all`, `profile-opt`, `bolt-opt` and`build_wasm` as our 3 targets for performing a build. `all` would aliasto one of these, as appropriate.And there was another definition for which _simple_ make target toevaluate for non-optimized builds. This was likely `build_all` or`all`.In the rework, we introduce 2 new high-level targets:* `build-plain` - Perform a build without optimizations.* `build-optimized` - Perform a build with optimizations.`build-plain` is aliased to `build_all` in all configurations exceptWASM, where it is `build_wasm`.`build-optimized` by default is aliased to a target that prints an errormessage when optimizations aren't enabled. If PGO or BOLT are enabled,it is aliased to their respective target.`build-optimized` is the logical successor to `profile-opt`.I felt it best to delete `profile-opt` completely, as the new `build-*`high-level targets feel more friendly to use. But if people lament itsloss, we can add a `profile-opt: build-optimized` to achieve almost thesame result.Profiled-Based Optimization Rework==================================Most of the make logic related to profile-based optimization (read: PGOand BOLT) has been touched in this change.A major issue with the old way of doing things was we used phony,always-executed make rules. This is a bad practice in make because itundermines no-op builds.Another issue is that the separation between the rules and what orderthey ran in wasn't always clear. Both PGO and BOLT consist of the same4 phase solution: instrument, run, analyze, and apply. However, thesesteps weren't clearly expressed in the make logic. This is especiallytrue for BOLT, which only had 1 make rule.Another issue with BOLT is that it was really easy to get things intoa bad state. e.g. if you applied BOLT to `pythonX.Y` you could notrun BOLT again unless you rebuilt `pythonX.Y` from source.In the new world, we have separate `profile-<tool>-<stage>-stamp`rules defining the 4 distinct `instrument`, `run`, `analyze`, and`apply` stages for both PGO and BOLT. Each of these stages is trackedby a _stamp_ semaphore file so progress can be captured. This shouldall be pretty straightforward.There is some minimal complexity here to handle BOLT's optionaldependency on PGO, as BOLT either depends on `build_all` or`profile-pgo-apply-stamp`.As part of the refactor to BOLT we also preserve the original inputbinary before BOLT is applied. This original file is restored ifBOLT runs again. This greatly simplifies repeated BOLT invocations,as make doesn't perform needless work. However, this is all besteffort, as it is possible for some make target evaluations to stillget things in a bad state.Other Remarks=============If this change perturbs any bugs, they are likely around cleaningbehavior. The cleaning rules are a bit complicated and not clearlydocumented. And I'm unsure which targets CPython developers ofteniterate on. It is highly possible that state cleanup of PGO and/orBOLT files isn't as robust as it needs to be.I explicitly deleted some calls to PGO cleanup because those callsprevented no-op `make [all]` from working. It is certainly possiblesomething somewhere (release automation?) relied on these files beingdeleted when they no longer are. We still have targets to purge profilefiles and it should be trivial to add these to appropriate make rules.
This allows easily customizing the arguments to `llvm-bolt` withouthaving to edit the Makefile.Arguments can be passed to configure and are reflected in configureoutput, which can be useful for log analysis.When defined in the Makefile we use `?=` syntax so the flags can beoverridden via `make VAR=VALUE` syntax or via environment variables.Super useful for iterating on different BOLT flags.
This ensures `LD_LIBRARY_PATH` is set. Without this, I was able totickle a libpythonX.Y.so not found error.
Before, we only supported running BOLT on the main `python` binary.If a shared library was in play, it wouldn't be optimized. That wasleaving a ton of optimization opportunities on the floor.This commit adds support for running BOLT on libpython.Functionality is disabled by default because BOLT asserts on LLVM 15,which is the latest LLVM. I've built LLVM tip and it is able toprocess libpython just fine. So it is known to work.
The generated sysconfig data during builds encodes a PEP-425platform tag. During native (target=host) builds, the bootstrappedcompiler runs Python code in sysconfig to derive an appropriate value.For cross compiles, we fall back to logic in configure (that codelives around the changed lines) to derive an appropriate platformtag, which is exported as an environment variable during builds.And there is a "backdoor" in `sysconfig.py` that causes`sysconfig.get_platform()` to return that value.The logic in configure for deriving an appropriate platform tagis a far cry from what's in `sysconfig.py`. Ideally that logicwould be fully (re)implemented in configure. But that's anon-trivial amount of work.Recognizing that configure makes inadequate platform tag decisionsduring cross-compiles, this commit switches `_PYTHON_HOST_PLATFORM`from a regular output variable to a "precious variable" (in autoconfspeak). This has the side-effect of allowing invokers to define thevariable, effectively allowing them to explicitly set the platformtag during builds to a correct value when configure otherwise wouldn'tset one.
(This change is a quick and dirty way to merge some of the build systemimprovements I'm proposing inpythongh-101093 before the 3.12 feature freeze.I wanted to scope bloat myself to fix some longstanding deficiencies inthe build system around profile-guided builds. But I'm getting softresistance to the reviews so close to the freeze deadline and it isobvious that we need a simpler solution to hit the 3.12 deadline. Whilethis change is quick and dirty, it attempts to not make things worse.)Before this change, we only applied bolt to the main python binary.After this change, we apply bolt to libpython if it is configured. Inshared library builds, most of the C code is in libpython so it iscritical to apply bolt to libpython to realize bolt benefits.This change also reworks how bolt instrumentation is applied. Iteffectively removes the readelf based logic added inpythongh-101525 andreplaces it with a mechanism that saves a copy of the pre-bolt binaryand restores that copy when necessary. This allows us to performbolt optimizations without having to manually delete the output binaryto force a new bolt run.We also add a new make target for purging bolt files and hook it upto `clean` so bolt state is purged when appropriate.`.gitignore` rules have been added to ignore files related to bolt.Before and after this refactor, `make` will no-op after a previous run.Both versions should also share common make DAG deficiencies wheretargets fail to trigger as often as they need to or can triggerprematurely in certain scenarios. e.g. after this change you may needto `rm profile-bolt-stamp` to force a bolt run because there aren'tappropriate non-phony targets for bolt's make target to depend on.Fixing this is a non-trivial amount of work that will likely have towait until the 3.13 window.To make it easier to iterate on custom BOLT settings, the flags topass to instrumentation and application are now defined in configureand can be overridden by passing `BOLT_INSTRUMENT_FLAGS` and`BOLT_APPLY_FLAGS`.
(This change is a quick and dirty way to merge some of the build systemimprovements I'm proposing inpythongh-101093 before the 3.12 feature freeze.I wanted to scope bloat myself to fix some longstanding deficiencies inthe build system around profile-guided builds. But I'm getting softresistance to the reviews so close to the freeze deadline and it isobvious that we need a simpler solution to hit the 3.12 deadline. Whilethis change is quick and dirty, it attempts to not make things worse.)Before this change, we only applied bolt to the main python binary.After this change, we apply bolt to libpython if it is configured. Inshared library builds, most of the C code is in libpython so it iscritical to apply bolt to libpython to realize bolt benefits.This change also reworks how bolt instrumentation is applied. Iteffectively removes the readelf based logic added inpythongh-101525 andreplaces it with a mechanism that saves a copy of the pre-bolt binaryand restores that copy when necessary. This allows us to performbolt optimizations without having to manually delete the output binaryto force a new bolt run.We also add a new make target for purging bolt files and hook it upto `clean` so bolt state is purged when appropriate.`.gitignore` rules have been added to ignore files related to bolt.Before and after this refactor, `make` will no-op after a previous run.Both versions should also share common make DAG deficiencies wheretargets fail to trigger as often as they need to or can triggerprematurely in certain scenarios. e.g. after this change you may needto `rm profile-bolt-stamp` to force a bolt run because there aren'tappropriate non-phony targets for bolt's make target to depend on.Fixing this is a non-trivial amount of work that will likely have towait until the 3.13 window.To make it easier to iterate on custom BOLT settings, the flags topass to instrumentation and application are now defined in configureand can be overridden by passing `BOLT_INSTRUMENT_FLAGS` and`BOLT_APPLY_FLAGS`.
I'm creating this PR to track various build system changes that I'd like to make to CPython.
I'm creating a draft PR instead of opening real PRs because general OSS experience has taught me that if I trickle things out slowly without any prior context project maintainers won't see the big picture and/or benefits of the changes. My hope is that by tracking all proposed changes in a Git branch that people can easily diff that I'll have an easier time convincing folks to accept the patches.
Next Steps
CPython maintainer(s) should look at each commit in this branch. I write detailed commit messages. See each commit for more details.
Commits beginning with
gh-xxxx:
aren't yet assigned issues/PRs.Then, just tell me which commits to turn into issues/PRs and I'll do so. Keep in mind some commits are dependent on others. For example, the BOLT enhancements (which appear to deliver meaningful pyperformance wins) depends on the
overhaul build rules for optimized binaries
commit and its precursors.I have several more potential patches to include in this branch. Including a bunch that enable cross-compiling for Apple platforms. I figured I'd start with just the build system + BOLT improvements to get people's attention.
Most of this work is derived from my efforts onhttps://github.com/indygreg/python-build-standalone.
cc@corona10 since you seem to have an interest in all the BOLT PRs.