To assert that there are no regressions in the development and maintenance branches, Python has a set of dedicated machines (called buildbots or build workers) used for continuous integration. They span a number of hardware/operating system combinations. Furthermore, each machine hosts several builders, one per active branch: when a new change is pushed to this branch on the public GitHub repository, all corresponding builders will schedule a new build to be run as soon as possible.
The build steps run by the buildbots are the following:
Check out the source tree for the change which triggered the build
Compile Python
Run the test suite using strenuous settings
Clean up the build tree
It is your responsibility, as a core developer, to check the automatic build results after you push a change to the repository. It is therefore important that you get acquainted with the way these results are presented, and how various kinds of failures can be explained and diagnosed.
Please read this page in full. If your questions aren’t answered here and you need assistance with the buildbots, a good way to get help is to either:
contact the python-buildbots@python.org mailing list, where all buildbot worker owners are subscribed; or
contact the release manager of the branch you have issues with.
The bedevere-bot on GitHub will put a message on your merged Pull Request if building your commit on a stable buildbot worker fails. Take care to evaluate the failure, even if it looks unrelated at first glance.
Not all failures will generate a notification since not all builds are executed after each commit. In particular, reference leak builds take several hours to complete, so they are done periodically. This is why it’s important for you to be able to check the results yourself, too.
To trigger buildbots on a pull request, you need to be a CPython triager or a core team member. If you are not, ask someone to trigger them on your behalf.
The simplest way to trigger most buildbots on your PR is with the 🔨 test-with-buildbots and 🔨 test-with-refleak-buildbots labels. (See Labels specific to PRs.)
These will run buildbots on the most recent commit. If you want to trigger the buildbots again on a later commit, you’ll have to remove the label and add it again.
If you want to test a pull request against specific platforms, you can trigger one or more buildbots by posting a comment that begins with:
!buildbot regex-matching-target
For example, to run both the iOS and Android buildbots, you can use:
!buildbot ios|android
bedevere-bot will post a comment indicating which buildbots, if any, were matched. If none were matched, or you do not have the necessary permissions to trigger a request, it will tell you that too.
The !buildbot comment, too, only runs the buildbots on the most recent commit. To trigger them again on a later commit, you will have to repeat the comment.
There are three ways of visualizing recent build results:
The Web interface for each branch at https://www.python.org/dev/buildbot/, where the so-called “waterfall” view presents a vertical rundown of recent builds for each builder. When interested in one build, you’ll have to click on it to know which commits it corresponds to. Note that the buildbot web pages are often slow to load; be patient.
The command-line bbreport.py client, which you can get from https://code.google.com/archive/p/bbreport. Installing it is trivial: just add the directory containing bbreport.py to your system path so that you can run it from any filesystem location. For example, if you want to display the latest build results on the development (“main”) branch, type:
bbreport.py -q 3.x

The buildbot “console” interface at https://buildbot.python.org/all/. This works best on a wide, high-resolution monitor. Clicking on the colored circles will allow you to open a new page containing whatever information about that particular build is of interest to you. You can also access builder information by clicking on the builder status bubbles in the top line.
If you like IRC, having an IRC client open to the #python-dev-notifs channel on irc.libera.chat is useful. Any time a builder changes state (last build passed and this one didn’t, or vice versa), a message is posted to the channel. Keeping an eye on the channel after pushing a commit is a simple way to get notified that there is something you should look into.
Some buildbots are much faster than others. Over time, you will learn which ones produce the quickest results after a build, and which ones take the longest time.
Also, when several commits are pushed in quick succession to the same branch, it often happens that a single build is scheduled for all these commits.
A subset of the buildbots are marked “stable”. They are taken into account when making a new release. The rule is that all stable builders must be free of persistent failures when the release is cut. It is absolutely vital that core developers fix any issue they introduce on the stable buildbots as soon as possible.
This does not mean that other builders’ test results can be taken lightly, either. Some of them are known for having platform-specific issues that prevent some tests from succeeding (or even terminating at all), but introducing additional failures should generally not be an option.
Sometimes, even though you have run the whole test suite before committing, you may witness unexpected failures on the buildbots. One source of such discrepancies is if different flags have been passed to the test runner or to Python itself. To reproduce, make sure you use the same flags as the buildbots: they can be found by clicking the stdio link for the failing build’s tests. For example:
./python.exe -Wd -E -bb ./Lib/test/regrtest.py -uall -rwW
Note
Running Lib/test/regrtest.py is exactly equivalent to running -m test.
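For instance, the reproduction command above can equivalently be written as:

./python.exe -Wd -E -bb -m test -uall -rwW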
Sometimes the failure is even subtler, as it relies on the order in which the tests are run. The buildbots randomize test order (by using the -r option to the test runner) to maximize the probability that potential interferences between library modules are exercised; the downside is that it can make for seemingly sporadic failures.
The --randseed option makes it easy to reproduce the exact randomization used in a given build. Again, open the stdio link for the failing test run, and check the beginning of the test output proper.
Let’s assume, for the sake of example, that the output starts with:
./python -Wd -E -bb Lib/test/regrtest.py -uall -rwW
== CPython 3.3a0 (default:22ae2b002865, Mar 30 2011, 13:58:40) [GCC 4.4.5]
== Linux-2.6.36-gentoo-r5-x86_64-AMD_Athlon-tm-_64_X2_Dual_Core_Processor_4400+-with-gentoo-1.12.14 little-endian
== /home/buildbot/buildarea/3.x.ochtman-gentoo-amd64/build/build/test_python_29628
Testing with flags: sys.flags(debug=0, inspect=0, interactive=0, optimize=0, dont_write_bytecode=0, no_user_site=0, no_site=0, ignore_environment=1, verbose=0, bytes_warning=2, quiet=0)
Using random seed 2613169
[ 1/353] test_augassign
[ 2/353] test_functools

You can reproduce the exact same order using:
./python -Wd -E -bb -m test -uall -rwW --randseed 2613169
It will run the following sequence (trimmed for brevity):
[ 1/353] test_augassign
[ 2/353] test_functools
[ 3/353] test_bool
[ 4/353] test_contains
[ 5/353] test_compileall
[ 6/353] test_unicode
If this is enough to reproduce the failure on your setup, you can then bisect the test sequence to look for the specific interference causing the failure. Copy and paste the test sequence in a text file, then use the --fromfile (or -f) option of the test runner to run the exact sequence recorded in that text file:
./python -Wd -E -bb -m test -uall -rwW --fromfile mytestsequence.txt

In the example sequence above, if test_unicode had failed, you would first test the following sequence:
[ 1/353] test_augassign
[ 2/353] test_functools
[ 3/353] test_bool
[ 6/353] test_unicode
And, if it succeeds, the following one instead (which, hopefully, will fail):
[ 4/353] test_contains
[ 5/353] test_compileall
[ 6/353] test_unicode
Then, recursively, narrow down the search until you get a single pair of tests which triggers the failure. It is very rare that such an interference involves more than two tests. If this is the case, we can only wish you good luck!
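If you prefer not to do the halving by hand, the recursive narrowing is easy to script. Here is a minimal sketch, assuming the recorded sequence lives in mytestsequence.txt (as copied from the stdio log, one “[ n/353] test_name” entry per line) and that the last test listed is the failing one; the file names and helper names here are illustrative, not part of the test runner:

# Minimal sketch of the bisection described above; file and helper names
# are illustrative only. The recorded sequence must end with the failing test.
import subprocess

def sequence_fails(tests):
    """Run the given tests in order via --fromfile; return True on failure."""
    with open("candidate.txt", "w") as f:
        f.write("\n".join(tests) + "\n")
    proc = subprocess.run(["./python", "-Wd", "-E", "-bb", "-m", "test",
                           "-uall", "-rwW", "--fromfile", "candidate.txt"])
    return proc.returncode != 0

def narrow_down(tests):
    """Halve the prefix until a single interfering test precedes the failure."""
    *prefix, failing = tests
    while len(prefix) > 1:
        half = len(prefix) // 2
        first, second = prefix[:half], prefix[half:]
        if sequence_fails(first + [failing]):
            prefix = first
        elif sequence_fails(second + [failing]):
            prefix = second
        else:
            # Neither half reproduces the failure on its own: the
            # interference involves more than two tests.
            break
    return prefix + [failing]

with open("mytestsequence.txt") as f:
    # Strip the "[ n/353]" counters, keeping only the test names.
    tests = [line.split("]", 1)[-1].strip() for line in f if line.strip()]
print(narrow_down(tests))

Because the prefix halves on each probe, the search needs only a logarithmic number of full test runs.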
Note
You cannot use the -j option (for parallel testing) when diagnosing ordering-dependent failures. Using -j isolates each test in a pristine subprocess and, therefore, prevents you from reproducing any interference between tests.
While we try to make the test suite as reliable as possible, some tests do not reach a perfect level of reproducibility. Some of them will sometimes display spurious failures, depending on various conditions. Here are common offenders:
Network-related tests, such as test_poplib, test_urllibnet, etc. Their failures can stem from adverse network conditions, or imperfect thread synchronization in the test code, which often has to run a server in a separate thread.
Tests dealing with delicate issues such as inter-thread or inter-process synchronization, or Unix signals: test_multiprocessing, test_threading, test_subprocess, test_threadsignals.
When you think a failure might be transient, it is recommended you confirm by waiting for the next build. Still, even if the failure does turn out to be sporadic and unpredictable, the issue should be reported on the bug tracker; even better if it can be diagnosed and suppressed by fixing the test’s implementation, or by making its parameters (such as a timeout) more robust.