Note: CPython’s development process moved to https://github.com/python/cpython on 2017-02-10.
This PEP outlines the steps required to migrate Python’s development process from Mercurial[3] as hosted at hg.python.org[1] to Git[4] on GitHub[2]. Meeting the minimum goals of this PEP should allow for the development process of Python to be as productive as it currently is, and meeting its extended goals should improve the development process from its status quo.
In 2014, it became obvious that Python’s custom development process was becoming a hindrance. As an example, for an external contributor to submit a fix for a bug that eventually was committed, the basic steps were:
NEWS, ACKS, and “What’s New” document as necessary

This is a very heavy, manual process for core developers. Even in the simple case, you could only possibly skip the code review step, as you would still need to build the documentation. This led to patches languishing on the issue tracker because core developers could not work through the backlog fast enough to keep up with submissions. In turn, that had the side effect of discouraging outside contributions through frustration at the lack of attention, which is a dangerous problem for an open source project with no corporate backing, as it runs counter to having a viable future for the project. While allowing patches to be uploaded to bugs.python.org[5] is potentially simple for an external contributor, it is as slow and burdensome as it gets for a core developer to work with.
Hence the decision was made in late 2014 that a move to a new development process was needed. A request for PEPs proposing new workflows was made, in the end leading to two: PEP 481 and PEP 507, proposing GitHub[2] and GitLab[7], respectively.
The year 2015 was spent off-and-on working on those proposals and trying to tease out details of what made them different from each other on the core-workflow mailing list[8]. PyCon US 2015 also showed that the community was a bit frustrated with our process due to both cognitive overhead for new contributors and how long it was taking for core developers to look at a patch (see the end of Guido van Rossum’s keynote at PyCon US 2015[9] as an example of the frustration).
On January 1, 2016, the decision was made by Brett Cannon to move the development process to GitHub. The key reasons for choosing GitHub were[10]:
There’s even already an unofficial logo to represent the migration to GitHub[22].
The overarching goal of this migration is to improve the development process to the extent that a core developer can go from external contribution submission through all the steps leading to committing said contribution from within a browser on a tablet with WiFi, using some development process (this does not inherently mean GitHub’s default workflow). The final solution will also allow an external contributor to contribute even if they choose not to use GitHub (although there is no guarantee of feature parity).
While hg.python.org[1] hosts many repositories, there are only five key repositories that need to move:
The devinabox repository is code-only. The peps and devguide repositories involve the generation of webpages. And the cpython repository has special requirements for integration with bugs.python.org[5].
The migration plan is separated into sections based on what is required to migrate the repositories listed in the Repositories to Migrate section. Completion of requirements outlined in each section should unblock the migration of the related repositories. The sections are expected to be completed in order, but not necessarily the requirements within a section.
Completion of the requirements in this section will allow the devinabox repository to move to GitHub.
To manage permissions, a ‘Python core’ team will be created as part of the python organization[16]. Any repository that is moved will have the ‘Python core’ team added to it with write permissions[17]. Anyone who previously had rights to manage SSH keys on hg.python.org will become a team maintainer for the ‘Python core’ team.
Since moving to GitHub also entails moving to Git[4], we must decide what tools and commands we will run to translate a Mercurial repository to Git. The tools developed specifically for this migration are hosted at https://github.com/orsenthil/cpython-hg-to-git .
A key part of any open source project is making sure that its source code can be properly licensed. This requires making sure all people making contributions have signed a contributor license agreement (CLA)[18]. Up until now, CLA signing has been enforced by core developers checking whether someone had an * by their username on bugs.python.org[5]. With this migration, the plan is to start off with automated checking and enforcement of contributors signing the CLA.
To keep the tracking of CLA signatures under the direct control of the PSF, who has signed the PSF CLA will continue to be recorded as part of someone’s bugs.python.org user profile. This means that an association will be needed between a person’s bugs.python.org[5] account and their GitHub account, which will be done through a new field in a user’s profile. This does implicitly require that contributors have both a GitHub[2] and a bugs.python.org account in order to sign the CLA and contribute through GitHub.
An API is provided to query bugs.python.org to see if a GitHub username corresponds to someone who has signed the CLA. Making a GET request to, e.g., http://bugs.python.org/user?@template=clacheck&github_names=brettcannon,notanuser returns a JSON dictionary with the requested usernames as keys and a true value if they have signed the CLA, false if they have not, and null if no corresponding GitHub username was found.
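A client for the endpoint described above could be sketched in Python as follows. This is a sketch under assumptions: the function names are hypothetical, the URL is the example from this section (the service may no longer answer), and only the standard library is used. The parsing step is kept separate from the network call so it can be exercised without network access.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Endpoint taken from the example request above; it may no longer be live.
CLA_CHECK_URL = "http://bugs.python.org/user?@template=clacheck&github_names="


def build_cla_check_url(github_names):
    """Build the GET URL for a batch of GitHub usernames (comma-separated)."""
    return CLA_CHECK_URL + ",".join(quote(name, safe="") for name in github_names)


def interpret_cla_response(body):
    """Translate the JSON reply into a status string per GitHub username.

    true  -> "signed"
    false -> "not signed"
    null  -> "unknown" (no bugs.python.org account linked to that name)
    """
    statuses = {}
    for name, signed in json.loads(body).items():
        if signed is True:
            statuses[name] = "signed"
        elif signed is False:
            statuses[name] = "not signed"
        else:
            statuses[name] = "unknown"
    return statuses


def check_cla(github_names, timeout=10):
    """Query the endpoint over the network and interpret the reply."""
    with urlopen(build_cla_check_url(github_names), timeout=timeout) as response:
        return interpret_cla_response(response.read().decode("utf-8"))
```

Batching several names into one request, as the example URL does, lets a bot check every author of a pull request with a single round trip.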
With an association between someone’s GitHub account and their bugs.python.org[5] account, which has the data as to whether someone has signed the CLA, a bot can monitor pull requests on GitHub and denote whether the contributor has signed the CLA.
If the user has signed the CLA, the bot will add a positive label to the pull request to denote that it has no CLA issues (e.g., a green label stating, “CLA signed”). If the contributor has not signed a CLA, a negative label will be added to the pull request (e.g., a red label stating, “CLA not signed”) and the pull request will be blocked using GitHub’s status API. If a contributor lacks a bugs.python.org account, the negative label will be used as well. Using a label for both positive and negative cases provides a fallback signal if the bot happens to fail, preventing potential false positives or false negatives. It also allows for an easy way to trigger the bot again by simply removing a CLA-related label (in contrast to a GitHub status check[40], which is only triggered on code changes).
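The label rules above reduce to a small decision function. The sketch below is hypothetical (the label strings mirror the examples in the text, and the bot’s real code lives in the Knights Who Say Ni project); it treats a missing bugs.python.org account like an unsigned CLA and shows how removing every CLA label signals that the bot should check again.

```python
from typing import Optional

# Hypothetical label names, modeled on the examples in the text.
CLA_SIGNED = "CLA signed"
CLA_NOT_SIGNED = "CLA not signed"
CLA_LABELS = {CLA_SIGNED, CLA_NOT_SIGNED}


def choose_cla_label(signed: Optional[bool]) -> str:
    """signed is True/False from the CLA check, or None when no account is linked."""
    # A missing bugs.python.org account is treated like an unsigned CLA.
    return CLA_SIGNED if signed is True else CLA_NOT_SIGNED


def labels_after_check(current_labels, signed: Optional[bool]) -> set:
    """Drop any stale CLA label and add the fresh one; other labels are kept."""
    kept = {label for label in current_labels if label not in CLA_LABELS}
    kept.add(choose_cla_label(signed))
    return kept


def should_block_merge(labels) -> bool:
    """Only the negative label leads to a blocking (failing) status."""
    return CLA_NOT_SIGNED in labels


def needs_recheck(labels) -> bool:
    """Removing every CLA label is how a human asks the bot to run again."""
    return not (CLA_LABELS & set(labels))
```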
As no pre-existing bot exists to meet our needs, it will be hosted on Heroku[39] and written to target Python 3.5 to act as a showcase for asynchronous programming. The code for the bot is hosted in the Knights Who Say Ni project[41].
Updating .hg/hgrc in the now-old Mercurial repository in the [hooks] section with:
pretxnchangegroup.reject = echo " * This repo has been migrated to github.com/python/peps and does not accept new commits in Mercurial!" 2>&1; exit 1
will make the repository read-only.
Due to their use for generating webpages, the devguide[14] and peps[13] repositories need their respective processes updated to pull from their new Git repositories.
Obviously the most active and important repository currently hosted at hg.python.org[1] is the cpython repository[15]. Because of its importance and high-frequency use, it requires more tooling before being moved to GitHub compared to the other repositories mentioned in this PEP.
During the process of choosing a new development workflow, it was decided that a linear history is desired. People preferred having a single commit representing a single change instead of having a set of unrelated commits lead to a merge commit that represented a single change. This means that the convenient “Merge” button in GitHub pull requests will be set to only do squash commits and not merge commits.
A second set of recommended commands will also be written for committing a contribution from a patch file uploaded to bugs.python.org[5]. This will obviously help keep the linear history, but the commands will need to give proper attribution to the patch author.
The exact sequence of commands that will be given as guidelines to core developers is an open issue: Git CLI commands for committing a pull request to cpython.
Historically, external contributions were attached to an issue on bugs.python.org[5] thanks to the fact that all external contributions were uploaded as a file. For changes a core developer committed directly, specifying an issue number in the commit message in the format Issue # at the start of the message led to a comment being posted to the issue linking to the commit.
An association between a pull request and an issue is needed to track when a fix has been proposed. The association needs to be many-to-one, as it can take multiple pull requests to solve a single issue (technically it should be a many-to-many association for when a single fix solves multiple issues, but this is fairly rare and issues can be merged into one using the Superseder field on the issue tracker).
The association between a pull request and an issue will be done based on detecting an issue number. If the issue is specified in either the title or the body of a message on a pull request, then a connection will be made on bugs.python.org[5]. Some visible notification (e.g., a label or message) will be made to the pull request to notify that the association was successfully made.
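Detecting the issue number could look roughly like the sketch below. The accepted spellings (“Issue #12345”, “issue 12345”, “bpo-12345”) are assumptions made for illustration, as is the in-memory IssueLinks store; a real implementation would record the link on bugs.python.org. Storing a set of pull requests per issue gives the many-to-one association, and it also tolerates the rarer case of one pull request mentioning several issues.

```python
import re
from collections import defaultdict

# Assumed spellings of an issue reference; "tissue 5" must not match.
ISSUE_RE = re.compile(r"\b(?:issue\s*#?|bpo-)(\d+)\b", re.IGNORECASE)


def find_issue_numbers(title, body):
    """Return the sorted issue numbers mentioned in a pull request's title or body."""
    text = f"{title}\n{body or ''}"
    return sorted({int(number) for number in ISSUE_RE.findall(text)})


class IssueLinks:
    """Hypothetical store: many pull requests can point at one issue."""

    def __init__(self):
        self._prs_by_issue = defaultdict(set)

    def link(self, pr_number, title, body):
        issues = find_issue_numbers(title, body)
        for issue in issues:
            self._prs_by_issue[issue].add(pr_number)
        # A non-empty result is when the bot would add its visible notification.
        return issues

    def pull_requests_for(self, issue):
        return sorted(self._prs_by_issue.get(issue, ()))
```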
Once a commit is made, the corresponding issue should be updated to reflect this fact. This should work regardless of whether the commit came from a pull request or a direct commit.
Currently you can use https://hg.python.org/lookup/ with a revision ID from either the Subversion or Mercurial copies of the cpython repo[15] to get redirected to the URL for that revision in the Mercurial repository. The URL rewriter will need to be updated to redirect to the Git repository and to support the new revision IDs created for the Git repository.
The most likely design is to record all the Mercurial changeset IDs statically once the migration has occurred. The lookup code will then be updated to accept hashes from 7 to 40 hexadecimal digits. Any hexadecimal string of length 12 or 40 will be compared against the Mercurial changeset IDs. If the string doesn’t match or is of some other length between 7 and 40, then it will be assumed to be a Git hash.
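The lookup rule can be expressed as a short function. In this sketch the hg_to_git mapping is a hypothetical, statically generated table from Mercurial changeset IDs (12- and 40-digit forms) to converted Git hashes; Subversion revision handling, which the existing service also supports, is left out.

```python
import re

HEX_RE = re.compile(r"[0-9a-fA-F]{7,40}")


def classify_revision(rev, hg_to_git):
    """Return ("hg", git_hash), ("git", rev) or ("invalid", rev).

    hg_to_git: assumed static mapping of Mercurial changeset IDs to Git hashes.
    """
    if not HEX_RE.fullmatch(rev):
        return ("invalid", rev)
    rev = rev.lower()
    # Only 12- and 40-digit strings can be Mercurial changeset IDs.
    if len(rev) in (12, 40) and rev in hg_to_git:
        return ("hg", hg_to_git[rev])
    # No Mercurial match, or some other length from 7 to 40: assume a Git hash.
    return ("git", rev)
```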
The bugs.python.org commit number rewriter will also need to be updated to accept hashes as short as 7 digits, as Git will match on hashes that short or longer.
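A rewriter for issue-tracker text could use a pattern like the one below. Both the pattern and the link target are assumptions: it links any standalone run of 7 to 40 lowercase hex digits that contains at least one digit (so English words such as “defaced” are not linked) to the lookup service mentioned earlier, which would then resolve either kind of hash. A plain decimal number of seven or more digits would still be linked, which a production rewriter might want to guard against.

```python
import re

# Assumed rule: 7-40 lowercase hex digits, standalone, with at least one digit.
COMMIT_RE = re.compile(r"\b(?=[0-9a-f]*[0-9])([0-9a-f]{7,40})\b")
LOOKUP_URL = "https://hg.python.org/lookup/"


def linkify_commits(text):
    """Wrap every commit-like hash in an HTML link to the lookup service."""
    return COMMIT_RE.sub(
        lambda m: f'<a href="{LOOKUP_URL}{m.group(1)}">{m.group(1)}</a>', text
    )
```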
Once Python is no longer kept in Mercurial, the sys._mercurial attribute will need to be changed to return ('CPython', '', ''). An equivalent sys._git attribute will be added which fulfills the same use-cases.
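Code that wants the build’s revision does not need to know which attribute exists. The sketch below assumes the sys._git tuple has the shape (implementation, branch, revision), which is how CPython’s own platform module reads it; since the attribute is private and CPython-only, it falls back to None, and the public platform.python_branch() and platform.python_revision() functions are shown as the portable alternative.

```python
import platform
import sys


def vcs_info():
    """Return the implementation, branch and revision recorded at build time, or None."""
    # sys._git is private and CPython-specific, so other interpreters may lack it.
    git_info = getattr(sys, "_git", None)
    if git_info is None:
        return None
    implementation, branch, revision = git_info
    return {"implementation": implementation, "branch": branch, "revision": revision}


def public_vcs_info():
    """Portable alternative: empty strings when the information is unavailable."""
    return platform.python_branch(), platform.python_revision()
```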
The devguide will need to be updated with details of the new workflow. Most likely, work will take place in a separate branch until the migration actually occurs.
The release process will need to be updated as necessary.
Once the cpython repository[15] is migrated, all repositories will have been moved to GitHub[2] and the development process should be on equal footing as before the move. But a key reason for this migration is to improve the development process, making it better than it has ever been. This section outlines some plans on how to improve things.
It should be mentioned that overall feature planning for bugs.python.org[5], which includes plans independent of this migration, is tracked on its own wiki page[23].
Traditionally the Misc/NEWS file[19] has been problematic for changes which spanned Python releases. Oftentimes there will be merge conflicts when committing a change between, e.g., 3.5 and 3.6 only in the Misc/NEWS file. It’s so common, in fact, that the example instructions in the devguide explicitly mention how to resolve conflicts in the Misc/NEWS file[21]. As part of our tool modernization, working with the Misc/NEWS file will be simplified.
The planned approach is to use an individual file per news entry, containing the text for the entry. In this scenario, each feature release would have its own directory for news entries, and a separate file would be created in that directory that was named either after the issue it closed or after a timestamp value (which prevents collisions). Merges across branches would have no issue, as the news entry file would still be uniquely named and in the directory of the latest version that contained the fix. A script would collect all news entry files no matter what directory they reside in and create an appropriate news file (the release directory can be ignored, as the mere fact that the file exists is enough to represent that the entry belongs to the release). Classification can be done either by keyword in the news entry file itself or by using subdirectories representing each news entry classification in each release directory (or classification of news entries could be dropped, since critical information is captured by the “What’s New” documents, which are organized).

The benefit of this approach is that it keeps the changes with the code that was actually changed. It also ties the message to being part of the commit which introduced the change. For a commit made through the CLI, a script could be provided to help generate the file. In a bot-driven scenario, the merge bot could have a way to specify a specific news entry and create the file as part of its flattened commit (while most likely also supporting using the first line of the commit message if no specific news entry was specified). If a web-based workflow is used, then a status check could be used to verify that a news entry file is in the pull request, acting as a reminder when the file is missing. Code for this approach has been written previously for the Mercurial workflow at http://bugs.python.org/issue18967.
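A collection script for this layout might look like the following sketch. It assumes one layout choice the text leaves open (classification by subdirectory, as in <release>/<category>/<entry file>) and uses hypothetical function names. As described above, the release directory is ignored when grouping: an entry file’s presence on a branch is what places it in that branch’s release.

```python
from collections import defaultdict
from pathlib import Path, PurePosixPath


def group_entries(entries):
    """Group (relative_path, text) pairs by category, ignoring the release directory.

    Paths are expected to look like "<release>/<category>/<entry file>".
    """
    by_category = defaultdict(list)
    for rel_path, text in sorted(entries):
        parts = PurePosixPath(rel_path).parts
        if len(parts) != 3:
            continue  # not an entry file in the assumed layout
        _release, category, _entry_name = parts
        by_category[category].append(text.strip())
    return dict(by_category)


def read_entries(root):
    """Yield (relative_path, text) for every entry file below root."""
    root = Path(root)
    for path in root.glob("*/*/*"):
        if path.is_file():
            yield path.relative_to(root).as_posix(), path.read_text(encoding="utf-8")


def render_news(by_category):
    """Produce plain-text news: one underlined heading per category."""
    lines = []
    for category in sorted(by_category):
        lines += [category, "-" * len(category)]
        lines += [f"- {text}" for text in by_category[category]]
        lines.append("")
    return "\n".join(lines)
```

Typical use would be render_news(group_entries(read_entries(entries_dir))), where entries_dir is whatever directory ends up holding the per-release subdirectories.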
There are also tools from the community, like https://pypi.python.org/pypi/towncrier, https://github.com/twisted/newsbuilder, and http://docs.openstack.org/developer/reno/.
Discussions at the Sep 2016 Python core-dev sprints led to choosing this approach over the rejected approaches outlined in the Rejected Ideas section of this PEP. Of the various options, the separate files approach seems to have the right balance of flexibility and potential tooling while solving the motivating problem.
Work for this is being tracked at https://github.com/python/core-workflow/issues/6.
Traditionally the Misc/ACKS file[20] has been managed by hand. But thanks to Git supporting an author value as well as a committer value per commit, authorship of a commit can be part of the history of the code itself.
As such, manual management of Misc/ACKS will become optional. A script will be written that will collect all author and committer names and merge them into Misc/ACKS with all of the names listed prior to the move to Git. Running this script will become part of the release process.
The script should also generate a list of all people who contributed since the last execution. This will allow having a list of those who contributed to a specific release so they can be explicitly thanked.
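Such a script might be sketched as below. The git log format placeholders %an and %cn (author and committer names) are standard Git; everything else is hypothetical, including sorting by the last word of a name as an approximation of the roughly alphabetical-by-surname order of Misc/ACKS. The second value returned by update_acks is the list of everyone who contributed since the previous run, which is what a release could thank explicitly.

```python
import subprocess


def names_since(since_rev, repo="."):
    """Author and committer names of commits after since_rev (needs git on PATH)."""
    out = subprocess.run(
        ["git", "log", "--format=%an%n%cn", f"{since_rev}..HEAD"],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout
    return {line.strip() for line in out.splitlines() if line.strip()}


def acks_sort_key(name):
    # Approximate "alphabetical by last name": sort on the last word first.
    words = name.split()
    return (words[-1].casefold() if words else "", name.casefold())


def update_acks(existing_names, recent_names):
    """Return (merged ACKS names, names to thank for this release), both sorted."""
    recent = set(recent_names)
    merged = sorted(set(existing_names) | recent, key=acks_sort_key)
    thanks = sorted(recent, key=acks_sort_key)
    return merged, thanks
```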
Work for this is being tracked at https://github.com/python/core-workflow/issues/7.
https://git.python.org
Just as hg.python.org[1] currently points to the Mercurial repository for Python, git.python.org should do the equivalent for the Git repository.
Since GitHub[2] is going to be used for code hosting and code review, those two things need to be backed up. In the case of code hosting, the backup is implicit, as all non-shallow Git[4] clones contain the full history of the repository; hence there will be many backups of the repository.
The code review history does not have the same implicit backup mechanism as the repository itself. That means a daily backup of code review history should be done so that it is not lost in case of any issues with GitHub. It also helps guarantee that a migration from GitHub to some other code review system is feasible were GitHub to disappear overnight.
Since the decision has been made to work with cherry-picks instead of forward merging of branches, it would be convenient to have a bot that would generate pull requests based on cherry-picking for any pull requests that affect multiple branches. The most likely design is a bot that monitors merged pull requests with key labels applied that delineate what branches the pull request should be cherry-picked into. The bot would then generate cherry-pick pull requests for each label and remove the labels as the pull requests are created (this allows for easy detection when automatic cherry-picking failed).
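The label handling for such a bot could be sketched as follows. The label format “needs backport to X.Y” and the cherry_pick callback are hypothetical; the callback stands in for the work of creating the cherry-pick pull request. Labels are removed only for branches whose pull request was created, so any label left behind marks a failed automatic cherry-pick.

```python
import re

# Hypothetical label format; the text only says labels name the target branches.
BACKPORT_LABEL = re.compile(r"^needs backport to (\d+\.\d+)$")


def backport_branches(labels):
    """Target branches requested by labels, oldest first (3.9 sorts before 3.10)."""
    branches = []
    for label in labels:
        match = BACKPORT_LABEL.match(label)
        if match:
            branches.append(match.group(1))
    return sorted(branches, key=lambda b: tuple(int(part) for part in b.split(".")))


def process_merged_pr(labels, cherry_pick):
    """Open cherry-pick PRs and return the labels that should remain.

    cherry_pick(branch) -> bool stands in for creating the pull request;
    it returns False when the cherry-pick does not apply cleanly.
    """
    remaining = set(labels)
    for branch in backport_branches(labels):
        if cherry_pick(branch):
            remaining.discard(f"needs backport to {branch}")
    return remaining
```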
Work for this is being tracked at https://github.com/python/core-workflow/issues/8.
This would linearly apply accepted pull requests and verify that the commits did not interfere with each other by running the test suite and backing out commits if the test run failed. To help facilitate the speed of testing, all patches committed since the last test run can be applied at once under a single test run, as the optimistic assumption is that the patches will work in tandem. Some mechanism to re-run the tests in case of test flakiness will be needed, whether it is from removing a “test failed” label, a web interface for core developers to trigger another testing event, etc.
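The optimistic batching strategy could be sketched like this. The run_tests callback (apply the given changes on top of the branch and run the suite) and the one-by-one fallback after a failed batch are assumptions, not a settled design; the retries parameter illustrates one way to re-run tests in case of flakiness.

```python
def run_queue(pending, run_tests, retries=1):
    """Land accepted changes in order, testing the whole batch optimistically first.

    pending: change IDs in the order they were accepted.
    run_tests(changes) -> bool: applies `changes` on the branch and runs the suite.
    Returns (landed, backed_out).
    """

    def passes(changes):
        # Re-run a failing suite up to `retries` more times to absorb flaky tests.
        return any(run_tests(changes) for _ in range(1 + retries))

    if not pending:
        return [], []
    if passes(list(pending)):
        return list(pending), []
    # The batch failed: apply changes one at a time, backing out any that fail.
    landed, backed_out = [], []
    for change in pending:
        if passes(landed + [change]):
            landed.append(change)
        else:
            backed_out.append(change)
    return landed, backed_out
```

The one-by-one fallback costs extra test runs only when a batch actually fails, which keeps the common case (everything works together) to a single run.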
Inspiration or basis of the bot could be taken from pre-existing bots such as Homu[31] or Zuul[32].
The name given to this bot in order to give it commands is an open issue: Naming the bots.
There are various CI services that provide free support for open source projects hosted on GitHub[2]. After experimenting with a couple of CI services, the decision was made to go with Travis[33].
The current CI service for Python is Pypatcher[38]. A request can be made in IRC to try a patch from bugs.python.org[5]. The results can be viewed at https://ci.centos.org/job/cPython-build-patch/ .
Work for this is being tracked at https://github.com/python/core-workflow/issues/1.
Getting an up-to-date test coverage report for Python’s standard library would be extremely beneficial, as generating such a report locally can take quite a while.
There are a couple of pre-existing services that provide free test coverage for open source projects. In the end, Codecov[37] was chosen as the best option.
Work for this is being tracked at https://github.com/python/core-workflow/issues/2.
The current development process does not include notifying an issue on bugs.python.org[5] when a review comment is left on Rietveld[6]. It would be nice to fix this so that people can subscribe only to comments at bugs.python.org and not GitHub[2] and yet still know when something occurs on GitHub in terms of review comments on relevant pull requests. Current thinking is to post a comment to bugs.python.org on the relevant issue when at least one review comment has been made over a certain period of time (e.g., 15 or 30 minutes, although with GitHub now supporting reviews the time aspect may be unnecessary). This keeps the email volume down for those that receive both GitHub and bugs.python.org email notifications while still making sure that those only following bugs.python.org know when there might be a review comment to address.
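The time-window variant could be sketched as below: a hypothetical ReviewNotifier remembers the first unreported review comment per issue and reports each issue at most once per window. The 15-minute default mirrors the example above; posting the actual comment to bugs.python.org is left out.

```python
from datetime import datetime, timedelta


class ReviewNotifier:
    """Batch GitHub review comments into at most one bugs.python.org post per window."""

    def __init__(self, window=timedelta(minutes=15)):
        self.window = window
        self._pending = {}  # issue number -> time of the first unreported comment

    def record_comment(self, issue: int, when: datetime) -> None:
        # Later comments within the same window do not reset the timer.
        self._pending.setdefault(issue, when)

    def due(self, now: datetime) -> list:
        """Return issues whose window has elapsed and clear them from the pending set."""
        ready = sorted(
            issue for issue, first in self._pending.items() if now - first >= self.window
        )
        for issue in ready:
            del self._pending[issue]
        return ready
```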
As of right now, bugs.python.org[5] allows people to log in using Google, Launchpad, or OpenID credentials. It would be good to expand this to GitHub credentials.
The content at https://docs.python.org/, https://docs.python.org/devguide, and https://www.python.org/dev/peps/ is all derived from files kept in one of the repositories to be moved as part of this migration. As such, it would be nice to set up appropriate webhooks to trigger rebuilding the appropriate web content when the files they are based on change, instead of having to wait for, e.g., a cronjob to trigger.
This can partially be solved if the documentation is a Sphinx project, as then the site can have an unofficial mirror on Read the Docs, e.g., http://cpython-devguide.readthedocs.io/.
Work for this is being tracked at https://github.com/python/core-workflow/issues/9.
It would be helpful for people who find issues with any of the documentation that is generated from a file to have a link on each page which points back to the file on GitHub[2] that stores the content of the page. That would allow for quick pull requests to fix simple things such as spelling mistakes.
Work for this is being tracked at http://bugs.python.org/issue28929.
While certain parts of the documentation at https://docs.python.org change with the code, other parts are fairly static and are not tightly bound to the CPython code itself. The following sections of the documentation fit this category of slow-changing, loosely-coupled:
These parts of the documentation could be broken out into their own repositories to simplify their maintenance and to expand who has commit rights to them.
It has also been suggested to split out the What’s New documents. That would require deciding whether a workflow could be developed where it would be difficult to forget to update What’s New (potentially through a label added to PRs, like “What’s New needed”).
While not necessary, it would be good to have official backups of the various Git repositories for disaster protection. It will be up to the PSF infrastructure committee to decide if this is worthwhile or unnecessary.
The Python development team has long-standing guidelines for selecting new core developers. The key part of the guidelines is that a person needs to have contributed multiple patches which have been accepted and are of high enough quality and size to demonstrate an understanding of Python’s development process. A bot could be written which tracks patch acceptance rates and generates a report to help identify contributors who warrant consideration for becoming core developers. This work doesn’t even necessarily require GitHub integration as long as the committer field in all git commits is filled in properly.
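A report generator of this kind might start from something as simple as the sketch below. The input format (one (author, merged) pair per pull request or patch, gathered e.g. from GitHub or from commit metadata) and the min_accepted threshold are hypothetical; choosing core developers remains a human judgment, and such a report would only help surface names.

```python
from collections import Counter


def acceptance_report(submissions, min_accepted=5):
    """submissions: iterable of (author, was_merged) pairs.

    Returns (author, merged, submitted, rate) rows for authors with at least
    min_accepted merged contributions, most merged first.
    """
    submitted, merged = Counter(), Counter()
    for author, was_merged in submissions:
        submitted[author] += 1
        if was_merged:
            merged[author] += 1
    rows = [
        (author, merged[author], submitted[author], merged[author] / submitted[author])
        for author in submitted
        if merged[author] >= min_accepted
    ]
    return sorted(rows, key=lambda row: (-row[1], row[0]))
```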
Work is being tracked at https://github.com/python/core-workflow/issues/10.
Requirements for migrating the devinabox[12] repository:
Repositories whose build steps need updating:
Required:
Optional features:
.github/CONTRIBUTING.md (to prevent inappropriate PRs from even showing up, and to point to the devguide)

For this PEP, open issues are ones where a decision needs to be made about how to approach or solve a problem. Open issues do not entail coordination issues such as who is going to write a certain bit of code.
With the code repositories moving over to Git[4], there is no technical need to keep hg.python.org[1] running. Having said that, some in the community would like to have it stay functioning as a Mercurial[3] mirror of the Git repositories. Others have said that they still want a mirror, but one using Git.
As maintaining hg.python.org is not necessary, it will be up to the PSF infrastructure committee to decide if they want to spend the time and resources to keep it running. They may also choose whether they want to host a Git mirror on PSF infrastructure.
Depending on the decision reached, other ancillary repositories will either be forced to migrate or be able to simply stay on hg.python.org.
Because Git[4] may be a new version control system for core developers, the commands people are expected to run will need to be written down. These commands also need to keep a linear history while giving proper attribution to the pull request author.
Another set of commands will also be necessary for when working with a patch file uploaded to bugs.python.org[5]. Here the linear history will be kept implicitly, but the commands will need to make sure to keep/add attribution.
As naming things can lead to bikeshedding of epic proportions, Brett Cannon will choose the final name of the various bots (the name of the project for the bots themselves can be anything; this is purely for the name used in giving commands to the bot or the account name). The names must come from Monty Python, which is only fitting since Python is named after the comedy troupe.
It was discussed whether separate repositories for Python 2 and Python 3 were desired. The thinking was that this would shrink the overall repository size, which benefits people with slow Internet connections or small bandwidth caps.
In the end it was decided that it was easier logistically to simply keep all of CPython’s history in a single repository.
As the current development process has changes committed in the oldest branch first and then merged up to the default branch, the question came up as to whether this workflow should be perpetuated. In the end it was decided that committing in the newest branch and then cherry-picking changes into older branches would work best, as most people will instinctively work off the newest branch and it is a more common workflow when using Git[4].
Cherry-picking is also more bot-friendly for an in-browser workflow. In the merge-up scenario, if you were to request a bot to do a merge and it failed, then you would have to make sure to immediately solve the merge conflicts if you still allowed the main commit, or else you would need to postpone the entire commit until all merges could be handled. With a cherry-picking workflow, the main commit could proceed while postponing the merge-failing cherry-picks. This allows for possibly distributing the work of managing conflicting merges.
Lastly, cherry-picking should help avoid merge races. Currently, when one is doing work that spans branches, it takes time to commit in the older branch, possibly push to another clone representing the default branch, merge the change, and then push upstream. Cherry-picking should decouple this so that you don’t have to rush your multi-branch changes, as the cherry-pick can be done separately.
Misc/NEWS from the commit logs

As part of the discussion surrounding Handling Misc/NEWS, the suggestion has come up of deriving the file from the commit logs themselves. In this scenario, the first line of a commit message would be taken to represent the news entry for the change. Some heuristic would be used to decide whether a change warranted a news entry, e.g., whether an issue number is listed.
This idea has been rejected due to some core developers preferring to write a news entry separate from the commit message. The argument is that the first line of a commit message and a news entry have different requirements in terms of brevity, what should be said, etc.
Misc/NEWS from bugs.python.org

A rejected solution to the NEWS file problem was to specify the entry on bugs.python.org[5]. This would mean an issue that is marked as “resolved” could not be closed until a news entry is added in the “news” field in the issue tracker. The benefit of tying the news entry to the issue is that it makes sure all changes worthy of a news entry have an accompanying issue. It also makes classifying a news entry automatic thanks to the Component field of the issue. The Versions field of the issue also ties the news entry to which Python releases were affected. A script would be written to query bugs.python.org for relevant news entries for a release and to produce the output needed to be checked into the code repository. This approach is agnostic to whether a commit was done by CLI or bot. A drawback is that there’s a disconnect between the actual commit that made the change and the news entry by having them live in separate places (in this case, GitHub and bugs.python.org). This would mean making a commit would then require remembering to go back to bugs.python.org to add the news entry.
Misc/NEWS (https://hg.python.org/cpython/file/default/Misc/NEWS)
Misc/ACKS (https://hg.python.org/cpython/file/default/Misc/ACKS)

This document has been placed in the public domain.
Source: https://github.com/python/peps/blob/main/peps/pep-0512.rst
Last modified: 2025-02-01 08:59:27 GMT