Note
This PEP has been withdrawn. If you’re looking for the PEP documenting the move to GitHub, please refer to PEP 512.
This PEP proposes migrating the repository hosting of CPython and the supporting repositories to Git and GitHub. It also proposes adding Phabricator as an alternative to GitHub Pull Requests for reviewing changes. This particular PEP is offered as an alternative to PEP 474 and PEP 462, which aim to achieve the same overall benefits but restrict themselves to tools that support Mercurial and are completely Open Source.
CPython is an open source project which relies on a number of volunteers donating their time. As an open source project it relies on attracting new volunteers as well as retaining existing ones in order to continue to have a healthy amount of manpower available. In addition to increasing the amount of manpower that is available to the project, it also needs to allow for effective use of what manpower is available.
The current toolchain of the CPython project is a custom and unique combination of tools which mandates a workflow that is similar to one found in a lot of older projects, but which is becoming less and less popular as time goes on.
The one-off nature of the CPython toolchain and workflow means that any new contributor needs to spend time learning the tools and workflow before they can start contributing to CPython. Once a new contributor has gone through the process of learning the CPython workflow, they are also unlikely to be able to take that knowledge and apply it to future projects they wish to contribute to. This acts as a barrier to contribution which will scare off potential new contributors.
In addition, the tooling that CPython uses is under-maintained and antiquated, and it lacks important features that enable committers to use their time more effectively when reviewing and approving changes. The fact that it is under-maintained means that bugs are likely to last longer, if they ever get fixed, and that the tooling is more likely to go down for extended periods of time. The fact that it is antiquated means that it doesn’t effectively harness the capabilities of the modern web platform. Finally, the fact that it lacks several important features, such as pre-testing of commits and an automatic merge tool, means that committers have to do needless busy work to commit even the simplest of changes.
The first decision that needs to be made is the VCS of the primary server-side repository. Currently the CPython repository, as well as a number of supporting repositories, uses Mercurial. When evaluating the VCS we must consider the capabilities of the VCS itself as well as the network effect and mindshare of the community around that VCS.
There are really only two options for this: Mercurial and Git. The technical capabilities of the two are largely equivalent. For this reason this PEP will largely ignore the technical arguments about the VCS and will instead focus on the social aspects.
It is not possible to get exact numbers for the number of projects or people which are using a particular VCS; however, we can infer this by looking at several sources of information about which VCS projects are using.
The Open Hub (previously Ohloh) statistics [1] show that 37% of the repositories indexed by The Open Hub are using Git (second only to SVN which has 48%) while Mercurial has just 2% (beating only Bazaar which has 1%). This has Git being just over 18 times as popular as Mercurial on The Open Hub.
Another source of information on the popularity of the different VCSs is PyPI itself. This source is more targeted at the Python community since it represents projects developed for Python. Unfortunately PyPI does not have a standard location for representing this information, so this requires manual processing. If we limit our search to the top 100 projects on PyPI (ordered by download counts) we can see that 62% of them use Git, 22% of them use Mercurial, and 13% use something else. This has Git being just under 3 times as popular as Mercurial for the top 100 projects on PyPI.
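The two ratios quoted above follow directly from the percentages. A minimal sketch of the arithmetic, using only the figures given in the text (the dictionary and function names are illustrative and not part of any tool):

```python
# Percentages quoted in the text; the names below are illustrative only.
OPEN_HUB = {"git": 37, "svn": 48, "mercurial": 2, "bazaar": 1}
PYPI_TOP_100 = {"git": 62, "mercurial": 22, "other": 13}


def popularity_ratio(shares, numerator="git", denominator="mercurial"):
    """Return how many times larger one share is than another."""
    return shares[numerator] / shares[denominator]


open_hub_ratio = popularity_ratio(OPEN_HUB)    # 37 / 2
pypi_ratio = popularity_ratio(PYPI_TOP_100)    # 62 / 22

print(f"Open Hub: Git is {open_hub_ratio:.1f}x as popular as Mercurial")
print(f"PyPI top 100: Git is {pypi_ratio:.1f}x as popular as Mercurial")
```

This gives 18.5 for The Open Hub ("just over 18 times") and roughly 2.8 for PyPI ("just under 3 times").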
From these numbers, Git is by far the more popular DVCS for open source projects, and choosing the more popular VCS has a number of benefits.
For new contributors it increases the likelihood that they will have already learned the basics of Git as part of working with another project or, if they are just now learning Git, that they’ll be able to take that knowledge and apply it to other projects. Additionally, a larger community means more people writing how-to guides, answering questions, and writing articles about Git, which makes it easier for a new user to find answers and information about the tool they are trying to learn.
Another benefit is that by nature of having a larger community, there will be more tooling written around it. This increases the options for everything from GUI clients and helper scripts to repository hosting.
This PEP proposes allowing GitHub Pull Requests to be submitted; however, GitHub does not have a way to submit Pull Requests against a repository that is not hosted on GitHub. This PEP also proposes that, in addition to GitHub Pull Requests, Phabricator’s Differential app can be used to submit proposed changes, and Phabricator does allow submitting changes against a repository that is not hosted on Phabricator.
For this reason this PEP proposes using GitHub as the canonical location of the repository with a read-only mirror located in Phabricator. If at some point in the future GitHub is no longer desired, then repository hosting can easily be moved solely to Phabricator and the ability to accept GitHub Pull Requests dropped.
In addition to hosting the repositories on GitHub, a read-only copy of all repositories will also be mirrored onto the PSF Infrastructure.
Currently CPython uses a custom fork of Rietveld which has been modified to not run on Google App Engine and which can really only be maintained by one person at present. In addition, it is missing features that are present in many modern code review tools.
This PEP proposes allowing both GitHub Pull Requests and Phabricator changes to propose changes and review code. It suggests both so that contributors can select which tool best enables them to submit changes, and reviewers can focus on reviewing changes in the tooling they like best.
GitHub is a very popular code hosting site and is increasingly becoming the primary place people look to contribute to a project. Enabling users to contribute through GitHub lets contributors use tooling that they are likely already familiar with, and if they are not, they are likely to be able to apply what they learn to another project.
GitHub Pull Requests have a fairly major advantage over the older “submit a patch to a bug tracker” model. They allow developers to work completely within their VCS using standard VCS tooling, so contributing does not require creating a patch file and figuring out the right location to upload it to. This lowers the barrier for sending a change to be reviewed.
On the reviewing side, GitHub Pull Requests are far easier to review. They have nice syntax-highlighted diffs which can operate in either unified or side-by-side views. They allow expanding the context on a diff up to and including the entire file. Finally, they allow commenting inline and on the pull request as a whole, and they present that in a nice unified way which will also hide comments that no longer apply. GitHub also provides a “rendered diff” view which enables easily viewing a diff of rendered markup (such as rst) instead of needing to review the diff of the raw markup.
The Pull Request workflow also makes it trivial to pre-test a change before actually merging it. Any particular pull request can have any number of different types of “commit statuses” applied to it, marking the commit (and thus the pull request) as being in a pending, successful, errored, or failure state. This makes it easy to see inline if the pull request is passing all of the tests, if the contributor has signed a CLA, etc.
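How several commit statuses roll up into a single answer can be sketched as follows. The four state names mirror the states listed above; the roll-up rule (any error or failure blocks, otherwise anything still pending waits) and all check names are assumptions made for illustration, not a description of GitHub’s implementation.

```python
from enum import Enum


class State(Enum):
    PENDING = "pending"
    SUCCESS = "success"
    ERROR = "error"
    FAILURE = "failure"


def combined_state(statuses):
    """Roll up {check name: State} into one State (assumed rule, see above)."""
    states = set(statuses.values())
    if State.ERROR in states:
        return State.ERROR
    if State.FAILURE in states:
        return State.FAILURE
    # No checks reported yet is treated the same as still waiting.
    if State.PENDING in states or not states:
        return State.PENDING
    return State.SUCCESS


# Hypothetical check names for a single pull request.
pr_statuses = {
    "tests": State.SUCCESS,
    "cla-signed": State.PENDING,
}
print(combined_state(pr_statuses).value)  # pending
```

Under this rule a pull request only shows as successful once every reported check has succeeded.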
Actually merging a GitHub Pull Request is quite simple: a core reviewer simply needs to press the “Merge” button once the status of all the checks on the Pull Request is green for successful.
GitHub also has a good workflow for submitting pull requests to a project completely through its web interface. This would enable the Python documentation to have “Edit on GitHub” buttons on every page, so that people who discover typos or inaccuracies, or who just want to make improvements to the docs they are currently reading, can simply hit that button and get an in-browser editor that will let them make changes and submit a pull request, all from the comfort of their browser.
In addition to GitHub Pull Requests, this PEP also proposes setting up a Phabricator instance and pointing it at the GitHub-hosted repositories. This will allow utilizing the Phabricator review applications of Differential and Audit.
Differential functions similarly to GitHub pull requests except that it requires installing the arc command line tool to upload patches to Phabricator.
Whether to enable Phabricator for any particular repository can be chosen on a case-by-case basis. This PEP only proposes that it must be enabled for the CPython repository; for smaller repositories such as the PEP repository it may not be worth the effort.
One feature of the current tooling (Mercurial, Rietveld) is that all of the pieces are primarily written in Python. It is this PEP’s belief that we should focus on the best tools for the job and not the best tools that happen to be written in Python. Volunteer time is a precious resource to any open source project and we can best respect and utilize that time by focusing on the benefits and downsides of the tools themselves rather than what language their authors happened to write them in.
One concern is the ability to modify tools to work for us; however, one of the goals here is to not modify software to work for us and instead adapt ourselves to a more standard workflow. This standardization pays off in the ability to re-use tools out of the box, freeing up developer time to actually work on Python itself as well as enabling knowledge sharing between projects.
However, if we do need to modify the tooling, Git itself is largely written in C, the same as CPython itself. It can also have commands written for it using any language, including Python. Phabricator is written in PHP, which is a fairly common language in the web world and fairly easy to pick up. GitHub itself is largely written in Ruby, but given that it’s not Open Source there is no ability to modify it, so its implementation language is completely meaningless.
GitHub is a big part of this proposal, and someone who tends more to ideology than practicality may be opposed to this PEP on those grounds alone. It is this PEP’s belief that while using entirely Free/Open Source software is an attractive idea and a noble goal, valuing the time of the contributors by giving them good tooling that is well maintained, and that they either already know or can apply to other projects once learned, is a more important concern than treating Free/Open Source status as a hard requirement.
However, history has shown us that sometimes benevolent proprietary companiescan stop being benevolent. This is hedged against in a few ways:
Relying on GitHub comes with a number of benefits beyond just the benefits of the platform itself. Since it is a commercially backed venture it has a full-time staff responsible for maintaining its services. This includes making sure they stay up, making sure they stay patched for various security vulnerabilities, and further improving the software and infrastructure as time goes on.
Whether Mercurial or Git is better on a technical level is a highly subjective opinion. This PEP does not state whether the mechanics of Git or Mercurial are better and instead focuses on the network effect that is available for either option. Since this PEP proposes switching to Git, this leaves the people who prefer Mercurial out; however, those users can easily continue to work with Mercurial by using the hg-git [2] extension for Mercurial, which will let it work with a repository which is Git on the server side.
One sentiment that came out of previous discussions was that the multi-branch model of CPython was too complicated for GitHub Pull Requests. It is the belief of this PEP that this statement is not accurate.
Currently any particular change requires manually creating a patch for 2.7 and 3.x, which won’t change at all in this regard.
If someone submits a fix for the current stable branch (currently 3.4), the GitHub Pull Request workflow can be used to create, in the browser, a Pull Request to merge the current stable branch into the master branch (assuming there are no merge conflicts). If there is a merge conflict, that would need to be handled locally. This provides an improvement over the current situation where the merge must always happen locally.
Finally, if someone submits a fix for the current development branch, it currently has to be manually applied to the stable branch if it is desired to include it there as well. This must also happen locally in the new workflow; however, for minor changes it could easily be accomplished in the GitHub web editor.
Looking at this, I do not believe that any system can hide the complexities involved in maintaining several long-running branches. The only thing that the tooling can do is make it as easy as possible to submit changes.
One of the key ideas behind the move to both Git and GitHub is that a feature of a DVCS, the repository hosting, and the workflow used is the social network and size of the community using said tools. We can see this is true by looking at an example from a sub-community of the Python community: the Scientific Python community. They have already migrated most of the key pieces of the SciPy stack onto GitHub using the Pull Request-based workflow. This process started with IPython, and as more projects moved over it became a natural default for new projects in the community.
They claim to have seen a great benefit from this move, in that it enables casual contributors to easily move between different projects within their sub-community without having to learn a special, bespoke workflow and a different toolchain for each project. They’ve found that when people can use their limited time on actually contributing instead of learning the different tools and workflows, not only do they contribute more to one project, but they also expand out and contribute to other projects. The increased tendency for members of that community to go so far as publishing their research and educational materials on GitHub has also been attributed to this move.
This example showcases the real power behind moving to a highly popular toolchain and workflow, as each variance introduces yet another hurdle for new and casual contributors to get past and makes the time spent learning that workflow less reusable with other projects.
This document has been placed in the public domain.
Source: https://github.com/python/peps/blob/main/peps/pep-0481.rst
Last modified: 2025-02-01 08:55:40 GMT