Python has been using a centralized version control system (VCS; first CVS, now Subversion) for years to great effect. Having a master copy of the official version of Python provides people with a single place to always get the official Python source code. It has also allowed for the storage of the history of the language, mostly for help with development, but also for posterity. And of course the V in VCS is very helpful when developing.
But a centralized version control system has its drawbacks. First and foremost, in order to have the benefits of version control with Python in a seamless fashion, one must be a “core developer” (i.e. someone with commit privileges on the master copy of Python). People who are not core developers but who wish to work with Python’s revision tree, e.g. anyone writing a patch for Python or creating a custom version, do not have direct tool support for revisions. This can be quite a limitation, since these non-core developers cannot easily do basic tasks such as reverting changes to a previously saved state, creating branches, publishing one’s changes with full revision history, etc. For non-core developers, the last safe tree state is one the Python developers happen to set, and this prevents safe development. This second-class citizenship is a hindrance to people who wish to contribute to Python with a patch of any complexity and want a way to incrementally save their progress to make their development lives easier.
There is also the issue of having to be online to be able to commit one’s work. Because centralized VCSs keep a central copy that stores all revisions, one must have Internet access in order for their revisions to be stored; no Net, no commit. This can be annoying if you happen to be traveling and lack any Internet. There is also the situation of someone wishing to contribute to Python but having a bad Internet connection where committing is time-consuming and expensive and it might work out better to do it in a single step.
Another drawback to a centralized VCS is that a common use case is for a developer to revise patches in response to review comments. This is more difficult with a centralized model because there’s no place to contain intermediate work. It’s either all checked in or none of it is checked in. In the centralized VCS, it’s also very difficult to track changes to the trunk as they are committed, while you’re working on your feature or bug fix branch. This increases the risk that such branches will grow stale or outdated, or that merging them into the trunk will generate too many conflicts to be easily resolved.
Lastly, there is the issue of maintenance of Python. At any one time there is at least one major version of Python under development (at the time of this writing there are two). For each major version of Python under development there is at least the maintenance version of the last minor version and the in-development minor version (e.g. with 2.6 just released, that means that both 2.6 and 2.7 are being worked on). Once a release is done, a branch is created between the code bases where changes in one version do not (but could) belong in the other version. As of right now there is no natural support for this branch in time in central VCSs; you must use tools that simulate the branching. Tracking merges is similarly painful for developers, as revisions often need to be merged between four active branches (e.g. 2.6 maintenance, 3.0 maintenance, 2.7 development, 3.1 development). In this case, VCSs such as Subversion only handle this through arcane third party tools.
Distributed VCSs (DVCSs) solve all of these problems. While one can keep a master copy of a revision tree, anyone is free to copy that tree for their own use. This gives everyone the power to commit changes to their copy, online or offline. It also more naturally ties into the idea of branching in the history of a revision tree for maintenance and the development of new features bound for Python. DVCSs also provide a great many additional features that centralized VCSs don’t or can’t provide.
This PEP explores the possibility of changing Python’s use of Subversion to any of the currently popular DVCSs, in order to gain the benefits outlined above. This PEP does not guarantee that a switch to a DVCS will occur at the conclusion of this PEP. It is quite possible that no clear winner will be found and that svn will continue to be used. If this happens, this PEP will be revisited and revised in the future as the state of DVCSs evolves.
Agreeing on a common terminology is surprisingly difficult, primarily because each VCS uses these terms when describing subtly different tasks, objects, and concepts. Where possible, we try to provide a generic definition of the concepts, but you should consult the individual system’s glossaries for details. Here are some basic references for terminology, from some of the standard web-based references on each VCS. You can also refer to glossaries for each DVCS:
At the moment, the typical workflow for a Python core developer is:
It is a rather simple workflow, but it has drawbacks. For one, because any work that involves the repository takes time thanks to the network, commits/pushes tend to not necessarily be as atomic as possible. There is also the drawback of there not being a necessarily cheap way to create new checkouts beyond a recursive copy of the checkout directory.
A DVCS would lead to a workflow more like this:
While there are more possible steps, the workflow is much more independent of the master repository than is currently possible. By being able to commit locally at the speed of your disk, a core developer is able to do atomic commits much more frequently, minimizing having commits that do multiple things to the code. Also by using a branch, the changes are isolated (if desired) from other changes being made by other developers. Because branches are cheap, it is easy to create and maintain many smaller branches that address one specific issue, e.g. one bug or one new feature. More sophisticated features of DVCSs allow the developer to more easily track long running development branches as the official mainline progresses.
| Name | Short Name | Version | 2.x Trunk Mirror | 3.x Trunk Mirror |
|---|---|---|---|---|
| Bazaar | bzr | 1.12 | http://code.python.org/python/trunk | http://code.python.org/python/3.0 |
| Mercurial | hg | 1.2.0 | http://code.python.org/hg/trunk/ | http://code.python.org/hg/branches/py3k/ |
| git | N/A | 1.6.1 | git://code.python.org/python/trunk | git://code.python.org/python/branches/py3k |
This PEP does not consider darcs, arch, or monotone. The main problem with these DVCSs is that they are simply not popular enough to bother supporting when they do not provide some very compelling features that the other DVCSs provide. Arch and darcs also have significant performance problems which seem unlikely to be addressed in the near future.
For those who have already decided which DVCSs they want to use, and are willing to maintain local mirrors themselves, all three DVCSs support interchange via the git “fast-import” changeset format. git does so natively, of course, and native support for Bazaar is under active development, and getting good early reviews as of mid-February 2009. Mercurial has idiosyncratic support for importing via its hg convert command, and third-party fast-import support is available for exporting. Also, the Tailor tool supports automatic maintenance of mirrors based on an official repository in any of the candidate formats with a local mirror in any format.
Probably the best way to help decide on whether/which DVCS should replace Subversion is to see what it takes to perform some real-world usage scenarios that developers (core and non-core) have to work with. Each usage scenario outlines what it is, a bullet list of what the basic steps are (which can vary slightly per VCS), and how to perform the usage scenario in the various VCSs (including Subversion).
Each VCS had a single author in charge of writing implementations for each scenario (unless otherwise noted).
| Name | VCS |
|---|---|
| Brett | svn |
| Barry | bzr |
| Alexandre | hg |
| Stephen | git |
Some DVCSs have some perks if you do some initial setup upfront. This section covers what can be done before any of the usage scenarios are run in order to take better advantage of the tools.
All of the DVCSs support configuring your project identification. Unlike the centralized systems, they use your email address to identify your commits. (Access control is generally done by mechanisms external to the DVCS, such as ssh or console login). This identity may be associated with a full name.
All of the DVCSs will query the system to get some approximation to this information, but that may not be what you want. They also support setting this information on a per-user basis, and on a per-project basis. Convenience commands to set these attributes vary, but all allow direct editing of configuration files.
Some VCSs support end-of-line (EOL) conversions on checkout/checkin.
None required, but it is recommended you follow the guidelines in the dev FAQ.
No setup is required, but for much quicker and space-efficient local branching, you should create a shared repository to hold all your Python branches. A shared repository is really just a parent directory containing a .bzr directory. When bzr commits a revision, it searches from the local directory on up the file system for a .bzr directory to hold the revision. By sharing revisions across multiple branches, you cut down on the amount of disk space used. Do this:
    cd ~/projects
    bzr init-repo python
    cd python
Now, all your Python branches should be created inside of ~/projects/python.
There are also some settings you can put in your ~/.bazaar/bazaar.conf and ~/.bazaar/locations.conf files to set up defaults for interacting with Python code. None of them are required, although some are recommended. E.g. I would suggest gpg signing all commits, but that might be too high a barrier for developers. Also, you can set up default push locations depending on where you want to push branches by default. If you have write access to the master branches, that push location could be code.python.org. Otherwise, it might be a free Bazaar code hosting service such as Launchpad. If Bazaar is chosen, we should decide what the policies and recommendations are.
At a minimum, I would set up your email address:
    bzr whoami "Firstname Lastname <email.address@example.com>"
As with hg and git below, there are ways to set your email address (or really, just about any parameter) on a per-repository basis. You do this with settings in your $HOME/.bazaar/locations.conf file, which has an ini-style format, as do the configuration files of the other DVCSs. See the Bazaar documentation for details, which mostly aren’t relevant for this discussion.
Minimally, you should set your user name. To do so, create the file .hgrc in your home directory and add the following:
    [ui]
    username = Firstname Lastname <email.address@example.com>
If you are using Windows and your tools do not support Unix-style newlines, you can enable automatic newline translation by adding to your configuration:
    [extensions]
    win32text =
These options can also be set locally to a given repository by customizing <repo>/.hg/hgrc, instead of ~/.hgrc.
None needed. However, git supports a number of features that can smooth your work, with a little preparation. git supports setting defaults at the workspace, user, and system levels. The system level is out of scope of this PEP. The user configuration file is $HOME/.gitconfig on Unix-like systems, and the workspace configuration file is $REPOSITORY/.git/config.
You can use the git-config tool to set preferences for user.name and user.email either globally (for your system login account) or locally (to a given git working copy), or you can edit the configuration files (which have the same format as shown in the Mercurial section above):
    # my full name doesn't change
    # note "--global" flag means per user
    # (system-wide configuration is set with "--system")
    git config --global user.name 'Firstname Lastname'
    # but use my Pythonic email address
    cd /path/to/python/repository
    git config user.email email.address@python.example.com
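The interaction between the user and workspace levels can be checked in a scratch environment. The following is only a sketch: it points HOME at a temporary directory so that the real ~/.gitconfig is never touched, and both email addresses are placeholders.

```shell
set -eu
# Use a throwaway HOME so the real per-user configuration is left alone.
export HOME="$(mktemp -d)"
unset XDG_CONFIG_HOME
git config --global user.email global.address@example.com

repo="$(mktemp -d)"
git init -q "$repo"
# No workspace setting yet, so the per-user value applies.
before="$(git -C "$repo" config user.email)"
# Set a workspace-level value; it is stored in $repo/.git/config.
git -C "$repo" config user.email project.address@example.com
# Inside this repository the workspace value now wins.
after="$(git -C "$repo" config user.email)"
# The per-user value itself is unchanged.
global="$(git config --global user.email)"
echo "before=$before after=$after global=$global"
```

The workspace value only shadows the per-user value inside that one repository, which is what makes a separate "Pythonic" address practical.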
If you are using Windows, you probably want to set the core.autocrlf and core.safecrlf preferences to true using git-config:
    # check out files with CRLF line endings rather than Unix-style LF only
    git config --global core.autocrlf true
    # scream if a transformation would be ambiguous
    # (eg, a working file contains both naked LF and CRLF)
    # and check them back in with the reverse transformation
    git config --global core.safecrlf true
Although the repository will usually contain a .gitignore file specifying file names that rarely if ever should be registered in the VCS, you may have personal conventions (e.g., always editing log messages in a temporary file named “.msg”) that you may wish to specify:
    # tell git where my personal ignores are
    git config --global core.excludesfile ~/.gitignore
    # I use .msg for my long commit logs, and Emacs makes backups in
    # files ending with ~
    # these are globs, not regular expressions
    echo '*~' >> ~/.gitignore
    echo '.msg' >> ~/.gitignore
If you use multiple branches, as with the other VCSes, you can save a lot of space by putting all objects in a common object store. This also can save download time, if the origins of the branches were in different repositories, because objects are shared across branches in your repository even if they were not present in the upstream repositories. git is very space- and time-efficient and applies a number of optimizations automatically, so this configuration is optional. (Examples are omitted.)
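One possible way to share an object store between working copies is git clone --shared, which records the source repository as an "alternate" instead of copying its objects. This is a sketch under assumptions, not necessarily the setup the text has in mind: the directory names and identity are placeholders, and a shared clone can break if the source repository later prunes objects it still needs.

```shell
set -eu
work="$(mktemp -d)"
# A stand-in for an existing local Python repository.
git init -q "$work/python"
git -C "$work/python" config user.name 'Firstname Lastname'
git -C "$work/python" config user.email email.address@example.com
echo "hello" > "$work/python/README"
git -C "$work/python" add README
git -C "$work/python" commit -q -m 'Initial commit'

# --shared makes the new clone borrow objects from the source repository
# rather than copying them.
git clone -q --shared "$work/python" "$work/issue0000"
# The borrowed object store is recorded in the alternates file.
alternates="$work/issue0000/.git/objects/info/alternates"
cat "$alternates"
```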
As a non-core developer, I want to create and publish a one-off patch that fixes a bug, so that a core developer can review it for inclusion in the mainline.
    svn checkout http://svn.python.org/projects/python/trunk
    cd trunk
    # Edit some code.
    echo "The cake is a lie!" > README
    # Since svn lacks support for local commits, we fake it with patches.
    svn diff >> commit-1.diff
    svn diff >> patch-1.diff
    # Upload the patch-1 to bugs.python.org.
    # Receive reviewer comments.
    # Edit some code.
    echo "The cake is real!" > README
    # Since svn lacks support for local commits, we fake it with patches.
    svn diff >> commit-2.diff
    svn diff >> patch-2.diff
    # Upload patch-2 to bugs.python.org
    bzr branch http://code.python.org/python/trunk
    cd trunk
    # Edit some code.
    bzr commit -m 'Stuff I did'
    bzr send -o bundle
    # Upload bundle to bugs.python.org
    # Receive reviewer comments
    # Edit some code
    bzr commit -m 'Respond to reviewer comments'
    bzr send -o bundle
    # Upload updated bundle to bugs.python.org
The bundle file is like a super-patch. It can be read by patch(1) but it contains additional metadata so that it can be fed to bzr merge to produce a fully usable branch complete with history. See the Patch Review section below.
    hg clone http://code.python.org/hg/trunk
    cd trunk
    # Edit some code.
    hg commit -m "Stuff I did"
    hg outgoing -p > fixes.patch
    # Upload patch to bugs.python.org
    # Receive reviewer comments
    # Edit some code
    hg commit -m "Address reviewer comments."
    hg outgoing -p > additional-fixes.patch
    # Upload patch to bugs.python.org
While hg outgoing does not have the flag for it, most Mercurial commands support git’s extended patch format through a --git option. This can be set in one’s .hgrc file so that all commands that generate a patch use the extended format.
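As a configuration sketch, the setting in question lives in the [diff] section of the Mercurial configuration file. The fragment below assumes the per-user ~/.hgrc; the same lines could go in <repo>/.hg/hgrc to limit the effect to one repository.

```ini
# Make patch-producing commands use git's extended diff format by default.
[diff]
git = True
```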
The patches could be created with git diff master > stuff-i-did.patch, too, but git format-patch | git am knows some tricks (empty files, renames, etc) that ordinary patch can’t handle. git grabs “Stuff I did” out of the commit message to create the file name 0001-Stuff-I-did.patch. See Patch Review below for a description of the git-format-patch format.
    # Get the mainline code.
    git clone git://code.python.org/python/trunk
    cd trunk
    # Edit some code.
    git commit -a -m 'Stuff I did.'
    # Create patch for my changes (i.e, relative to master).
    git format-patch master
    git tag stuff-v1
    # Upload 0001-Stuff-I-did.patch to bugs.python.org.
    # Time passes ... receive reviewer comments.
    # Edit more code.
    git commit -a -m 'Address reviewer comments.'
    # Make an add-on patch to apply on top of the original.
    git format-patch stuff-v1
    # Upload 0001-Address-reviewer-comments.patch to bugs.python.org.
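The file-naming rule can be tried out in a throwaway repository. This is a minimal sketch, not part of the scenario itself: the repository is created from scratch, and the tag name base and the identity are placeholders chosen for illustration.

```shell
set -eu
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name 'Firstname Lastname'
git config user.email email.address@example.com
echo "The cake is a lie!" > README
git add README
git commit -q -m 'Initial import'
# Mark the starting point that patches will be generated against.
git tag base
echo "The cake is real!" > README
git commit -q -a -m 'Stuff I did.'
# One patch file is written per commit made after "base"; git derives the
# file name from the first line of the commit message and prints it.
patchfile="$(git format-patch base)"
echo "$patchfile"
```

Generating patches against a tag (or any other fixed reference) rather than the branch being committed to keeps the revision range non-empty.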
As a core developer, I want to undo a change that was not ready for inclusion in the mainline.
    # Assume the change to revert is in revision 40
    svn merge -c -40 .
    # Resolve conflicts, if any.
    svn commit -m "Reverted revision 40"
    # Assume the change to revert is in revision 40
    bzr merge -r 40..39
    # Resolve conflicts, if any.
    bzr commit -m "Reverted revision 40"
Note that if the change you want to revert is the last one that was made, you can just use bzr uncommit.
    # Assume the change to revert is in revision 9150dd9c6d30
    hg backout --merge -r 9150dd9c6d30
    # Resolve conflicts, if any.
    hg commit -m "Reverted changeset 9150dd9c6d30"
    hg push
Note, you can use “hg rollback” and “hg strip” to revert changes you committed in your local repository, but did not yet push to other repositories.
    # Assume the change to revert is the grandfather of a revision tagged "newhotness".
    git revert newhotness~2
    # Resolve conflicts if any. If there are no conflicts, the commit
    # will be done automatically by "git revert", which prompts for a log.
    git commit -m "Reverted changeset 9150dd9c6d30."
    git push
As a core developer, I want to review patches submitted by other people, so that I can make sure that only approved changes are added to Python.
Core developers have to review patches as submitted by other people. This requires applying the patch, testing it, and then tossing away the changes. The assumption can be made that a core developer already has a checkout/branch/clone of the trunk.
Subversion does not fit this development style very well, as there is no such thing as a “branch” as it has been defined in this PEP. Instead a developer either needs to create another checkout for testing a patch or create a branch on the server. Up to this point, core developers have not taken the “branch on the server” approach to dealing with individual patches. For this scenario the assumption will be that the developer creates a local checkout of the trunk to work with:
    cp -r trunk issue0000
    cd issue0000
    patch -p0 < __patch__
    # Review patch.
    svn commit -m "Some patch."
    cd ..
    rm -r issue0000
Another option is to only have a single checkout running at any one time and use svn diff along with svn revert -R to store away independent changes you may have made.
    bzr branch trunk issueNNNN
    # Download `patch` bundle from Roundup
    bzr merge patch
    # Review patch
    bzr commit -m 'Patch NNNN by So N. So' --fixes python:NNNN
    bzr push bzr+ssh://me@code.python.org/trunk
    rm -rf ../issueNNNN
Alternatively, since you’re probably going to commit these changes to the trunk, you could just do a checkout. That would give you a local working tree while the branch (i.e. all revisions) would continue to live on the server. This is similar to the svn model and might allow you to more quickly review the patch. There’s no need for the push in this case:
    bzr checkout trunk issueNNNN
    # Download `patch` bundle from Roundup
    bzr merge patch
    # Review patch
    bzr commit -m 'Patch NNNN by So N. So' --fixes python:NNNN
    rm -rf ../issueNNNN
    hg clone trunk issue0000
    cd issue0000
    # If the patch was generated using hg export, the user name of the
    # submitter is automatically recorded. Otherwise,
    # use hg import --no-commit submitted.diff and commit with
    # hg commit -u "Firstname Lastname <email.address@example.com>"
    hg import submitted.diff
    # Review patch.
    hg push ssh://alexandre@code.python.org/hg/trunk/
We assume a patch created by git-format-patch. This is a Unix mbox file containing one or more patches, each formatted as an RFC 2822 message. git-am interprets each message as a commit as follows. The author of the patch is taken from the From: header, the date from the Date header. The commit log is created by concatenating the content of the subject line, a blank line, and the message body up to the start of the patch:
    cd trunk
    # Create a branch in case we don't like the patch.
    # This checkout takes zero time, since the workspace is left in
    # the same state as the master branch.
    git checkout -b patch-review
    # Download patch from bugs.python.org to submitted.patch.
    git am < submitted.patch
    # Review and approve patch.
    # Merge into master and push.
    git checkout master
    git merge patch-review
    git push
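The header handling described above can be observed end to end in a scratch repository. The sketch below is illustrative only: both identities, the tag name base, and the temporary directories are made up, and the patch is produced locally rather than downloaded from a tracker.

```shell
set -eu
repo="$(mktemp -d)"
outbox="$(mktemp -d)"
cd "$repo"
git init -q
# The reviewer's identity (hypothetical).
git config user.name 'Core Developer'
git config user.email core.dev@example.com
echo "The cake is a lie!" > README
git add README
git commit -q -m 'Initial import'
git tag base

# The submitter's commit, recorded under a different (hypothetical) author.
echo "The cake is real!" > README
git commit -q -a -m 'Stuff I did.' --author='Patch Author <patch.author@example.com>'
git format-patch -q -o "$outbox" base
# These are the mbox headers that git am reads.
grep -E '^(From|Date|Subject):' "$outbox/0001-Stuff-I-did.patch"

# Apply the patch on a review branch that starts before the change.
git checkout -q -b patch-review base
git am -q < "$outbox/0001-Stuff-I-did.patch"
author="$(git log -1 --format='%an <%ae>')"
subject="$(git log -1 --format=%s)"
echo "$author: $subject"
```

The resulting commit keeps the submitter as author even though the reviewer applied it, so attribution survives the round trip through the tracker.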
As a core developer, I want to apply a patch to 2.6, 2.7, 3.0, and 3.1 so that I can fix a problem in all four versions.
Thanks to always having the cutting-edge and the latest release version under development, Python currently has four branches being worked on simultaneously. That makes it important for a change to propagate easily through various branches.
Because of Python’s use of svnmerge, changes start with the trunk (2.7) and then get merged to the release version of 2.6. To get the change into the 3.x series, the change is merged into 3.1, fixed up, and then merged into 3.0 (2.7 -> 2.6; 2.7 -> 3.1 -> 3.0).
This is in contrast to a port-forward strategy where the patch would have been added to 2.6 and then pulled forward into newer versions (2.6 -> 2.7 -> 3.0 -> 3.1).
    # Assume patch applied to 2.7 in revision 0000.
    cd release26-maint
    svnmerge merge -r 0000
    # Resolve merge conflicts and make sure patch works.
    svn commit -F svnmerge-commit-message.txt  # revision 0001.
    cd ../py3k
    svnmerge merge -r 0000
    # Same as for 2.6, except Misc/NEWS changes are reverted.
    svn revert Misc/NEWS
    svn commit -F svnmerge-commit-message.txt  # revision 0002.
    cd ../release30-maint
    svnmerge merge -r 0002
    svn commit -F svnmerge-commit-message.txt  # revision 0003.
Bazaar is pretty straightforward here, since it supports cherry picking revisions manually. In the example below, we could have given a revision id instead of a revision number, but that’s usually not necessary. Martin Pool suggests “We’d generally recommend doing the fix first in the oldest supported branch, and then merging it forward to the later releases.”:
    # Assume patch applied to 2.7 in revision 0000
    cd release26-maint
    bzr merge ../trunk -c 0000
    # Resolve conflicts and make sure patch works
    bzr commit -m 'Back port patch NNNN'
    bzr push bzr+ssh://me@code.python.org/trunk
    cd ../py3k
    bzr merge ../trunk -r 0000
    # Same as for 2.6 except Misc/NEWS changes are reverted
    bzr revert Misc/NEWS
    bzr commit -m 'Forward port patch NNNN'
    bzr push bzr+ssh://me@code.python.org/py3k
Mercurial, like other DVCSs, does not support well the current workflow used by Python core developers to backport patches. Right now, bug fixes are first applied to the development mainline (i.e., trunk), then back-ported to the maintenance branches and forward-ported, as necessary, to the py3k branch. This workflow requires the ability to cherry-pick individual changes. Mercurial’s transplant extension provides this ability. Here is an example of the scenario using this workflow:
    cd release26-maint
    # Assume patch applied to 2.7 in revision 0000
    hg transplant -s ../trunk 0000
    # Resolve conflicts, if any.
    cd ../py3k
    hg pull ../trunk
    hg merge
    hg revert Misc/NEWS
    hg commit -m "Merged trunk"
    hg push
In the above example, transplant acts much like the current svnmerge command. When transplant is invoked without the revision, the command launches an interactive loop useful for transplanting multiple changes. Another useful feature is the --filter option which can be used to modify changesets programmatically (e.g., it could be used for removing changes to Misc/NEWS automatically).
Alternatively to the traditional workflow, we could avoid transplanting changesets by committing bug fixes to the oldest supported release, then merge these fixes upward to the more recent branches.
    cd release25-maint
    hg import fix_some_bug.diff
    # Review patch and run test suite. Revert if failure.
    hg push
    cd ../release26-maint
    hg pull ../release25-maint
    hg merge
    # Resolve conflicts, if any. Then, review patch and run test suite.
    hg commit -m "Merged patches from release25-maint."
    hg push
    cd ../trunk
    hg pull ../release26-maint
    hg merge
    # Resolve conflicts, if any, then review.
    hg commit -m "Merged patches from release26-maint."
    hg push
Although this approach makes the history non-linear and slightly more difficult to follow, it encourages fixing bugs across all supported releases. Furthermore, it scales better when there are many changes to backport, because we do not need to seek the specific revision IDs to merge.
In git I would have a workspace which contains all of the relevant master repository branches. git cherry-pick doesn’t work across repositories; you need to have the branches in the same repository.
    # Assume patch applied to 2.7 in revision release27~3 (4th patch back from tip).
    cd integration
    git checkout release26
    git cherry-pick release27~3
    # If there are conflicts, resolve them, and commit those changes.
    # git commit -a -m "Resolve conflicts."
    # Run test suite. If fixes are necessary, record as a separate commit.
    # git commit -a -m "Fix code causing test failures."
    git checkout master
    git cherry-pick release27~3
    # Do any conflict resolution and test failure fixups.
    # Revert Misc/NEWS changes.
    git checkout HEAD^ -- Misc/NEWS
    git commit -m 'Revert cherry-picked Misc/NEWS changes.' Misc/NEWS
    # Push both ports.
    git push release26 master
If you are regularly merging (rather than cherry-picking) from a given branch, then you can block a given commit from being accidentally merged in the future by merging, then reverting it. This does not prevent a cherry-pick from pulling in the unwanted patch, and this technique requires blocking everything that you don’t want merged. I’m not sure if this differs from svn on this point.
    cd trunk
    # Merge in the alpha tested code.
    git merge experimental-branch
    # We don't want the 3rd-to-last commit from the experimental-branch,
    # and we don't want it to ever be merged.
    # The notation "^N" means Nth parent of the current commit. Thus HEAD^2^1^1
    # means the first parent of the first parent of the second parent of HEAD.
    git revert HEAD^2^1^1
    # Propagate the merge and the prohibition to the public repository.
    git push
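The parent notation is easier to follow against a concrete history. The sketch below only resolves the revisions and does not perform the revert; it builds a scratch repository with empty commits, and the commit messages and identity are invented for illustration.

```shell
set -eu
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name 'Firstname Lastname'
git config user.email email.address@example.com
git commit -q --allow-empty -m 'Trunk start'
# Remember the initial branch name, whatever git's default happens to be.
trunk="$(git rev-parse --abbrev-ref HEAD)"

git checkout -q -b experimental-branch
git commit -q --allow-empty -m 'Experiment 1 (unwanted)'
git commit -q --allow-empty -m 'Experiment 2'
git commit -q --allow-empty -m 'Experiment 3'

git checkout -q "$trunk"
# --no-ff forces a real merge commit, so HEAD gets two parents.
git merge -q --no-ff -m 'Merge experimental-branch' experimental-branch

# HEAD^2 is the merged-in tip; each further ^1 steps back one commit.
second_parent="$(git rev-parse 'HEAD^2')"
third_to_last="$(git rev-parse 'HEAD^2^1^1')"
git log -1 --format=%s "$third_to_last"
```

Resolving the expression with git rev-parse or git log before running git revert is a cheap way to confirm that it names the intended commit.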
Sometimes core developers end up working on a major feature with several developers. As a core developer, I want to be able to publish feature branches to a common public location so that I can collaborate with other developers.
This requires creating a branch on a server that other developers can access. All of the DVCSs support creating new repositories on hosts where the developer is already able to commit, with appropriate configuration of the repository host. This is similar in concept to the existing sandbox in svn, although details of repository initialization may differ.
For non-core developers, there are various more-or-less public-access repository-hosting services. Bazaar has Launchpad, Mercurial has bitbucket.org, and git has GitHub. All also have easy-to-use CGI interfaces for developers who maintain their own servers.
    # Create branch.
    svn copy svn+ssh://pythondev@svn.python.org/python/trunk svn+ssh://pythondev@svn.python.org/python/branches/NewHotness
    svn checkout svn+ssh://pythondev@svn.python.org/python/branches/NewHotness
    cd NewHotness
    svnmerge init
    svn commit -m "Initialize svnmerge."
    # Pull in changes from other developers.
    svn update
    # Pull in trunk and merge to the branch.
    svnmerge merge
    svn commit -F svnmerge-commit-message.txt
This scenario is incomplete as the decision for what DVCS to go with was made before the work was complete.
Sometimes, while working on an issue, it becomes apparent that the problem being worked on is actually a compound issue of various smaller issues. Being able to take the current work and then begin working on a separate issue is very helpful to separate out issues into individual units of work instead of compounding them into a single, large unit.
To make up for svn’s lack of cheap branching, it has a changelist option to associate a file with a single changelist. This is not as powerful as being able to associate at the commit level. There is also no way to express dependencies between changelists.
    cp -r trunk issue0000
    cd issue0000
    # Edit some code.
    echo "The cake is a lie!" > README
    svn changelist A README
    # Edit some other code.
    echo "I own Python!" > LICENSE
    svn changelist B LICENSE
    svn ci -m "Tell it how it is." --changelist B
    # Edit changelist A some more.
    svn ci -m "Speak the truth." --changelist A
    cd ..
    rm -rf issue0000
Here’s an approach that uses bzr shelf (now a standard part of bzr) to squirrel away some changes temporarily while you take a detour to fix the socket bugs.
    bzr branch trunk bug-0000
    cd bug-0000
    # Edit some code. Dang, we need to fix the socket module.
    bzr shelve --all
    # Edit some code.
    bzr commit -m "Socket module fixes"
    # Detour over, now resume fixing urllib
    bzr unshelve
    # Edit some code
Another approach uses the loom plugin. Looms can greatly simplify working on dependent branches because they automatically take care of the stacking dependencies for you. Imagine looms as a stack of dependent branches (called “threads” in loom parlance), with easy ways to move up and down the stack of threads, merge changes up the stack to descendant threads, create diffs between threads, etc. Occasionally, you may need or want to export your loom threads into separate branches, either for review or commit. Higher threads incorporate all the changes in the lower threads, automatically.
    bzr branch trunk bug-0000
    cd bug-0000
    bzr loomify --base trunk
    bzr create-thread fix-urllib
    # Edit some code. Dang, we need to fix the socket module first.
    bzr commit -m "Checkpointing my work so far"
    bzr down-thread
    bzr create-thread fix-socket
    # Edit some code
    bzr commit -m "Socket module fixes"
    bzr up-thread
    # Manually resolve conflicts if necessary
    bzr commit -m 'Merge in socket fixes'
    # Edit me some more code
    bzr commit -m "Now that socket is fixed, complete the urllib fixes"
    bzr record done
For bonus points, let’s say someone else fixes the socket module in exactly the same way you just did. Perhaps this person even grabbed your fix-socket thread and applied just that to the trunk. You’d like to be able to merge their changes into your loom and delete your now-redundant fix-socket thread.
    bzr down-thread trunk
    # Get all new revisions to the trunk. If you've done things
    # correctly, this will succeed without conflict.
    bzr pull
    bzr up-thread
    # See? The fix-socket thread is now identical to the trunk
    bzr commit -m 'Merge in trunk changes'
    bzr diff -r thread: | wc -l  # returns 0
    bzr combine-thread
    bzr up-thread
    # Resolve any conflicts
    bzr commit -m 'Merge trunk'
    # Now our top-thread has an up-to-date trunk and just the urllib fix.
One approach is to use the shelve extension; this extension is not included with Mercurial, but it is easy to install. With shelve, you can select changes to put temporarily aside.
    hg clone trunk issue0000
    cd issue0000
    # Edit some code (e.g. urllib).
    hg shelve
    # Select changes to put aside
    # Edit some other code (e.g. socket).
    hg commit
    hg unshelve
    # Complete initial fix.
    hg commit
    cd ../trunk
    hg pull ../issue0000
    hg merge
    hg commit
    rm -rf ../issue0000
There are several other ways to approach this scenario with Mercurial. Alexander Solovyov presented a few alternative approaches on Mercurial’s mailing list.
    cd trunk
    # Edit some code in urllib.
    # Discover a bug in socket, want to fix that first.
    # So save away our current work.
    git stash
    # Edit some code, commit some changes.
    git commit -a -m "Completed fix of socket."
    # Restore the in-progress work on urllib.
    git stash apply
    # Edit me some more code, commit some more fixes.
    git commit -a -m "Complete urllib fixes."
    # And push both patches to the public repository.
    git push
Bonus points: suppose you took your time, and someone else fixes socket in the same way you just did, and landed that in the trunk. In that case, your push will fail because your branch is not up-to-date. If the fix was a one-liner, there’s a very good chance that it’s exactly the same, character for character. git would notice that, and you are done; git will silently merge them.
Suppose we’re not so lucky:
    # Update your branch.
    git pull git://code.python.org/public/trunk master
    # git has fetched all the necessary data, but reports that the
    # merge failed. We discover the nearly-duplicated patch.
    # Neither our version of the master branch nor the workspace has
    # been touched. Revert our socket patch and pull again:
    git revert HEAD^
    git pull git://code.python.org/public/trunk master
Like Bazaar and Mercurial, git has extensions to manage stacks of patches. You can use the original Quilt by Andrew Morton, or there is StGit (“stacked git”) which integrates patch-tracking for large sets of patches into the VCS in a way similar to Mercurial Queues or Bazaar looms.
How doesPEP 101 change when using a DVCS?
It will change, but not substantially so. When doing the maintenance branch, we’ll just push to the new location instead of doing an svn cp. Tags are totally different, since in svn they are directory copies, but in bzr (and I’m guessing hg), they are just symbolic names for revisions on a particular branch. The release.py script will have to change to use bzr commands instead. It’s possible that, because a DVCS (in particular, bzr) does cherry-picking and merging well enough, we’ll be able to create the maint branches sooner. It would be a useful exercise to try to do a release off the bzr/hg mirrors.
Clearly, details specific to Subversion in PEP 101 and in the release script will need to be updated. In particular, the release tagging and maintenance branch creation processes will have to be modified to use Mercurial’s features; this will simplify and streamline certain aspects of the release process. For example, tagging and re-tagging a release will become a trivial operation since a tag, in Mercurial, is simply a symbolic name for a given revision.
It will change, but not substantially so. When doing the maintenance branch, we’ll just git push to the new location instead of doing an svn cp. Tags are totally different, since in svn they are directory copies, but in git they are just symbolic names for revisions, as are branches. (The difference between a tag and a branch is that tags refer to a particular commit, and will never change unless you use git tag -f to force them to move. The checked-out branch, on the other hand, is automatically updated by git commit.) The release.py script will have to change to use git commands instead. With git I would create a (local) maintenance branch as soon as the release engineer is chosen. Then I’d “git pull” until I didn’t like a patch, when it would be “git pull; git revert ugly-patch”, until it started to look like the sensible thing is to fork off, and start doing “git cherry-pick” on the good patches.
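The tag-versus-branch distinction described for git can be pictured with a toy model in Python. This is only a sketch: the `ToyRepo` class, the made-up commit ids, and the tag name `v2.6` are all hypothetical and say nothing about how git stores refs internally. It shows just the behaviour stated above: a commit advances the checked-out branch, while a tag stays put unless it is moved by force.

```python
class ToyRepo:
    """Toy model of refs: names that point at commit ids (not real git)."""

    def __init__(self):
        self.commits = []                  # (commit id, message), oldest first
        self.branches = {"master": None}   # branch name -> commit id
        self.tags = {}                     # tag name -> commit id
        self.checked_out = "master"

    def commit(self, message):
        """Create a commit and advance the checked-out branch to it."""
        commit_id = f"c{len(self.commits) + 1}"  # made-up id, not a SHA-1
        self.commits.append((commit_id, message))
        self.branches[self.checked_out] = commit_id
        return commit_id

    def tag(self, name, force=False):
        """Point a tag at the current commit; moving it needs force=True."""
        if name in self.tags and not force:
            raise ValueError(f"tag {name!r} already exists")
        self.tags[name] = self.branches[self.checked_out]


repo = ToyRepo()
repo.commit("Release work")
repo.tag("v2.6")                 # hypothetical tag name
repo.commit("Post-release fix")  # moves master, leaves the tag alone
print(repo.tags["v2.6"], repo.branches["master"])
```

In this model, calling `repo.tag("v2.6")` a second time raises an error, while `repo.tag("v2.6", force=True)` moves the tag, loosely mirroring the role of git tag -f mentioned above.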
| DVCS | Windows | OS X | UNIX |
|---|---|---|---|
| bzr | yes (installer) w/ tortoise | yes (installer, fink or MacPorts) | yes (various package formats) |
| hg | yes (third-party installer) w/ tortoise | yes (third-party installer, fink or MacPorts) | yes (various package formats) |
| git | yes (third-party installer) | yes (third-party installer, fink or MacPorts) | yes (.deb or .rpm) |
As the above table shows, all three DVCSs are available on all three major OS platforms. But what it also shows is that Bazaar is the only DVCS that directly supports Windows with a binary installer while Mercurial and git require you to rely on a third party for binaries. Both bzr and hg have a tortoise version while git does not.
Bazaar and Mercurial also have the benefit of being available in pure Python with optional extensions available for performance.
bzr mv Mailman mailman) and as long as I did it on Linux (obviously), when I pulled in the changes on OS X everything was hunky dory.
In terms of code review tools such as Review Board and Rietveld, the former supports all three while the latter supports hg and git but not bzr. Bazaar does not yet have an online review board, but it has several ways to manage email-based reviews and trunk merging. There’s Bundle Buggy, Patch Queue Manager (PQM), and Launchpad’s code reviews.
All three have some web site online that provides basic hosting support for people who want to put a repository online. Bazaar has Launchpad, Mercurial has bitbucket.org, and git has GitHub. Google Code also has instructions on how to use git with the service, both to hold a repository and to act as a read-only mirror.
All three also appear to be supported by Buildbot.
| DVCS | svn support |
|---|---|
| bzr | bzr-svn (third-party) |
| hg | multiple third-parties |
| git | git-svn |
All three DVCSs have svn support, although git is the only one to come with that support out-of-the-box.
| DVCS | Web page interface |
|---|---|
| bzr | loggerhead |
| hg | hgweb |
| git | gitweb |
All three DVCSs support various hooks on the client and server side for e.g. pre/post-commit verifications.
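As a rough illustration of the kind of verification such a hook could run, here is a minimal, tool-agnostic sketch in Python. The function name `find_whitespace_problems` and the rule it enforces (no trailing whitespace, no tabs in indentation) are assumptions made for this example; the text does not prescribe any particular check, and the wiring into each tool's own hook mechanism is deliberately left out.

```python
def find_whitespace_problems(text):
    """Return (line_number, problem) pairs for a file's contents.

    Hypothetical check for illustration only: flags trailing whitespace
    and tab characters in leading indentation.
    """
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if line != line.rstrip():
            problems.append((number, "trailing whitespace"))
        # Everything before the first non-whitespace character.
        indent = line[: len(line) - len(line.lstrip())]
        if "\t" in indent:
            problems.append((number, "tab in indentation"))
    return problems


sample = "def f():\n\treturn 1  \n"
for number, problem in find_whitespace_problems(sample):
    print(f"line {number}: {problem}")
```

A pre-commit hook built on something like this would presumably run the check over the files being committed and refuse the commit when the returned list is not empty.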
All three projects are under active development. Git seems to be on a monthly release schedule. Bazaar is on a time-released monthly schedule. Mercurial is on a 4-month, timed release schedule.
Martin Pool adds: “bzr has a stable Python scripting interface, with a distinction between public and private interfaces and a deprecation window for APIs that are changing. Some plugins are listed in https://edge.launchpad.net/bazaar and http://bazaar-vcs.org/Documentation”.
Alexander Solovyov comments:
Mercurial has easy to use extensive API with hooks for main events and ability to extend commands. Also there is the mq (mercurial queues) extension, distributed with Mercurial, which simplifies work with patches.
git has a cvsserver mode, i.e., you can check out a tree from git using CVS. You can even commit to the tree, but features like merging are absent, and branches are handled as CVS modules, which is likely to shock a veteran CVS user.
As I (Brett Cannon) am left with the task of making the final decision of which/any DVCS to go with and not my co-authors, I felt it only fair to write down what tests I ran and my impressions as I evaluate the various tools so as to be as transparent as possible.
The amount of time and effort it takes to get a checkout of Python’s repository is critical. If the difficulty or time is too great then a person wishing to contribute to Python may very well give up. That cannot be allowed to happen.
I measured the checking out of the 2.x trunk as if I were a non-core developer. Timings were done using the time command in zsh and space was calculated with du -c -h.
| DVCS | San Francisco | Vancouver | Space |
|---|---|---|---|
| svn | 1:04 | 2:59 | 139 M |
| bzr | 10:45 | 16:04 | 276 M |
| hg | 2:30 | 5:24 | 171 M |
| git | 2:54 | 5:28 | 134 M |
When comparing these numbers to svn, it is important to realize that it is not a 1:1 comparison. Svn does not pull down the entire revision history like all of the DVCSs do. That means svn can perform an initial checkout much faster than the DVCSs purely based on the fact that it has less information to download over the network.
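To put the checkout timings on a common scale, the short Python sketch below converts each m:ss value from the table to seconds and expresses it as a multiple of the svn time for the same location. The helper name `to_seconds` and the dictionary layout are choices made for this example; only the timings themselves come from the table.

```python
def to_seconds(clock):
    """Convert an 'm:ss' wall-clock string to whole seconds."""
    minutes, seconds = clock.split(":")
    return int(minutes) * 60 + int(seconds)


# Checkout timings copied from the table: (San Francisco, Vancouver).
checkout_times = {
    "svn": ("1:04", "2:59"),
    "bzr": ("10:45", "16:04"),
    "hg": ("2:30", "5:24"),
    "git": ("2:54", "5:28"),
}

svn_sf, svn_van = (to_seconds(t) for t in checkout_times["svn"])

for tool, (sf, van) in checkout_times.items():
    sf_ratio = to_seconds(sf) / svn_sf
    van_ratio = to_seconds(van) / svn_van
    print(f"{tool}: {sf_ratio:.1f}x svn (San Francisco), "
          f"{van_ratio:.1f}x svn (Vancouver)")
```

The ratios should be read with the caveat above in mind: the DVCS figures include downloading the full revision history, so they measure a different amount of work than the svn checkout.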
To see how the tools did for performing a command that required querying the history, the log for the README file was timed.
| DVCS | Time |
|---|---|
| bzr | 4.5 s |
| hg | 1.1 s |
| git | 1.5 s |
One thing of note during this test was that git took longer than the other three tools to figure out how to get the log without it using a pager. While the pager use is a nice touch in general, working out how to keep it from turning on automatically took some time (it turns out the main git command has a --no-pager flag to disable use of the pager).
I ended up trying to find out what the command was to see what URL the repository was cloned from. To do this I used nothing more than the help provided by the tool itself or its man pages.
Bzr was the easiest: bzr info. Running bzr help didn’t show what I wanted, but mentioned bzr help commands. That list had the command with a description that made sense.
Git was the second easiest. The command git help didn’t show much and did not have a way of listing all commands. That is when I viewed the man page. Reading through the various commands I discovered git remote. The command itself spit out nothing more than origin. Trying git remote origin said it was an error and printed out the command usage. That is when I noticed git remote show. Running git remote show origin gave me the information I wanted.
For hg, I never found the information I wanted on my own. It turns out I wanted hg paths, but that was not obvious from the description of “show definition of symbolic path names” as printed by hg help (it should be noted that reporting this in the PEP did lead the Mercurial developers to clarify the wording to make the use of the hg paths command clearer).
To see how long it takes to update an outdated repository I timed both updating a repository 700 commits behind and 50 commits behind (three weeks stale and one week stale, respectively).
| DVCS | 700 commits | 50 commits |
|---|---|---|
| bzr | 39 s | 7 s |
| hg | 17 s | 3 s |
| git | N/A | 4 s |
Note
Git lacks a value for the 700 commits scenario as it does not seem to allow checking out a repository at a specific revision.
Git deserves special mention for its output from git pull. It not only lists the delta change information for each file but also color-codes the information.
At PyCon 2009 the decision was made to go with Mercurial.
While svn has served the development team well, it needs to be admitted that svn does not serve the needs of non-committers as well as a DVCS does. Because svn only provides its features such as version control, branching, etc. to people with commit privileges on the repository it can be a hindrance for people who lack commit privileges. But DVCSs have no such limitation as anyone can create a local branch of Python and perform their own local commits without the burden that comes with cloning the entire svn repository. Allowing anyone to have the same workflow as the core developers was the key reason to switch from svn to hg.
Orthogonal to the benefits of allowing anyone to easily commit locally to their own branches are offline, fast operations. Because hg stores all data locally, there is no need to send requests to a remote server; operations work off of the local disk instead. This improves response times tremendously. It also allows for offline usage for when one lacks an Internet connection. But this benefit is minor and considered simply a side-effect benefit instead of a driving factor for switching off of Subversion.
Git was not chosen for three key reasons (see the PyCon 2009 lightning talk where Brett Cannon lists these exact reasons; talk started at 3:45). First, git’s Windows support is the weakest out of the three DVCSs being considered, which is unacceptable as Python needs to support development on any platform it runs on. Since Python runs on Windows and some people do develop on the platform it needs solid support. And while git’s support is improving, as of this moment it is the weakest by a large enough margin to warrant considering it a problem.
Second, and just as important as the first issue, is that the Python core developers liked git the least out of the three DVCS options by a wide margin. If you look at the following table you will see the results of a survey taken of the core developers and how by a large margin git is the least favorite version control system.
| DVCS | ++ | equal | -- | Uninformed |
|---|---|---|---|---|
| git | 5 | 1 | 8 | 13 |
| bzr | 10 | 3 | 2 | 12 |
| hg | 15 | 1 | 1 | 10 |
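One way to read the survey table is to subtract the unfavorable votes from the favorable ones for each tool. The Python sketch below does that. Note that this "net preference" figure is a summary invented for this example, and treating the "++" column as favorable and the column after "equal" as unfavorable is an assumption about the table's meaning; neither comes from the survey itself.

```python
# Survey counts copied from the table, in column order:
# (favorable "++", equal, unfavorable, uninformed).
survey = {
    "git": (5, 1, 8, 13),
    "bzr": (10, 3, 2, 12),
    "hg": (15, 1, 1, 10),
}


def net_preference(counts):
    """Favorable minus unfavorable votes (a made-up summary metric)."""
    favorable, _equal, unfavorable, _uninformed = counts
    return favorable - unfavorable


# Order the tools from most to least preferred under this metric.
ranking = sorted(
    survey, key=lambda tool: net_preference(survey[tool]), reverse=True
)
for tool in ranking:
    print(f"{tool}: net {net_preference(survey[tool]):+d}")
```

Under this reading hg comes out ahead of bzr, and git is the only tool with more unfavorable than favorable votes.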
Lastly, all things being equal (which they are not as shown by the previous two issues), it is preferable to use and support a tool written in Python and not one written in C and shell. We are pragmatic enough to not choose a tool simply because it is written in Python, but we do see the usefulness in promoting tools that do use it when it is reasonable to do so as it is in this case.
As for why Mercurial was chosen over Bazaar, it came down to popularity. As the core developer survey shows, hg was preferred over bzr. But the community also appears to prefer hg as was shown at PyCon after git’s removal from consideration was announced. Many people came up to Brett and said in various ways that they wanted hg to be chosen. While no one said they did not want bzr chosen, no one said they did either.
Based on all of this information, Guido and Brett decided Mercurial was to be the next version control system for Python.
PEP 385 outlines the transition from svn to hg.
This document has been placed in the public domain.
Source: https://github.com/python/peps/blob/main/peps/pep-0374.rst
Last modified: 2025-02-01 08:59:27 GMT