Python Enhancement Proposals

PEP 385 – Migrating from Subversion to Mercurial

Author: Dirkjan Ochtman <dirkjan at ochtman.nl>, Antoine Pitrou <solipsis at pitrou.net>, Georg Brandl <georg at python.org>
Status: Final
Type: Process
Created: 25-May-2009

Motivation

After having decided to switch to the Mercurial DVCS, the actual migration still has to be performed. In the case of an important piece of infrastructure like the version control system for a large, distributed project like Python, this is a significant effort. This PEP is an attempt to describe the steps that must be taken for further discussion. It’s somewhat similar to PEP 347, which discussed the migration to SVN.

To make the most of hg, we would like to make a high-fidelity conversion, such that (a) as much of the svn metadata as possible is retained, and (b) all metadata is converted to formats that are common in Mercurial. This way, tools written for Mercurial can be optimally used. In order to do this, we want to use the hgsubversion software to do an initial conversion. This hg extension is focused on providing high-quality conversion from Subversion to Mercurial for use in two-way correspondence, meaning it doesn’t throw away as much available metadata as other solutions.

Such a conversion also seems like a good time to reconsider the contents of the repository and determine if some things are still valuable. In this spirit, the following sections also propose discarding some of the older metadata.

Timeline

The current schedule for conversion milestones:

  • 2011-02-24: availability of a test repo at hg.python.org

    Test commits will be allowed (and encouraged) from all committers to the Subversion repository. The test repository and all test commits will be removed once the final conversion is done. The server-side hooks will be installed for the test repository, in order to test buildbot, diff-email and whitespace checking integration.

  • 2011-03-05: final conversion (tentative)

    Commits to the Subversion branches now maintained in Mercurial will be blocked. Developers should refrain from pushing to the Mercurial repositories until all infrastructure is ensured to work after their switch over to the new repository.

Transition plan

Branch strategy

Mercurial has two basic ways of using branches: cloned branches, where each branch is kept in a separate repository, and named branches, where each revision keeps metadata to note on which branch it belongs. The former makes it easier to distinguish branches, at the expense of requiring more disk space on the client. The latter makes it a little easier to switch between branches, but all branch names are a persistent part of history.[1]

Differences between named branches and cloned branches:

  • Tags in a different (maintenance) clone aren’t available in the local clone
  • Clones with named branches will be larger, since they contain more data

We propose to use named branches for release branches and adopt cloned branches for feature branches.

History management

In order to minimize the loss of information due to the conversion, we propose to provide several repositories as a conversion result:

  • A repository trimmed to the mainline trunk (and py3k), as well as past and present maintenance branches – this is called the “working” repo and is where development continues. This repository has all the history needed for development work, including annotating source files with changes back up to 1990 and other common history-digging operations.

    The default branch in that repo is what is known as py3k in Subversion, while the Subversion trunk lives on with the branch name legacy-trunk; however in Mercurial this branch will be closed. Release branches are named after their major.minor version, e.g. 3.2.

  • A repository with the full, unedited conversion of the Subversion repository (actually, its /python subdirectory) – this is called the “historic” or “archive” repo and will be offered as a read-only resource.[2]
  • One more repository per active feature branch; “active” means that at least one core developer asks for the branch to be provided. Each such repository will contain both the feature branch and all ancestor changesets from mainline (coming from trunk and/or py3k in SVN).

Since all branches are present in the historic repo, they can later be extracted as separate repositories at any time should it prove to be necessary.

The final revision map between SVN revision numbers, Mercurial changesets and SVN branch names will be made available in a file stored in the Misc directory. Its format is as follows:

    [...]
    88483 e65daae6cf4499a0863cb7645109a4798c28d83e issue10276-snowleopard
    88484 835cb57abffeceaff0d85c2a3aa0625458dd3e31 py3k
    88485 d880f9d8492f597a030772c7485a34aadb6c4ece release32-maint
    88486 0c431b8c22f5dbeb591414c154acb7890c1809df py3k
    88487 82cda1f21396bbd10db8083ea20146d296cb630b release32-maint
    88488 8174d00d07972d6f109ed57efca8273a4d59302c release27-maint
    [...]
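A file in this format could be read back with a few lines of Python. The following is only a sketch: it assumes the three columns (SVN revision, Mercurial changeset hash, SVN branch name) are separated by whitespace, and the names RevMapEntry and parse_revmap are made up for illustration.

```python
from typing import Iterable, Iterator, NamedTuple


class RevMapEntry(NamedTuple):
    svn_rev: int
    hg_node: str
    svn_branch: str


def parse_revmap(lines: Iterable[str]) -> Iterator[RevMapEntry]:
    """Yield one entry per revision-map line (assumed whitespace-separated)."""
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and the "[...]" elision markers used in the excerpt.
        if not line or line == "[...]":
            continue
        svn_rev, hg_node, svn_branch = line.split(None, 2)
        yield RevMapEntry(int(svn_rev), hg_node, svn_branch)


sample = """\
[...]
88483 e65daae6cf4499a0863cb7645109a4798c28d83e issue10276-snowleopard
88484 835cb57abffeceaff0d85c2a3aa0625458dd3e31 py3k
[...]
"""
# Index the entries by SVN revision number for quick lookups.
by_rev = {entry.svn_rev: entry for entry in parse_revmap(sample.splitlines())}
print(by_rev[88484].svn_branch, by_rev[88484].hg_node[:12])
```

Indexing by SVN revision is the direction needed to resolve old revision references; a reverse index keyed on the hash could be built the same way.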

Converting tags

The SVN tags directory contains a lot of old stuff. Some of these are not, in fact, full tags, but contain only a smaller subset of the repository. All release tags will be kept; other tags will be included based on requests from the developer community. We propose to make the tag naming scheme consistent, in this style: v3.2.1a2.
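To illustrate the proposed naming style, here is a rough sketch of a renaming function. It assumes the old release tags look like r262 or r321a2, with single-digit version components and an optional a/b/c/rc suffix; this PEP does not spell out that pattern, and hg_tag_name is a hypothetical helper, not part of the actual migration tools.

```python
import re

# Assumed shape of old SVN release tags: "r" + major + minor + optional micro
# + optional pre-release suffix. Each version component is a single digit.
_SVN_TAG = re.compile(r"^r(\d)(\d)(\d)?((?:a|b|c|rc)\d+)?$")


def hg_tag_name(svn_tag: str) -> str:
    """Translate an SVN-style release tag into the proposed vX.Y[.Z][suffix] style."""
    match = _SVN_TAG.match(svn_tag)
    if match is None:
        # Anything that does not fit the assumed pattern needs manual review.
        raise ValueError(f"unrecognised tag name: {svn_tag!r}")
    major, minor, micro, suffix = match.groups()
    version = f"{major}.{minor}"
    if micro is not None:
        version += f".{micro}"
    return f"v{version}{suffix or ''}"


print(hg_tag_name("r262"))
print(hg_tag_name("r321a2"))
```

Tags that do not match the assumed pattern raise an error instead of being guessed at, since a silently wrong tag name would be hard to spot later.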

Author map

In order to provide user names the way they are common in hg (in the ‘First Last <user@example.org>’ format), we need an author map to map cvs and svn user names to real names and their email addresses. We have a complete version of such a map in the migration tools repository (not publicly accessible to avoid leaking addresses to harvesters). The email addresses in it might be out of date; that’s bound to happen, although it would be nice to try and have as many people as possible review it for addresses that are out of date. The current version also still seems to contain some encoding problems.
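As a sketch of what checking such a map could look like, the function below reads lines of the form "username = First Last <user@example.org>" and rejects entries that do not fit. The file layout is an assumption (the real map is not public), and the user name and address shown are invented.

```python
import re

# "First Last <user@example.org>": a display name followed by a bracketed address.
_IDENTITY = re.compile(r"^(?P<name>[^<>]+?)\s*<(?P<email>[^<>\s@]+@[^<>\s@]+)>$")


def parse_author_map(lines):
    """Return {vcs_username: 'First Last <email>'} from 'user = identity' lines."""
    authors = {}
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        # Blank lines and "#" comments are ignored (an assumed convention).
        if not line or line.startswith("#"):
            continue
        username, sep, identity = line.partition("=")
        match = _IDENTITY.match(identity.strip())
        if not sep or match is None:
            raise ValueError(f"line {lineno}: cannot parse {raw!r}")
        authors[username.strip()] = f"{match['name']} <{match['email']}>"
    return authors


authors = parse_author_map([
    "# hypothetical entry, not a real committer",
    "jdoe = Jane Doe <jane@example.org>",
])
print(authors["jdoe"])
```

Failing loudly on malformed lines makes a review pass over the map easier, because every entry that survives is known to have both a name and an address.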

Generating .hgignore

The .hgignore file can be used in Mercurial repositories to help ignore files that are not eligible for version control. It does this by employing several possible forms of pattern matching. The current Python repository already includes a rudimentary .hgignore file to help with using the hg mirrors.

Since the current Python repository already includes a .hgignore file (for use with hg mirrors), we’ll just use that. Generating full history of the file was debated but deemed impractical (because it’s relatively hard with fairly little gain, since ignoring is less important for older revisions).

Repository size

A bare conversion result of the current Python repository weighs 1.9 GB; although this is smaller than the Subversion repository (2.7 GB), it is still too large to be practical.

The size becomes more manageable by the trimming applied to the working repository, and by a process called “revlog reordering” that optimizes the layout of internal Mercurial storage very efficiently.

After all optimizations are applied, the size of the working repository is around 180 MB on disk. The amount of data transferred over the network when cloning is estimated to be around 80 MB.

Other repositories

There are a number of other projects hosted in svn.python.org’s “projects” repository. The “peps” directory will be converted along with the main Python one. Richard Tew has indicated that he’d like the Stackless repository to also be converted. What other projects in the svn.python.org repository should be converted?

There’s now an initial stab at converting the Jython repository. The current tip of hgsubversion unfortunately fails at some point. Pending investigation.

Other repositories that would like to be converted to Mercurial can announce themselves to me after the main Python migration is done, and I’ll take care of their needs.

Infrastructure

hg-ssh

Developers should access the repositories through ssh, similar to the current setup. Public keys can be used to grant people access to a shared hg@ account. A hgwebdir instance also has been set up at hg.python.org for easy browsing and read-only access. It is configured so that developers can trivially start new clones (for longer-term features that profit from development in a separate repository).

Also, direct creation of public repositories is allowed for core developers, although it is not yet decided which naming scheme will be enforced:

    $ hg init ssh://hg@hg.python.org/sandbox/mywork
    repo created, public URL is http://hg.python.org/sandbox/mywork

Hooks

A number of hooks are currently in use. The hg equivalents for these should be developed and deployed. The following hooks are being used:

  • check whitespace: a hook to reject commits in case the whitespace doesn’t match the rules for the Python codebase. In a changegroup, only the tip is checked (this allows cleanup commits for changes pulled from third-party repos). We can also offer a whitespace hook for use with client-side repositories that people can use; it could warn about whitespace issues and/or truncate trailing whitespace from changed lines.
  • push mails: Emails will include diffs for each changeset pushed to the public repository, including the username which pushed the changesets (this is not necessarily the same as the author recorded in the changesets).
  • buildbots: the python.org build master will be notified of each changeset pushed to the cpython repository, and will trigger an appropriate build on every build slave for the branch in which the changeset occurs.
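The core of the whitespace check in the first item can be sketched independently of Mercurial. The version below assumes the rules are "no trailing whitespace" and "no tabs in indentation", which is a simplification and not necessarily what the deployed hook enforces. Both function names are illustrative.

```python
def whitespace_problems(source: str):
    """Return (line number, description) pairs for lines breaking the assumed rules."""
    problems = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if line != line.rstrip():
            problems.append((lineno, "trailing whitespace"))
        # The indentation is everything before the first non-whitespace character.
        indent = line[: len(line) - len(line.lstrip())]
        if "\t" in indent:
            problems.append((lineno, "tab in indentation"))
    return problems


def strip_trailing_whitespace(source: str) -> str:
    """Client-side variant: drop trailing whitespace, ending every line with '\\n'."""
    return "".join(line.rstrip() + "\n" for line in source.splitlines())


sample = "def f():\n\treturn 1  \n"
for lineno, problem in whitespace_problems(sample):
    print(f"line {lineno}: {problem}")
```

A server-side hook would reject the push when the list is non-empty, while a client-side hook could call the stripping function instead of only warning.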

The hooks repository contains ports of these server-side hooks to Mercurial, as well as a couple additional ones:

  • check branch heads: a hook to reject pushes which create a new head on an existing branch. The pusher then has to merge the excess heads and try pushing again.
  • check branches: a hook to reject all changesets not on an allowed named branch. This hook’s whitelist will have to be updated when we want to create new maintenance branches.
  • check line endings: a hook, based on the eol extension, to reject all changesets committing files with the wrong line endings. The commits then have to be stripped and redone, possibly with the eol extension enabled on the committer’s computer.
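The decision logic behind the branch-head and branch-whitelist checks is simple enough to sketch on plain data. The code below does not use the Mercurial API; it assumes the caller has already collected (node, branch) pairs for the incoming changesets and the heads of each branch after the push. The whitelist contents are hypothetical.

```python
# Hypothetical whitelist; the real one would list the allowed named branches.
ALLOWED_BRANCHES = frozenset({"default", "3.2", "2.7"})


def rejected_by_branch(changesets, allowed=ALLOWED_BRANCHES):
    """Return the (node, branch) pairs whose branch is not on the whitelist."""
    return [(node, branch) for node, branch in changesets if branch not in allowed]


def branches_with_extra_heads(heads_by_branch):
    """Return the names of branches that would end up with more than one head."""
    return sorted(
        branch for branch, heads in heads_by_branch.items() if len(heads) > 1
    )


incoming = [("835cb57abffe", "default"), ("d880f9d8492f", "my-feature")]
print(rejected_by_branch(incoming))
print(branches_with_extra_heads({"default": ["a1", "b2"], "3.2": ["c3"]}))
```

In an actual deployment these checks would probably run from a hook that can still abort the transaction, such as Mercurial's pretxnchangegroup, with a non-empty result leading to rejection of the push.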

One additional hook could be beneficial:

  • check contributors: in the current setup, all changesets bear the username of committers, who must have signed the contributor agreement. We might want to use a hook to check if the committer is a contributor if we keep a list of registered contributors. Then, the hook might warn users that push a group of revisions containing changesets from unknown contributors.

End-of-line conversions

Discussion about the lack of end-of-line conversion support in Mercurial, which was provided initially by the win32text extension, led to the development of the new eol extension that supports a versioned management of line-ending conventions on a file-by-file basis, akin to Subversion’s svn:eol-style properties. This information is kept in a versioned file called .hgeol, and such a file has already been checked into the Subversion repository.

A hook also exists on the server side to reject any changeset introducing inconsistent newline data (see above).

hgwebdir

A more or less stock hgwebdir installation should be set up. We might want to come up with a style to match the Python website.

A small WSGI application has been written that can look up Subversion revisions and redirect to the appropriate hgweb page for the given changeset, regardless of which repository the converted revision ended up in (since one big Subversion repository is converted into several Mercurial repositories). It can also look up Mercurial changesets by their hexadecimal ID.
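The following is a minimal sketch of such a lookup application, not the one actually written for python.org. It assumes a URL layout of /r<svn revision>, an in-memory table standing in for the revision map, and redirects to hgweb's /rev/<node> page; lookup by Mercurial hash is left out.

```python
import re

# Stand-in for the revision map: SVN revision -> (repository name, hg node).
# Which repository a revision landed in is an assumption for this example.
SVN_TO_HG = {
    88484: ("cpython", "835cb57abffeceaff0d85c2a3aa0625458dd3e31"),
}
BASE_URL = "http://hg.python.org"


def lookup_app(environ, start_response):
    """WSGI app: redirect /r<N> to the hgweb page of the converted changeset."""
    match = re.fullmatch(r"/r(\d+)", environ.get("PATH_INFO", ""))
    entry = SVN_TO_HG.get(int(match.group(1))) if match else None
    if entry is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"unknown revision\n"]
    repo, node = entry
    # hgweb serves a changeset page under <repo>/rev/<node>.
    location = f"{BASE_URL}/{repo}/rev/{node[:12]}"
    start_response("302 Found", [("Location", location)])
    return [b""]


if __name__ == "__main__":
    # Serve locally for a quick manual check.
    from wsgiref.simple_server import make_server

    make_server("localhost", 8000, lookup_app).serve_forever()
```

Because each table entry records the repository name, the redirect works no matter which of the converted repositories holds the changeset.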

roundup

By pointing Roundup to the URL of the lookup script mentioned above, links to SVN revisions will continue to work, and links to Mercurial changesets can be created as well, without having to give repository and changeset ID.

After migration

Where to get code

After migration, the hgwebdir will live at hg.python.org. This is an accepted standard for many organizations, and an easy parallel to svn.python.org. The working repo might live at http://hg.python.org/cpython/, for example, with the archive repo at http://hg.python.org/cpython-archive/. For write access, developers will have to use ssh, which could be ssh://hg@hg.python.org/cpython/.

code.python.org was also proposed as the hostname. We think that using the VCS name in the hostname is good because it prevents confusion: it should be clear that you can’t use svn or bzr for hg.python.org.

hgwebdir can already provide tarballs for every changeset. This obviates the need for daily snapshots; we can just point users to tip.tar.gz instead, meaning they will get the latest. If desired, we could even use buildbot results to point to the last good changeset.

Python-specific documentation

hg comes with good built-in documentation (available through hg help) and a wiki that’s full of useful information and recipes, not to mention a popular book (readable online).

In addition to that, the recently overhauled Python Developer’s Guide already has a branch with instructions for Mercurial instead of Subversion; an online build of this branch is also available.

Proposed workflow

We propose two workflows for the migration of patches between several branches.

For migration within 2.x or 3.x branches, we propose a patch always gets committed to the oldest branch where it applies first. Then, the resulting changeset can be merged using hg merge to all newer branches within that series (2.x or 3.x). If it does not apply as-is to the newer branch, hg revert can be used to easily revert to the new-branch-native head, patch in some alternative version of the patch (or none, if it’s not applicable), then commit the merge. The premise here is that all changesets from an older branch within the series are eventually merged to all newer branches within the series.

The upshot is that this provides for the most painless merging procedure. This means that in the general case, people have to think about the oldest branch to which the patch should be applied before actually applying it. Usually, that is one of only two branches: the latest maintenance branch and the trunk, except for security fixes applicable to older branches in security-fix-only mode.
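One reading of this forward-merge rule can be expressed as a small planning function: given the branches of a series ordered from oldest to newest, and the oldest branch a fix applies to, it lists the merges to perform in order. This is a sketch of the idea, not a tool the PEP describes; the branch name 3.1 is used only as an example.

```python
def merge_plan(series, oldest_applicable):
    """Return (source, target) merges, oldest first, for one release series.

    `series` lists branch names from oldest to newest; the fix is committed
    on `oldest_applicable` and then merged forward one branch at a time.
    """
    start = series.index(oldest_applicable)  # ValueError if the branch is unknown
    chain = series[start:]
    return list(zip(chain, chain[1:]))


for source, target in merge_plan(["3.1", "3.2", "default"], "3.1"):
    # Each step corresponds to: hg update <target>; hg merge <source>; hg commit
    print(f"merge {source} into {target}")
```

A fix that only applies to the newest branch yields an empty plan, since there is nothing to merge forward.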

For merging bug fixes from the 3.x to the 2.7 maintenance branch (2.6 and 2.5 are in security-fix-only mode and their maintenance will continue in the Subversion repository), changesets should be transplanted (not merged) in some other way. The transplant extension, import/export and bundle/unbundle work equally well here.

Choosing this approach allows 3.x not to carry all of the 2.x history-since-it-was-branched, meaning the clone is not as big and the merges not as complicated.

The future of Subversion

What happens to the Subversion repositories after the migration? Since the svn server contains a bunch of repositories, not just the CPython one, it will probably live on for a bit as not every project may want to migrate or it takes longer for other projects to migrate. To prevent people from staying behind, we may want to move migrated projects from the repository to a new, read-only repository with a new name.

Build identification

Python currently provides the sys.subversion tuple to allow Python code to find out exactly what version of Python it’s running against. The current version looks something like this:

  • (‘CPython’, ‘tags/r262’, ‘71600’)
  • (‘CPython’, ‘trunk’, ‘73128M’)

Another value is returned from Py_GetBuildInfo() in the C API, and available to Python code as part of sys.version:

  • ‘r262:71600, Jun 2 2009, 09:58:33’
  • ‘trunk:73128M, Jun 2 2009, 01:24:14’

I propose that the revision identifier will be the short version of hg’s revision hash, for example ‘dd3ebf81af43’, augmented with ‘+’ (instead of ‘M’) if the working directory from which it was built was modified. This mirrors the output of the hg id command, which is intended for this kind of usage. The sys.subversion value will also be renamed to sys.mercurial to reflect the change in VCS.

For the tag/branch identifier, I propose that hg will check for tags on the currently checked out revision, use the tag if there is one (‘tip’ doesn’t count), and use the branch name otherwise. sys.subversion becomes

  • (‘CPython’, ‘v2.6.2’, ‘dd3ebf81af43’)
  • (‘CPython’, ‘default’, ‘af694c6a888c+’)

and the build info string becomes

  • ‘v2.6.2:dd3ebf81af43, Jun 2 2009, 09:58:33’
  • ‘default:af694c6a888c+, Jun 2 2009, 01:24:14’

This reflects that the default branch in hg is called ‘default’ instead of Subversion’s ‘trunk’, and reflects the proposed new tag format.
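The selection rules proposed above (prefer a tag other than ‘tip’, fall back to the branch name, append ‘+’ for a modified working directory) can be sketched as follows. The function names and argument layout are illustrative only; the real values would be computed at build time, not by Python code like this.

```python
def build_identifiers(tags, branch, node, modified):
    """Return the proposed ('CPython', tag-or-branch, revision) tuple."""
    # Short form of the hash, with '+' marking a modified working directory.
    revision = node[:12] + ("+" if modified else "")
    # 'tip' is not a real tag for this purpose, so it is ignored.
    real_tags = [tag for tag in tags if tag != "tip"]
    identifier = real_tags[0] if real_tags else branch
    return ("CPython", identifier, revision)


def build_info(tags, branch, node, modified, build_date):
    """Return the build info string in the proposed 'identifier:revision, date' form."""
    _, identifier, revision = build_identifiers(tags, branch, node, modified)
    return f"{identifier}:{revision}, {build_date}"


print(build_identifiers(["v2.6.2"], "default", "dd3ebf81af43", False))
print(build_info(["tip"], "default", "af694c6a888c", True, "Jun 2 2009, 01:24:14"))
```

When a revision carries several tags, this sketch simply takes the first one; the proposal does not say how that case should be resolved.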

Mercurial also makes it possible to find out the latest tag and the number of changesets separating the current changeset from that tag, allowing for a descriptive version string:

    $ hg parent --template "{latesttag}+{latesttagdistance}-{node|short}\n"
    v3.2+37-4b5d0d260e72
    $ hg up 2.7
    3316 files updated, 0 files merged, 379 files removed, 0 files unresolved
    $ hg parent --template "{latesttag}+{latesttagdistance}-{node|short}\n"
    v2.7.1+216-9619d21d8198
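A string produced by this template can be split back into its parts. The sketch below assumes the exact "{latesttag}+{latesttagdistance}-{node|short}" layout shown here, with a 12-character short hash; parse_description is a made-up name.

```python
import re

# latest tag, '+', distance in changesets, '-', 12-digit short node.
_DESCRIPTION = re.compile(
    r"^(?P<tag>.+)\+(?P<distance>\d+)-(?P<node>[0-9a-f]{12})$"
)


def parse_description(text):
    """Split 'tag+distance-node' into (tag, distance as int, node)."""
    match = _DESCRIPTION.match(text.strip())
    if match is None:
        raise ValueError(f"not a tag+distance-node string: {text!r}")
    return match["tag"], int(match["distance"]), match["node"]


print(parse_description("v3.2+37-4b5d0d260e72"))
print(parse_description("v2.7.1+216-9619d21d8198"))
```

A distance of 0 would mean the checked-out changeset is the tagged one itself.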

Footnotes

[1] The Mercurial book discourages the use of named branches, but it is, in this respect, somewhat outdated. Named branches have gotten much easier to use since that comment was written, due to improvements in hg.
[2] Since the initial working repo is a subset of the archive repo, it would also be feasible to pull changes from the working repo into the archive repo periodically.

Copyright

This document has been placed in the public domain.


Source: https://github.com/python/peps/blob/main/peps/pep-0385.rst

Last modified: 2025-02-01 08:59:27 GMT

