
Highlights from Git 2.50

The open source Git project just released Git 2.50. Here is GitHub’s look at some of the most interesting features and changes introduced since last time.


The open source Git project just released Git 2.50 with features and bug fixes from 98 contributors, 35 of them new. We last caught up with you on the latest in Git back when 2.49 was released.

💡 Before we get into the details of this latest release, we wanted to remind you that Git Merge, the conference for Git users and developers, is back this year on September 29-30, in San Francisco. Git Merge will feature talks from developers working on Git, and in the Git ecosystem. Tickets are on sale now; check out the website to learn more.

With that out of the way, let’s take a look at some of the most interesting features and changes from Git 2.50.

Improvements for multiple cruft packs

When we covered Git 2.43, we talked about newly added support for multiple cruft packs. Git 2.50 improves on that with better command-line ergonomics, and some important bugfixes. In case you’re new to the series, need a refresher, or aren’t familiar with cruft packs, here’s a brief overview:

Git objects may be either reachable or unreachable. The set of reachable objects is everything you can walk to starting from one of your repository’s references: traversing from commits to their parent(s), trees to their sub-tree(s), and so on. Any object that you didn’t visit by repeating that process over all of your references is unreachable.
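To make the distinction concrete, here is a minimal sketch that builds a throwaway repository containing one commit and one blob that nothing points at, then lists the unreachable objects by subtracting the reference walk from the full object list. The file names, commit identity, and blob contents are made up for illustration, and the walk deliberately ignores the index and reflogs.

```shell
# Sketch: list objects that no reference can reach, in a throwaway repository.
set -e
repo="$(mktemp -d)"
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main

# One reachable commit (plus its tree and blob) ...
echo "hello" >file.txt
git add file.txt
git -c user.name=Example -c user.email=example@example.com commit -q -m "initial"

# ... and one blob that nothing points at.
orphan="$(echo "nobody references me" | git hash-object -w --stdin)"

# Everything you can walk to starting from any reference.
git rev-list --objects --all | cut -d' ' -f1 | sort >reachable.txt
# Every object stored in the repository, reachable or not.
git cat-file --batch-all-objects --batch-check='%(objectname)' | sort >all.txt

# Objects that exist in the repository but were never visited by the walk.
unreachable="$(comm -23 all.txt reachable.txt)"
echo "unreachable: $unreachable"
```

The only object left over should be the orphaned blob, which is exactly the kind of object a cruft pack is meant to hold.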

In Git 2.37, Git introduced cruft packs, a new way to store your repository’s unreachable objects. A cruft pack looks like an ordinary packfile with the addition of an .mtimes file, which is used to keep track of when each object was most recently written in order to determine when it is safe [1] to discard it.

However, updating the cruft pack could be cumbersome, particularly in repositories with many unreachable objects, since a repository’s cruft pack must be rewritten in order to add new objects. Git 2.43 began to address this through a new command-line option: git repack --max-cruft-size. This option was designed to split unreachable objects across multiple packs, each no larger than the value specified by --max-cruft-size. But there were a couple of problems:

  • If you’re familiar with git repack’s --max-pack-size option, --max-cruft-size’s behavior is quite confusing. The former option specifies the maximum size an individual pack can be, while the latter involves how and when to move objects between multiple packs.
  • The feature was broken to begin with! Since --max-cruft-size also imposed on cruft packs the same pack-size constraints as --max-pack-size does on non-cruft packs, it was often impossible to get the behavior you wanted.

For example, suppose you had two 100 MiB cruft packs and ran git repack --max-cruft-size=200M. You might expect Git to merge them into a single 200 MiB pack. But since --max-cruft-size also dictates the maximum size of the output pack, Git will refuse to combine them, or worse: rewrite the same pack repeatedly.

Git 2.50 addresses both of these issues with a new option: --combine-cruft-below-size. Instead of specifying the maximum size of the output pack, it determines which existing cruft pack(s) are eligible to be combined. This is particularly helpful for repositories that have accumulated many unreachable objects spread across multiple cruft packs. With this new option, you can gradually reduce the number of cruft packs in your repository over time by combining existing ones together.

With the introduction of --combine-cruft-below-size, Git 2.50 repurposed --max-cruft-size to behave as a cruft pack-specific override for --max-pack-size. Now --max-cruft-size only determines the size of the outgoing pack, not which packs get combined into it.
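As a rough illustration of the selection rule, the toy model below prints which cruft packs would be eligible for combining under a given threshold. It assumes the documented behavior that only packs strictly smaller than the threshold are candidates; the function name and the sizes (whole MiB rather than bytes) are invented for the example, and real Git inspects the packs on disk.

```shell
# Toy model only: print the sizes (in MiB) of the cruft packs that would be
# eligible for combining. Mirrors just the "strictly below the threshold" rule.
eligible_cruft_packs() {
    threshold="$1"
    shift
    out=""
    for size in "$@"; do
        if [ "$size" -lt "$threshold" ]; then
            out="${out:+$out }$size"
        fi
    done
    echo "$out"
}

# Two 100 MiB cruft packs and one 500 MiB cruft pack, with a 200 MiB threshold:
eligible="$(eligible_cruft_packs 200 100 100 500)"
echo "eligible for combining: $eligible"

# The corresponding invocation (needs Git 2.50) would look something like:
#   git repack --cruft -d --combine-cruft-below-size=200M
```

Under this model the two 100 MiB packs are combined while the 500 MiB pack is left alone, which is the outcome the earlier --max-cruft-size=200M example could not produce.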

Along the way, a bug was uncovered that prevented objects stored in multiple cruft packs from being “freshened” in certain circumstances. In other words, some unreachable objects didn’t have their modification times updated when they were rewritten, leading to them being removed from the repository earlier than they otherwise would have been. Git 2.50 squashes this bug, meaning that you can now efficiently manage multiple cruft packs and freshen their objects to your heart’s content.

[source,source]

Incremental multi-pack reachability bitmaps

Back in our coverage of Git 2.47, we talked about preliminary support for incremental multi-pack indexes. Multi-pack indexes (MIDXs) act like a single pack *.idx file for objects spread across multiple packs.

Multi-pack indexes are extremely useful to accelerate object lookup performance in large repositories by binary searching through a single index containing most of your repository’s contents, rather than repeatedly searching through each individual packfile.

But multi-pack indexes aren’t just useful for accelerating object lookups. They’re also the basis for multi-pack reachability bitmaps, the MIDX-specific analogue of classic single-pack reachability bitmaps. If neither of those is familiar to you, don’t worry; here’s a brief refresher.

Single-pack reachability bitmaps store a collection of bitmaps corresponding to a selection of commits. Each bit position in a pack bitmap refers to one object in that pack. In each individual commit’s bitmap, the set bits correspond to objects that are reachable from that commit, and the unset bits represent those that are not.
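To see why that representation is handy, here is a toy model using plain shell arithmetic. The six objects, their pack order, and the helper name are all invented for illustration; real bitmaps are compressed on-disk structures, not shell integers.

```shell
# Toy model: six objects in pack order, one bit position each.
#   0 = commit A   1 = tree A   2 = blob x
#   3 = commit B   4 = tree B   5 = blob y
# Commit B is a child of A, and its tree holds both blob x and blob y.
bitmap_a=$(( (1 << 0) | (1 << 1) | (1 << 2) ))             # A, tree A, blob x
bitmap_b=$(( bitmap_a | (1 << 3) | (1 << 4) | (1 << 5) ))  # everything

# Print the positions of the set bits in a bitmap, lowest first.
set_bits() {
    bits="$1"
    pos=0
    out=""
    while [ "$bits" -ne 0 ]; do
        if [ $(( bits & 1 )) -eq 1 ]; then
            out="${out:+$out }$pos"
        fi
        bits=$(( bits >> 1 ))
        pos=$(( pos + 1 ))
    done
    echo "$out"
}

# "Which objects are reachable from B but not from A?" is a single AND-NOT.
new_objects="$(set_bits $(( bitmap_b & ~bitmap_a )))"
echo "reachable from B but not A: positions $new_objects"
```

The answer falls out of one bitwise operation instead of a walk over commits and trees, which is where the performance win comes from.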

Multi-pack bitmaps were introduced to take advantage of the substantial performance increase afforded to us by reachability bitmaps. Instead of having bitmaps whose bit positions correspond to the set of objects in a single pack, a multi-pack bitmap’s bit positions correspond to the set of objects in a multi-pack index, which may include objects from arbitrarily many individual packs. If you’re curious to learn more about how multi-pack bitmaps work, you can read our earlier post, Scaling monorepo maintenance.

However, like cruft packs above, multi-pack indexes can be cumbersome to update as your repository grows larger, since each update requires rewriting the entire multi-pack index and its corresponding bitmap, regardless of how many objects or packs are being added. In Git 2.47, the file format for multi-pack indexes became incremental, allowing multiple multi-pack index layers to be layered on top of one another forming a chain of MIDXs. This made it much easier to add objects to your repository’s MIDX, but the incremental MIDX format at the time did not yet have support for multi-pack bitmaps.

Git 2.50 brings support for the multi-pack reachability bitmap format to incremental MIDX chains, with each MIDX layer having its own *.bitmap file. These bitmap layers can be used in conjunction with one another to provide reachability information about selected commits at any layer of the MIDX chain. In effect, this allows extremely large repositories to quickly and efficiently add new reachability bitmaps as new commits are pushed to the repository, regardless of how large the repository is.

This feature is still considered highly experimental, and support for repacking objects into incremental multi-pack indexes and bitmaps is still fairly bare-bones. This is an active area of development, so we’ll make sure to cover any notable developments to incremental multi-pack reachability bitmaps in this series in the future.

[source]

The ORT merge engine replaces recursive

This release also saw some exciting updates related to merging. Way back when Git 2.33 was released, we talked about a new merge engine called “ORT” (standing for “Ostensibly Recursive’s Twin”).

ORT is a from-scratch rewrite of Git’s old merging engine, called “recursive.” ORT is significantly faster, more maintainable, and has many new features that were difficult to implement on top of its predecessor.

One of those features is the ability for Git to determine whether or not two things are mergeable without actually persisting any new objects necessary to construct the merge in the repository. Previously, the only way to tell whether two things are mergeable was to run git merge-tree --write-tree on them. That works, but merge-tree then writes any new objects generated by the merge into the repository. Over time, these can accumulate and cause performance issues. In Git 2.50, you can make the same determination without writing any new objects by using merge-tree’s new --quiet mode and relying on its exit code.

Most exciting in this release is that ORT has entirely superseded recursive, and recursive is no longer part of Git’s source code. When ORT was first introduced, it was only accessible through git merge’s -s option to select a strategy. In Git 2.34, ORT became the default choice over recursive, though the latter was still available in case there were bugs or behavior differences between the two. Now, 16 versions and three and a half years later, recursive has been completely removed from Git, with its author, Elijah Newren, writing:

As a wise man once told me, “Deleted code is debugged code!”

As of Git 2.50, recursive has been completely debugged (er, deleted). For more about ORT’s internals and its development, check out Elijah’s five-part series on the subject.

[source,source,source]


  • If you’ve ever scripted around your repository’s objects, you are likely familiar with git cat-file, Git’s purpose-built tool to list objects and print their contents. git cat-file has many modes, like --batch (for printing out the contents of objects), or --batch-check (for printing out certain information about objects without printing their contents).

    Oftentimes it is useful to dump the set of all objects of a certain type in your repository. For commits, git rev-list can easily enumerate a set of commits. But what about, say, trees? In the past, to filter down to just the tree objects from a list of objects, you might have written something like:

    $ git cat-file --batch-check='%(objecttype) %(objectname)' \
        --buffer <in | perl -ne 'print "$1\n" if /^tree ([0-9a-f]+)/'

    Git 2.50 brings Git’s object filtering mechanism used in partial clones to git cat-file, so the above can be rewritten a little more concisely like:

    $ git cat-file --batch-check='%(objectname)' --filter='object:type=tree' <in

    [source]

  • While we’re on the topic, let’s discuss a little-known git cat-file command-line option: --allow-unknown-type. This arcane option was used with objects that have a type other than blob, tree, commit, or tag. This is a quirk dating back a little more than a decade that allowed git hash-object to write objects with arbitrary types. In the time since, this feature has gotten very little use. In fact, git cat-file -p --allow-unknown-type can’t even print out the contents of one of these objects!

    $ oid="$(git hash-object -w -t notatype --literally /dev/null)"
    $ git cat-file -p $oid
    fatal: invalid object type

    This release makes the --allow-unknown-type option silently do nothing, and removes support from git hash-object for writing objects with unknown types in the first place.

    [source]

  • The git maintenance command learned a number of new tricks this release as well. It can now perform a few new kinds of tasks, like worktree-prune, rerere-gc, and reflog-expire. worktree-prune mirrors git gc’s functionality to remove stale or broken Git worktrees. rerere-gc also mirrors existing functionality exposed via git gc to expire old rerere entries from previously recorded merge conflict resolutions. Finally, reflog-expire can be used to remove stale entries from the reflog.

    git maintenance also ships with new configuration for the existing loose-objects task. This task removes lingering loose objects that have since been packed away, and then makes new pack(s) for any loose objects that remain. The size of those packs was previously fixed at a maximum of 50,000 objects, and can now be configured by the maintenance.loose-objects.batchSize configuration.

    [source,source,source]
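As a sketch of how the maintenance pieces above might be wired together, the snippet below enables the three new tasks and lowers the loose-objects batch size in a throwaway repository. It relies on the general maintenance.<task>.enabled setting; the batch size of 10,000 is an arbitrary value picked for illustration.

```shell
# Sketch: opt a throwaway repository into the new tasks and shrink the batch size.
set -e
repo="$(mktemp -d)"
git init -q "$repo"

# Have `git maintenance run` (with no --task) include the new tasks.
for task in worktree-prune rerere-gc reflog-expire; do
    git -C "$repo" config "maintenance.$task.enabled" true
done

# Pack at most 10,000 loose objects per run (an arbitrary example value).
git -C "$repo" config maintenance.loose-objects.batchSize 10000

batch_size="$(git -C "$repo" config --get maintenance.loose-objects.batchSize)"
reflog_task="$(git -C "$repo" config --get maintenance.reflog-expire.enabled)"
echo "batch size: $batch_size, reflog-expire enabled: $reflog_task"

# With Git 2.50 or newer, a single task can also be run on demand:
#   git -C "$repo" maintenance run --task=reflog-expire
```

Setting the configuration works on any Git version, but the new tasks and the batch size only take effect once Git 2.50 or newer runs maintenance.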

  • If you’ve ever needed to recover some work you lost, you may be familiar with Git’s reflog feature, which allows you to track changes to a reference over time. For example, you can go back and revisit earlier versions of your repository’s main branch by doing git show main@{2} (to show main prior to the two most recent updates) or main@{1.week.ago} (to show where your copy of the branch was a week ago).

    Reflog entries can accumulate over time, and you can reach for git reflog expire in the event you need to clean them up. But how do you delete the entirety of a branch’s reflog? If you’re not yet running Git 2.50 and thought “surely it’s git reflog delete”, you’d be wrong! Prior to Git 2.50, the only way to drop a branch’s entire reflog was to do git reflog expire $BRANCH --expire=all.

    In Git 2.50, a new drop sub-command was introduced, so you can accomplish the same as above with the much more natural git reflog drop $BRANCH.

    [source]
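For scripts that have to run on both old and new versions of Git, one option is a small wrapper that prefers the new sub-command and falls back to the older spelling. This is only a sketch: the drop_reflog and count_entries helpers are invented for the example, and it uses the fully qualified refname so that both commands resolve the same branch.

```shell
# Sketch: drop a branch's entire reflog on old and new versions of Git alike.
set -e
repo="$(mktemp -d)"
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/main

# Two commits give the branch two reflog entries.
for n in 1 2; do
    echo "$n" >file.txt
    git add file.txt
    git -c user.name=Example -c user.email=example@example.com commit -q -m "commit $n"
done

count_entries() {
    git reflog show "$1" 2>/dev/null | wc -l | tr -d ' '
}

drop_reflog() {
    # Git 2.50+ understands `drop`; older versions fail here and fall through.
    git reflog drop "$1" >/dev/null 2>&1 ||
        git reflog expire --expire=all "$1"
}

before="$(count_entries refs/heads/main)"
drop_reflog refs/heads/main
after="$(count_entries refs/heads/main)"
echo "reflog entries before: $before, after: $after"
```

Either path leaves the branch itself untouched; only its history of updates is discarded.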

  • Speaking of references, Git 2.50 also received some attention to how references are processed and used throughout its codebase. When using the low-level git update-ref command, Git used to spend time checking whether or not the proposed refname could also be a valid object ID, making its lookups ambiguous. Since update-ref is such a low-level command, this check is no longer done, delivering some performance benefits to higher-level commands that rely on update-ref for their functionality.

    Git 2.50 also learned how to cache whether or not any prefix of a proposed reference name already exists (for example, you can’t create a reference refs/heads/foo/bar/baz if either refs/heads/foo/bar or refs/heads/foo already exists).

    Finally, in order to make those checks, Git used to create a new reference iterator for each individual prefix. Git 2.50’s reference backends learned how to “seek” existing iterators, saving time by being able to reuse the same iterator when checking each possible prefix.

    [source]
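To picture the prefix checks described above, here is a toy enumeration of the names that have to be examined before a new reference can be created. The ref_prefixes helper is made up for this example; Git’s real check lives in its C reference backends and also has to consider existing references underneath the new name.

```shell
# Toy model: print every proper prefix of a refname, shortest first. If a
# reference already exists at any of these names, the new one can't be created.
ref_prefixes() {
    name="$1"
    prefix=""
    out=""
    # Peel off one leading component at a time, stopping before the last one.
    while [ "${name#*/}" != "$name" ]; do
        component="${name%%/*}"
        name="${name#*/}"
        prefix="${prefix:+$prefix/}$component"
        out="${out:+$out }$prefix"
    done
    echo "$out"
}

prefixes="$(ref_prefixes refs/heads/foo/bar/baz)"
echo "prefixes to check: $prefixes"
```

Each of those names used to cost a fresh iterator; caching the answers and seeking a single iterator avoids repeating that work.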

  • If you’ve ever had to tinker with Git’s low-level curl configuration, you may be familiar with Git’s configuration options for tuning HTTP connections, like http.lowSpeedLimit and http.lowSpeedTime, which are used to terminate an HTTP connection that is transferring data too slowly.

    These options can be useful when fine-tuning Git to work in complex networking environments. But what if you want to tweak Git’s TCP Keepalive behavior? This can be useful to control when and how often to send keepalive probes, as well as how many to send, before terminating a connection that hasn’t sent data recently.

    Prior to Git 2.50, this wasn’t possible, but this version introduces three new configuration options: http.keepAliveIdle, http.keepAliveInterval, and http.keepAliveCount, which can be used to control the fine-grained behavior of curl’s TCP probing (provided your operating system supports it).

    [source]
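A minimal sketch of setting the three keepalive options in a throwaway repository follows. The values are arbitrary examples, and it assumes the idle and interval settings are expressed in seconds while the count is a number of probes.

```shell
# Sketch: configure TCP keepalive probing for HTTP transfers in one repository.
set -e
repo="$(mktemp -d)"
git init -q "$repo"

# Wait 60 seconds on an idle connection before the first probe ...
git -C "$repo" config http.keepAliveIdle 60
# ... then probe every 10 seconds ...
git -C "$repo" config http.keepAliveInterval 10
# ... and give up after 5 unanswered probes.
git -C "$repo" config http.keepAliveCount 5

idle="$(git -C "$repo" config --get http.keepAliveIdle)"
interval="$(git -C "$repo" config --get http.keepAliveInterval)"
count="$(git -C "$repo" config --get http.keepAliveCount)"
echo "idle=$idle interval=$interval count=$count"
```

Older versions of Git will store these values without complaint but ignore them, so the settings only matter once the client is on 2.50 or newer.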

  • Git is famously portable and runs on a wide variety of operating systems and environments with very few dependencies. Over the years, various parts of Git have been written in Perl, including some commands like the original implementation of git add -i. These days, very few remaining Git commands are written in Perl.

    This version reduces Git’s usage of Perl by removing it as a dependency of the test suite and documentation toolchain. Many Perl one-liners from Git’s test suite were rewritten to use shell functions or builtins, and some were rewritten as tiny C programs. The handful of tests that still have a hard dependency on Perl will be skipped on systems that don’t have a working Perl.

    [source,source]

  • This release also shipped a minor cosmetic update to git rebase -i. When starting a rebase, your $EDITOR might appear with contents that look something like:

    pick c108101daa foo
    pick d2a0730acf bar
    pick e5291f9231 baz

    You can edit that list to break, reword, or exec (among many others), and Git will happily execute your rebase. But if you change the commit messages in your rebase’s TODO script, the commits’ actual messages won’t change!

    That’s because the commit messages shown in the TODO script are just meant to help you identify which commits you’re rebasing. (If you want to rewrite any commit messages along the way, you can use the reword command instead.) To clarify that these messages are cosmetic, Git will now prefix them with a # comment character like so:

    pick c108101daa # foo
    pick d2a0730acf # bar
    pick e5291f9231 # baz

    [source]

  • Longtime readers of this series will recall our coverage of Git’s bundle feature (when Git added support for partial bundles), though we haven’t covered Git’s bundle-uri feature. Git bundles are a way to package your repository’s contents, both its objects and the references that point at them, into a single *.bundle file.

    While Git has had support for bundles since as early as v1.5.1 (nearly 18 years ago!), its bundle-uri feature is much newer. In short, the bundle-uri feature allows a server to serve part of a clone by first directing the client to download a *.bundle file. After the client does so, it will try to perform a fill-in fetch to gather any missing data advertised by the server but not part of the bundle.

    To speed up this fill-in fetch, your Git client will advertise any references that it picked up from the *.bundle itself. But in previous versions of Git, this could sometimes result in slower clones overall! That’s because up until Git 2.50, Git would only advertise the branches in refs/heads/* when asking the server to send the remaining set of objects.

    Git 2.50 now advertises all references it knows about from the *.bundle when doing a fill-in fetch from the server, making bundle-uri-enabled clones much faster.

    For more details about these changes, you can check out this blog post from Scott Chacon.

    [source]

  • Last but not least, git add -p (and git add -i) now work much more smoothly in sparse checkouts by no longer having to expand the sparse index. This follows in a long line of work that has been gradually adding sparse-index compatibility to Git commands that interact with the index.

    Now you can interactively stage parts of your changes before committing in a sparse checkout without having to wait for Git to populate the sparsified parts of your repository’s index. Give it a whirl on your local sparse checkout today!

    [source]


The rest of the iceberg

That’s just a sample of changes from the latest release. For more, check out the release notes for 2.50, or any previous version in the Git repository.

🎉 Git turned 20 this year! Celebrate by watching our interview of Linus Torvalds, where we discuss how it forever changed software development.

[1] It’s never truly safe to remove an unreachable object from a Git repository that is accepting incoming writes, because marking an object as unreachable can race with incoming reference updates, pushes, etc. At GitHub, we use Git’s --expire-to feature (which we wrote about in our coverage of Git 2.39) in something we call “limbo repositories” to quickly recover objects that shouldn’t have been deleted, before deleting them for good.



Written by

Taylor Blau

@ttaylorr

Taylor Blau is a Staff Software Engineer at GitHub where he works on Git.
