Highlights from Git 2.44

The first Git release of 2024 is here! Take a look at some of our highlights on what’s new in Git 2.44.

February 23, 2024|Updated April 29, 2024

The open source Git project just released Git 2.44 with features and bug fixes from over 85 contributors, 34 of them new. We last caught up with you on the latest in Git back when 2.43 was released.

To celebrate this most recent release, here is GitHub’s look at some of the most interesting features and changes introduced since last time.

Faster pack generation with multi-pack reuse

If you’ve ever looked closely at Git’s output when pushing or pulling a repository to/from GitHub¹, you might have noticed the pack-reused number that appears at the end of your output, like so:

$ git clone git@github.com:git/git.git
Cloning into 'git'...
remote: Enumerating objects: 361232, done.
remote: Counting objects: 100% (942/942), done.
remote: Compressing objects: 100% (453/453), done.
remote: Total 361232 (delta 598), reused 773 (delta 487), pack-reused 360290
[...]

If you’ve ever looked at that number (above, this is pack-reused 360290) and wondered what it meant, then look no further!

In general terms, that number refers to how much of the pack GitHub was able to send by (more-or-less) streaming verbatim sections of a pack that already exists down to the cloner, instead of generating a new pack on the fly. When Git is sending objects to the client (when fetching/cloning), to the server (when pushing), or to itself (when repacking), Git needs to generate a packfile that contains the set of objects being transferred. For many of the objects in this pack, Git will locate those objects, open and parse them, then optionally try to pair them with some existing object to form a delta chain.

Repeating this process over all objects in the pack yields a more compact result, since Git will find and pair objects that have similar content to one another to save space. When pushing a small amount of data to GitHub, this search is usually negligible and doesn’t take a significant amount of time. But during a clone, loading and trying to re-delta-ify all of the reachable objects in a repository can become prohibitively expensive, especially when carried out over tens of thousands of clones or more.

To save time, Git takes a shortcut: because the wire format used to transfer objects uses the same representation as the .pack files on disk (in $GIT_DIR/objects/pack), it can reuse sections of an existing packfile byte-for-byte when generating the new pack to send down to the client.

In our above example, that’s exactly what happened: the pack-reused 360290 portion of our output indicated that GitHub was able to reuse 360,290 objects from disk without having to re-open and search for new deltas. That process was carried out only over the remaining objects (in this case, 361,232 less the reused quantity gives us 942 objects that took the slow path).
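
As a rough illustration of that arithmetic, the shell sketch below pulls the two counts out of the summary line from the example clone and subtracts them. The variable names are ours, and the `sed` patterns assume the exact wording shown in that output.

```shell
# Summary line copied from the example clone output above.
line='remote: Total 361232 (delta 598), reused 773 (delta 487), pack-reused 360290'

# Extract the total object count and the pack-reused count.
total=$(printf '%s\n' "$line" | sed -n 's/.*Total \([0-9][0-9]*\).*/\1/p')
reused=$(printf '%s\n' "$line" | sed -n 's/.*pack-reused \([0-9][0-9]*\).*/\1/p')

# Objects that could not be streamed verbatim and took the slow path.
slow_path=$((total - reused))
echo "slow path: $slow_path objects"
```

The result matches the "Counting objects" line in the same output, which is where Git reports the objects it actually had to walk.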

Verbatim pack-reuse sounds like a great deal, right? It is, but there are a couple of gotchas that restrict how often Git can make use of this optimization:

Packfiles cannot contain the same object more than once. For single-pack reuse, this is easy enough (since the pack we’re reusing from also can’t contain duplicate copies of an object), but it makes implementing multi-pack reuse difficult.
Certain kinds of deltas (which identify their base by the number of bytes between the delta and base) need to be “patched” if there is an omitted section between the delta and its base, changing the offset.

In order to take full advantage of verbatim pack-reuse, a repository needs to have a majority of its objects packed together in a single packfile. For many repositories, this isn’t a huge deal, but it can become prohibitively expensive for large repositories with many hundreds of millions of objects.

Git 2.44 ships with new support for reusing objects across multiple packs. When using a multi-pack index with reachability bitmaps (for more about these, check out our post, Scaling monorepo maintenance), Git can now take advantage of this optimization across multiple packs, eliminating the need to repack your repository into a single pack.

We’ll cover the precise details in a future blog post dedicated to multi-pack reuse. For now, you might notice a new line of output in your terminal the next time you push to GitHub:

$ git push
Enumerating objects: 350175, done.
Counting objects: 100% (832/832), done.
Compressing objects: 100% (132/132), done.
Total 350175 (delta 735), reused 700 (delta 700), pack-reused 349343 (from 36)
[...]

Notice that instead of just pack-reused, we get an extra piece of information next to it, (from 36), indicating the number of packs from which objects were reused.

To try this out yourself, upgrade your local installation of Git, and run

$ git config --global pack.allowPackReuse multi
$ git multi-pack-index write --bitmap

before the next time you push to GitHub.

[source]

Faster rebases (and much more) with `git replay`

If you’ve read this series before, you’re no doubt familiar with our coverage of merge-ort, a recent development in Git that is a from-scratch rewrite of the merging backend. If you’re a newcomer to this series (first of all, welcome!), our coverage beginning in our Highlights from Git 2.33 is a great place to start.

merge-ort was introduced almost a dozen Git versions ago and aimed to solve several long-standing issues with its predecessor, the recursive backend. The recursive backend was notoriously difficult to modify, and had difficulty performing well when dealing with merges that involve a large number of renames.

The merge-ort backend was introduced to address these issues, by providing a structured implementation that was correct (with respect to the existing behavior, making it a drop-in replacement for the existing backend), performant, and easy to change. In Git 2.34 (for those interested, our coverage begins here), merge-ort became the default merging backend, meaning that if you’re running Git 2.34 or newer and don’t have any special configuration, you’re almost certainly already making use of merge-ort. Modern versions of Git use the merge-ort backend to resolve conflicts between files on either side of a merge or rebase. With merge-ort in place and widely used, merges and rebases could be computed significantly faster.

But merge-ort also makes it possible to compute merges and rebases without requiring that you have a fully populated checkout of your repository. To perform merges, the merge-tree command uses the --write-tree option to compute merges with merge-ort without requiring a checked out version of your repository.

Rebases were a different story. The existing git rebase sub-command comes with a lot of historical design decisions and assumptions that would make integrating it with merge-ort less than straightforward, and that would make it hard to improve performance without breaking backwards compatibility guarantees².

git replay exists to address these challenges. It offers an alternative to git rebase that, in addition to being far more performant:

Can operate in bare repositories.
Can rebase branches other than the currently checked-out one (in non-bare repositories).
Can operate over multiple branches simultaneously.

and much more. GitHub has been using merge-ort for more than a year to power all merges (and more recently, all rebases) performed on GitHub.com, and it has brought substantial performance improvements to both operations.

You might find git replay useful if you’re scripting around in a repository, interested in eking out performance gains relative to git rebase, or are just interested in playing around with the latest and greatest developments in the Git project. Regardless of which camp you’re in, you can learn more about git replay here.

[source]
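
To make that a little more concrete, here is a small sketch of replaying a branch that is not checked out. It builds a scratch repository (the branch name `topic`, the file names, and the identity are all made up for illustration), then asks `git replay` to rebase `topic` onto the default branch. In Git 2.44, `git replay` prints ref-update instructions instead of moving refs itself, so the sketch pipes them into `git update-ref --stdin`. The command is experimental, so treat this as a sketch and expect details to change.

```shell
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name Example
git config user.email example@example.com

# History: one shared commit, one commit on "topic", one newer commit on main.
echo base > base.txt && git add base.txt && git commit -q -m 'base'
main=$(git symbolic-ref --short HEAD)   # default branch name varies by setup
git checkout -q -b topic
echo topic > topic.txt && git add topic.txt && git commit -q -m 'topic work'
git checkout -q "$main"
echo more > more.txt && git add more.txt && git commit -q -m 'more work on main'

# Replay topic's commits onto main while main stays checked out.
# Older Git versions have no "replay" command, so probe for it first.
if git replay -h 2>&1 | grep -q -e '--onto'; then
    git replay --onto "$main" "$main..topic" | git update-ref --stdin
    git log --oneline --graph "$main" topic
else
    echo 'git replay is not available in this version of Git' >&2
fi
```

Note that the working tree and HEAD never leave the default branch; only the `topic` ref moves.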

While we’re on the topic of rebases, let’s talk about --autosquash. In case you’ve never used that option before, don’t worry; here’s a quick introduction. When rebasing, Git will try to combine commits whose subject line begins with fixup! [...], squash! [...], or amend! [...], where the [...] is the log message of some other commit. Git will pair these up and reorganize the todo list to put the fixup! [...] commits (etc.) next to their non-fixup! counterparts.
Depending on the verb, Git will either combine changes, alter the commit message, or merge successive commit messages together, allowing you to easily edit your work.
However, previous versions of Git only provided functionality for these options when using interactive rebases with git rebase --interactive (or just git rebase -i, for short). If you wrote a fixup! commit (or similar) and wanted to quickly apply it at the right spot in history, you’d have to either: (a) run git rebase -i and close your $EDITOR, or (b) run GIT_SEQUENCE_EDITOR=true git rebase -i.
In Git 2.44, autosquash-ing now works with non-interactive rebases, meaning that you can run a plain git rebase --autosquash and apply your fixup! commits in their respective locations without having to inspect the todo list or munge your GIT_SEQUENCE_EDITOR environment variable.
[source]
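
Here is a minimal sketch of that workflow in a scratch repository. The file names, commit messages, and identity are invented for the example; the only Git features it relies on are `git commit --fixup` and `git rebase --autosquash`.

```shell
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name Example
git config user.email example@example.com

echo one > a.txt && git add a.txt && git commit -q -m 'add a.txt'
base=$(git rev-parse HEAD)
echo two > b.txt && git add b.txt && git commit -q -m 'add b.txt'
target=$(git rev-parse HEAD)
echo three > c.txt && git add c.txt && git commit -q -m 'add c.txt'

# A follow-up fix for "add b.txt", recorded as a fixup! commit.
echo 'two, fixed' > b.txt
git commit -q -a --fixup="$target"

# Non-interactive rebase: no todo list editor is opened.
git rebase -q --autosquash "$base"

# Count and list the commits that remain on top of the base.
count=$(git rev-list --count "$base"..HEAD)
git log --format=%s "$base"..HEAD
```

On Git 2.44 or newer, the log should show only two commits, with the fix folded into "add b.txt". Older versions ignore `--autosquash` here and leave the fixup! commit in place.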
If you’ve been using Git for a long time (or are a newcomer), you’ve probably seen a message beginning with hint:, like so:
```
hint: Updates were rejected because the tag already exists in the remote.
hint: Disable this message with "git config advice.pushAlreadyExists false"
```
Like the hint suggests, you can run git config advice.pushAlreadyExists false to tell Git to avoid showing you the message. But what if you find the advice useful? Perhaps you want to be warned (for example) when attempting to push a tag without --force to a remote which already has a tag by that same name. When that’s the case, you likely don’t want to also see the “Disable this message with […]” portion of the hint.
In Git 2.44, you can now run git config advice.pushAlreadyExists true to indicate that you want to receive that hint, and Git will continue to show it to you, suppressing the “Disable this message with […]” portion of the message.
[source]
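
If you want to see what that setting looks like on disk before committing to it, the sketch below writes it to a scratch config file instead of your real configuration. The temporary file is our own stand-in; in practice you would drop `--file "$cfg"` and let Git write to your repository or global config.

```shell
# Scratch file standing in for a real Git config file.
cfg=$(mktemp)

# Explicitly opt in to the hint.
git config --file "$cfg" advice.pushAlreadyExists true

# Read the value back, then show the raw file contents.
git config --file "$cfg" --get advice.pushAlreadyExists
cat "$cfg"
```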
Quick quiz: what does the --no-sort option do when given to git for-each-ref? If you thought, “surely it doesn’t list all references in a non-alphabetical order,” then congratulations, you’re a veteran Git user!
Despite its name, --no-sort previously provided the output of git for-each-ref in a sorted order, making it unable to take advantage of certain optimizations that assume an arbitrary ordering. In Git 2.44, --no-sort emits references in an unspecified order instead.
For those interested in the technical details, you can learn more in the patches linked below. If you just want the numbers, you’re in luck: on my machine, git for-each-ref --no-sort outperforms a bog-standard git for-each-ref by more than 20% on a repository with a large number of references.
[source]
If you’ve spent much time perusing the Git documentation, you’ve likely encountered the term “pathspec”, and perhaps wondered what it meant. In Git parlance, “pathspec” roughly corresponds to “ways to limit filepaths” when used in conjunction with a Git command.
There are lots of examples in the documentation, but some notable ones include: git show ':^Documentation/' (meaning, “show me the last commit, excluding any changes in the Documentation directory”), git show ':(icase)**/*sha256*' (meaning, “show me files with ‘sha256’ in their path, regardless of casing”), and git show ':(attr:!binary)' (meaning, “show me files which do not have their binary attribute set via .gitattributes”).
In Git 2.44, git add now understands the attr pathspec magic, meaning that you can do things like git add ':(attr:!binary)' to stage all text/non-binary files in the index.
Git 2.44 also introduces a new pathspec attribute, called builtin_objectmode. This new attribute allows filtering paths by their mode (for example, 100644 for non-executable files, 100755 for executable ones, 160000 for submodules, etc.). The builtin_ prefix indicates that you can use it without needing to set any values in your .gitattributes file(s), meaning that you can do things like git add ':(attr:builtin_objectmode=100755)' to add all executable files in your working copy.
[source,source]
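
As a small sketch of the attr magic with git add, the example below sets up a scratch repository in which `*.bin` files are marked binary, then stages everything else. The file names and the `.gitattributes` rule are invented for the example. Older Git versions reject attr magic in `git add`, so the sketch reports that case instead of failing.

```shell
set -e
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Mark *.bin as binary; every other path leaves the attribute unspecified.
echo '*.bin binary' > .gitattributes
echo 'plain text' > notes.txt
printf '\000\001' > blob.bin
printf '#!/bin/sh\necho hi\n' > run.sh

# Stage only paths whose "binary" attribute is not set (needs Git 2.44+).
if git add ':(attr:!binary)'; then
    git ls-files
else
    echo 'this Git rejects attr magic in "git add"; 2.44 or newer is needed' >&2
fi
```

With a new enough Git, `blob.bin` should be the only file left unstaged.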

The whole shebang

That’s just a sample of changes from the latest release. For more, check out the release notes for 2.44, or any previous version in the Git repository.

Notes

¹ If you’re reading this blog post (especially the footnotes!) there’s a pretty good chance that you have.
² For those curious, an extensive discussion on why git replay was used instead of extending git rebase can be found on the mailing list here.

Written by

Taylor Blau

@ttaylorr

Taylor Blau is a Staff Software Engineer at GitHub where he works on Git.
