jj-vcs/jjPublic

NotificationsYou must be signed in to change notification settings
Fork842
Star23.1k

Rebase is getting fast!#49

martinvonz started this conversation inShow and tell

martinvonz

Dec 19, 2021

· 2 comments· 3 replies

Return to top

Discussion options

martinvonz
Dec 19, 2021
Maintainer

I made some performance improvements recently (particularly94e03f5,f084a05,c058ffe), which made rebasing reasonably fast in most cases.

For example, rebasing 1272 commits in the git.git repo now takes 885 ms. That was a rebase that didn't rewrite any trees (I ranjj describe v2.33.0 -m modified to set the description of that commit). I can't directly compare that togit's performance since it doesn't rebase merge commits correctly and would surely run into lots of conflicts (though I didn't even try).

Changing the description of v1.8.5 in the git.git repo results in >30k commits getting rebased. That takes ~16 seconds.

For a more direct comparison, I tried rewriting git.git's "todo" branch. Rebasing the 1351 linear commits in ee2122992937..todo there took 644 ms withjj and 75.9 s withgit (99.7 s with-s ort). If I add a file in that branch so the trees need to be rewritten (jj co ee2122992937; echo a > new-file; time jj squash), the rebase slows down to 1.39 s, which is still pretty reasonable.

Rebasing is still somewhat slow on merge commits when trees need to be merged. For example, rebasing the 505 descendants of v2.34.0 in the git.git repo takes almost 13 seconds. The vast majority of that time is spent on merge commits. The problem is that merge commits are rebased by first re-merging their parents and then calculating the difference compared to that. I plan to fix that by keeping the re-merged parent tree in some persistent cache. That cache will also be used for e.g.jj diff -r <merge commit> so the parents don't need to be re-merged.

You must be logged in to vote

Replies: 2 comments 3 replies

Comment options

martinvonz
Dec 19, 2021
Maintainer Author

@Alphare: I know you told me to remember that speed is a feature when I presented this project to you a long time ago :)

You must be logged in to vote

1 reply

Comment options

Alphare Dec 21, 2021

I'm glad to see speed is a first-class concern! The numbers sure sound nice. :)

Comment options

martinvonz
Jan 7, 2022
Maintainer Author

@newren: I know you've spent a lot of time on Git's "ort" strategy. Some replies to your merge-tree patches today made me think that you might be interested in the above. I assume Git can eventually get equally fast. I actually thought-s ort would achieve a similar speed already. Any idea why it doesn't? It doesn't touch the working copy or the index, right?

To be clear, I don't have support forrename detection, so I won't pretend thatjj rebase can be used as a replacement forgit rebase.

You must be logged in to vote

2 replies

Comment options

newren Jan 7, 2022

The rebase code in Git, found in sequencer.c + builtin/rebase.c, still has various vestiges of its transliteration from shell script. One of those is that the code assumes at every step that one has a full working tree and index. (Although, to be fair, the merge machinery it was using at the time it was written was designed such that worktree and index updates were intrinsic to the code and shotgun blasted everywhere, so assuming the merge would update the working tree wasn't a bad assumption back then.) Thus, it couldn't actually make use of the merge-ort API that provides just a toplevel tree object without significant rework; it needs there to be a full working tree and index at each step of the operation. So, for each commit being rebased, after it calls the merge-ort API for just computing the merge result and getting a new toplevel tree, it then immediately calls the API for updating the working tree and index.

Also related to this, is the fact that it writes out all the control files to disk after every commit, rather than waiting for it to hit a conflict. You'd think it'd only need a few control files under .git/rebase-merge/ to allow resuming from hitting a conflict, but that's thinking in terms of what makes sense from a clean implementation. Rebase was originally a shell script, and parsing files in shell is painful. So every bit of control metadata useful to any particular set of rebase flags was usually a separate control file. And, of course, when switching from shell to C, we have this nice existing regression testsuite, and the best way to pass it is to maintain backward compatibility. So we're writing a lot of control files with every commit being picked too.

But the insanity doesn't end there. Given that there were piles and piles git commands that had to be run for each commit being picked back when git-rebase was a shell script, on top of a slow merging backend in various cases, it made sense to add special case code to avoid work when possible. So, for example, rebase has some code to pre-emptively check if any of the commits being rebased happen to be a clean cherry-pick of something in the upstream branch newer than the commits being picked. That means pairwise comparing an awful lot of trees if history has diverged significantly. This probably saved time in the past, but has been measured and shown to be a significant performance overhead nowadays. Sadly, we can't just drop it, because it logically changes the results you get (trying to reapply a cherry pick that might be upstream already might just cleanly result in noticing the patch makes no changes and dropping it, but it might not cleanly reapply and thus throw conflicts at the user). Thus, we had to add a --reapply-cherry-picks option to rebase and mark it as important for getting good performance. Sadly, it was original identified as important for performancewithin partial clones (seegit/git@0fcb4f6), but subsequent measurements show it is just causing problems for rebases in general (see e.g.https://lore.kernel.org/git/CABPp-BE48=97k_3tnNqXPjSEfA163F8hoE+HY0Zvz1SWB2B8EA@mail.gmail.com/)

There is also another revision walk done, I believe as an optimization ported from shell, in order to check if we can just fast-forward. But my measurements showed that while I'm sure it was a big perf win for the original shell script, it actually hurts performance in the C implementation when rebasing across a large portion of history. Even if we kept it, I think I noted at some point that it had multiple calls to merge-base computations, which seems excessive to me though I didn't look closely.

The transliteration from shell still shows through in a few places too. For example, for each new commit it wants to create, why use some internal API when you can fork a full "git commit" process? One of the benefits of forking another process to create the commit, is that you then need to afterwards discard and re-read the index to make sure it's still consistent with what's on disk. "git checkout" for switching to the appropriate branch to start the rebasing process is also a forked subprocess for no real reason. There are also little things like runninggit notes copy --for-rewrite=rebase though that's only once at the end of the rebase and might be a nice touch generally though viewed from a performance perspective it's another (hopefully minor) loss.

So, yeah, git rebase still haslots of room for improvement. The actual merges to generate the new trees are not even close to being the primary cost. It's on my radar (and that of a few other folks), but no promises about if or when these things will be cleaned up.

Comment options

martinvonz Jan 8, 2022
Maintainer Author

Whoa, thanks for the extremely detailed reply!

The rebase code in Git, found in sequencer.c + builtin/rebase.c, still has various vestiges of its transliteration from shell script. One of those is that the code assumes at every step that one has a full working tree and index.

Ah, of course, I should have thought of that. The rebase code is one of few areas of Git's codebase I've worked on. I've also spent months cleaning up Mercurial's rebase code. That has an "in-memory" (i.e. not-touching-working-copy) mode since a few years (not my work, though). However, it still falls back to using the working copy if there are conflicts (the whole rebase operation restarts when that happens). Much of the problems you describe sound familiar from Mercurial's transition.

I had worried (for Git's sake) that preserving the behavior of hooks (such as the post-checkout hook) would be an issue, but I was told that the current behavior is not documented or guaranteed (IIUC), so that's nice.

Interesting about the--reapply-cherry-picks flag. I think if I want to add support for that injj (the--no- version of it), I'll probably simply check the Change ID since we have such a thing.

The transliteration from shell still shows through in a few places too. For example, for each new commit it wants to create, why use some internal API when you can fork a full "git commit" process?

Heh. I've heard about Git's frequent subprocessing in our weekly "Git chalk talks" at work.

"git checkout" for switching to the appropriate branch to start the rebasing process is also a forked subprocess for no real reason.

And I assume you eventually shouldn't have to switch branch until possibly right at the end once everything is done "repo-first".

So, yeah, git rebase still haslots of room for improvement. The actual merges to generate the new trees are not even close to being the primary cost. It's on my radar (and that of a few other folks), but no promises about if or when these things will be cleaned up.

I completely understand. Thanks for your work so far!

Movatterモバイル変換

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Rebase is getting fast!#49

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

martinvonz
Dec 19, 2021
Maintainer

Replies: 2 comments 3 replies

Uh oh!

{{title}}

Uh oh!

martinvonz
Dec 19, 2021
Maintainer Author

Uh oh!

{{title}}