Parallel checkout #1749
Draft
cedric-appdirect wants to merge 3 commits into go-git:main from cedric-appdirect:parallel-checkout
+326 −14
Add BenchmarkCloneLargeRepo and BenchmarkCloneDeepRepo to measure clone
performance when dealing with repositories containing thousands of files.
These benchmarks exercise the resetWorktree codepath, which is currently
processing files sequentially. This establishes a baseline for measuring
the performance improvement from parallelizing file checkout operations.

With ~35,000 files in production repositories, the sequential checkout
becomes a significant bottleneck. These benchmarks will help quantify
the improvement from parallel checkout implementation.
Make the Tree struct safe for concurrent access by multiple goroutines,
addressing race conditions in lazy initialization of internal maps while
preserving memory efficiency.

Changes:
1. Added sync.Once to guarantee single initialization of t.m map
2. Changed t.t from map[string]*Tree to sync.Map for thread-safe cache
3. Removed mutex from worktree.go parallel checkout (now unnecessary)

Thread-safety guarantees:
- entry() uses sync.Once for one-time t.m initialization
- FindEntry() uses sync.Map for concurrent path cache access
- Multiple goroutines can safely call FindEntry() simultaneously

Performance characteristics:
- First FindEntry() call: builds map once (one-time cost)
- Subsequent calls: lock-free reads from t.m (fast path)
- sync.Map optimized for read-heavy workloads (typical usage)
- Memory overhead: slight increase by using sync.Map instead of map
Improve checkout performance for large repositories by parallelizing file
checkout operations. Each worker goroutine gets its own independent object
storage instance, eliminating contention on packfile readers and enabling
true concurrent processing.

Implementation:
- Added worker pool sized to runtime.GOMAXPROCS(0)
- Created thread-safe createWorkerTree() called from within each goroutine
- Each worker has independent packfile file descriptors and object cache
- Made indexBuilder thread-safe with sync.Mutex for concurrent updates

Key design decisions:

1. Per-worker isolation: Each goroutine creates its own filesystem.Storage
   instance by calling createWorkerTree() directly. This provides:
   - Separate file descriptors for packfile access
   - Independent LRU object caches
   - No shared mutable state between workers
   - Eliminates packfile reader contention
   - Cleaner encapsulation with storage created within goroutine scope

2. Thread-safe createWorkerTree(): Designed to be safely called concurrently
   from multiple goroutines. NewStorage() creates isolated instances, and
   the shared billy.Filesystem is only used for read operations.

3. Thread-safe indexBuilder: Protected Add() and Remove() methods with mutex
   since multiple workers update the index concurrently.

4. Graceful fallback: For non-filesystem storage (e.g., memory-backed),
   falls back to shared tree without breaking functionality.

The speedup comes from:
- Parallel zlib decompression across CPU cores
- Concurrent disk I/O for packfile reads and file writes
- Better CPU utilization during checkout operations

This change works in conjunction with the thread-safe Tree implementation
(sync.Once + sync.Map) to enable safe concurrent access to git objects.

Scalability:
Performance improvement scales with CPU count. Repositories with more
files and/or bigger files will see greater benefits. The approach handles
both small repos (minimal overhead) and large repos (significant speedup)
effectively.
Author
cedric-appdirect commented Nov 25, 2025
I have left this PR in draft because I am not sure about the code logic here. One area I am not a fan of, but for which I couldn't figure out a better approach, is the creation of a workerTree for each goroutine. This will need review and advice from someone who understands and knows this code base.
This PR introduces the use of parallel goroutines to check out files. Performance scales, as you would expect, with the number of cores. The overhead is minimal even for very small repositories, because most of the workload is decompression and syscalls, which dwarf the cost of using a channel to send "checkout commands".