
Parallel checkout #1749


Draft
cedric-appdirect wants to merge 3 commits into go-git:main from cedric-appdirect:parallel-checkout

Conversation

@cedric-appdirect

This PR introduces parallel goroutines to check out files. It scales, as you would expect, with the number of cores. It has minimal overhead for very small repositories, because most of the workload is driven by decompression and syscalls, which dwarf the cost of using a channel to send each "checkout command".
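The channel-based dispatch described above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code from this PR: `checkoutCmd`, `checkoutAll`, `writeFile`, `countFiles` and `demo` are hypothetical names, and plain file writes stand in for the real decompress-and-write step.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"runtime"
	"sync"
)

// checkoutCmd is a hypothetical "checkout command": one file to materialize.
type checkoutCmd struct {
	path    string
	content []byte
}

// writeFile creates the parent directories and writes one file under root.
func writeFile(root string, c checkoutCmd) error {
	full := filepath.Join(root, c.path)
	if err := os.MkdirAll(filepath.Dir(full), 0o755); err != nil {
		return err
	}
	return os.WriteFile(full, c.content, 0o644)
}

// checkoutAll feeds commands to a pool of workers over an unbuffered channel
// and returns the first error any worker reported.
func checkoutAll(root string, cmds []checkoutCmd, workers int) error {
	if workers < 1 {
		workers = 1
	}
	ch := make(chan checkoutCmd)
	var (
		wg       sync.WaitGroup
		errOnce  sync.Once
		firstErr error
	)
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for c := range ch {
				if err := writeFile(root, c); err != nil {
					errOnce.Do(func() { firstErr = err })
				}
			}
		}()
	}
	for _, c := range cmds {
		ch <- c
	}
	close(ch) // lets the workers' range loops terminate
	wg.Wait()
	return firstErr
}

// countFiles returns the number of regular (non-directory) entries under root.
func countFiles(root string) (int, error) {
	n := 0
	err := filepath.Walk(root, func(_ string, info os.FileInfo, err error) error {
		if err != nil {
			return err
		}
		if !info.IsDir() {
			n++
		}
		return nil
	})
	return n, err
}

// demo checks out numFiles small files into a temp dir and counts them.
func demo(numFiles int) (int, error) {
	root, err := os.MkdirTemp("", "checkout-sketch-")
	if err != nil {
		return 0, err
	}
	defer os.RemoveAll(root)

	cmds := make([]checkoutCmd, 0, numFiles)
	for i := 0; i < numFiles; i++ {
		p := filepath.Join(fmt.Sprintf("dir%d", i%10), fmt.Sprintf("file%d.txt", i))
		cmds = append(cmds, checkoutCmd{path: p, content: []byte("hello\n")})
	}
	if err := checkoutAll(root, cmds, runtime.GOMAXPROCS(0)); err != nil {
		return 0, err
	}
	return countFiles(root)
}

func main() {
	n, err := demo(200)
	if err != nil {
		panic(err)
	}
	fmt.Println("files written:", n)
}
```

In this sketch the channel send is the only coordination cost per file, which is consistent with the claim that it is small next to decompression and syscalls; the sketch does not measure that claim.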

Add BenchmarkCloneLargeRepo and BenchmarkCloneDeepRepo to measure clone
performance when dealing with repositories containing thousands of files.
These benchmarks exercise the resetWorktree codepath, which is currently
processing files sequentially. This establishes a baseline for measuring
the performance improvement from parallelizing file checkout operations.

With ~35,000 files in production repositories, the sequential checkout
becomes a significant bottleneck. These benchmarks will help quantify
the improvement from parallel checkout implementation.

Make the Tree struct safe for concurrent access by multiple goroutines,
addressing race conditions in lazy initialization of internal maps while
preserving memory efficiency.

Changes:
1. Added sync.Once to guarantee single initialization of t.m map
2. Changed t.t from map[string]*Tree to sync.Map for thread-safe cache
3. Removed mutex from worktree.go parallel checkout (now unnecessary)

Thread-safety guarantees:
- entry() uses sync.Once for one-time t.m initialization
- FindEntry() uses sync.Map for concurrent path cache access
- Multiple goroutines can safely call FindEntry() simultaneously

Performance characteristics:
- First FindEntry() call: builds map once (one-time cost)
- Subsequent calls: lock-free reads from t.m (fast path)
- sync.Map optimized for read-heavy workloads (typical usage)
- Memory overhead: slight increase by using sync.Map instead of a plain map

Improve checkout performance for large repositories by parallelizing file
checkout operations. Each worker goroutine gets its own independent object
storage instance, eliminating contention on packfile readers and enabling
true concurrent processing.

Implementation:
- Added worker pool sized to runtime.GOMAXPROCS(0)
- Created thread-safe createWorkerTree() called from within each goroutine
- Each worker has independent packfile file descriptors and object cache
- Made indexBuilder thread-safe with sync.Mutex for concurrent updates

Key design decisions:
1. Per-worker isolation: Each goroutine creates its own filesystem.Storage
   instance by calling createWorkerTree() directly. This provides:
   - Separate file descriptors for packfile access
   - Independent LRU object caches
   - No shared mutable state between workers
   - Eliminates packfile reader contention
   - Cleaner encapsulation with storage created within goroutine scope
2. Thread-safe createWorkerTree(): Designed to be safely called concurrently
   from multiple goroutines. NewStorage() creates isolated instances, and
   the shared billy.Filesystem is only used for read operations.
3. Thread-safe indexBuilder: Protected Add() and Remove() methods with
   mutex since multiple workers update the index concurrently.
4. Graceful fallback: For non-filesystem storage (e.g., memory-backed),
   falls back to shared tree without breaking functionality.

The speedup comes from:
- Parallel zlib decompression across CPU cores
- Concurrent disk I/O for packfile reads and file writes
- Better CPU utilization during checkout operations

This change works in conjunction with the thread-safe Tree implementation
(sync.Once + sync.Map) to enable safe concurrent access to git objects.

Scalability:
Performance improvement scales with CPU count. Repositories with more
files and/or bigger files will see greater benefits. The approach handles
both small repos (minimal overhead) and large repos (significant speedup)
effectively.

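The per-worker isolation and the mutex-protected index builder described in this commit message might look roughly like the following sketch. It is an illustration under stated assumptions, not the PR's code: `workerState` stands in for the per-goroutine storage and cache, `indexBuilder` here is a hypothetical type with only the `Add` and `Remove` methods the message names plus a `Paths` helper, and an FNV hash replaces real object lookup.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"runtime"
	"sort"
	"sync"
)

// indexBuilder is a hypothetical stand-in for the shared index builder.
// Every method takes the mutex because all workers update it concurrently.
type indexBuilder struct {
	mu      sync.Mutex
	entries map[string]string // path -> hash
}

func newIndexBuilder() *indexBuilder {
	return &indexBuilder{entries: make(map[string]string)}
}

func (b *indexBuilder) Add(path, hash string) {
	b.mu.Lock()
	defer b.mu.Unlock()
	b.entries[path] = hash
}

func (b *indexBuilder) Remove(path string) {
	b.mu.Lock()
	defer b.mu.Unlock()
	delete(b.entries, path)
}

// Paths returns the indexed paths in sorted order.
func (b *indexBuilder) Paths() []string {
	b.mu.Lock()
	defer b.mu.Unlock()
	out := make([]string, 0, len(b.entries))
	for p := range b.entries {
		out = append(out, p)
	}
	sort.Strings(out)
	return out
}

// workerState is a hypothetical stand-in for per-worker storage: each
// goroutine owns one, so its cache needs no locking.
type workerState struct {
	cache map[string]string
}

func newWorkerState() *workerState {
	return &workerState{cache: make(map[string]string)}
}

// hashOf simulates resolving an object for a path, with a private cache.
func (w *workerState) hashOf(path string) string {
	if h, ok := w.cache[path]; ok {
		return h
	}
	f := fnv.New32a()
	f.Write([]byte(path))
	h := fmt.Sprintf("%08x", f.Sum32())
	w.cache[path] = h
	return h
}

// checkout processes paths with a pool of workers; only the index is shared.
func checkout(paths []string, workers int) *indexBuilder {
	if workers < 1 {
		workers = 1
	}
	idx := newIndexBuilder()
	jobs := make(chan string)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			state := newWorkerState() // created inside the goroutine's scope
			for p := range jobs {
				idx.Add(p, state.hashOf(p))
			}
		}()
	}
	for _, p := range paths {
		jobs <- p
	}
	close(jobs)
	wg.Wait()
	return idx
}

func main() {
	paths := []string{"a.txt", "b/c.txt", "b/d.txt", "e.txt"}
	idx := checkout(paths, runtime.GOMAXPROCS(0))
	fmt.Println("indexed paths:", idx.Paths())
}
```

The trade-off this sketch makes visible is the one the author raises: per-worker state removes contention but duplicates setup and cache memory once per goroutine, while the shared index remains a single serialization point.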
@cedric-appdirect
Author

I have left this PR in draft as I am not sure about the code logic here. One area I am not a fan of, but where I couldn't figure out a better approach, is the creation of a workerTree for each goroutine. This will need review and advice from someone who understands and knows this code base.


