Description
In #104512 we made `pathlib.Path.glob()` use a "walk-and-filter" strategy for expanding `**` wildcards in patterns: when we encounter a `**` segment, we immediately consume subsequent segments and use them to build a regex that is used to filter results. This saves a bunch of `scandir()` calls.
However! We actually build a regex for the entire pattern given to `glob()`, rather than just the segments following `**` wildcards. And so when evaluating a pattern like `dir*/**/file*`, the `dir*` part is needlessly matched twice against each path. @zooba noted this in a review comment at the time.
We should be able to improve performance by building an `re.Pattern` only for segments following `**` wildcards, and not the entire `glob()` pattern.
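To make the difference concrete, here is a rough sketch of the two strategies. It is not the pathlib implementation: every helper name is hypothetical, an in-memory dict stands in for the filesystem, and it assumes the pattern contains exactly one `**` segment that is not the last one. The first function filters every walked path with a regex built from the whole pattern; the second selects directories for the leading segments one level at a time and compiles a regex only for the segments from `**` onwards.

```python
import re

# Hypothetical in-memory tree: dicts are directories, None marks a file.
TREE = {
    "dir1": {"file1": None, "sub": {"file2": None, "deep": {"file4": None}}},
    "dir2": {"other": None},
    "misc": {"file3": None},
}


def segment_to_regex(segment):
    """Translate one path segment; '*' and '?' never cross a '/'."""
    out = []
    for ch in segment:
        if ch == "*":
            out.append("[^/]*")
        elif ch == "?":
            out.append("[^/]")
        else:
            out.append(re.escape(ch))
    return "".join(out)


def compile_segments(segments):
    """Compile segments into one regex; '**' matches zero or more levels.

    Assumes '**' is never the final segment.
    """
    regex = ""
    for i, seg in enumerate(segments):
        if seg == "**":
            regex += "(?:[^/]+/)*"
        else:
            last = i == len(segments) - 1
            regex += segment_to_regex(seg) + ("" if last else "/")
    return re.compile(regex)


def walk(tree, prefix=""):
    """Yield every path below `tree`, relative to it."""
    for name, child in tree.items():
        path = prefix + name
        yield path
        if isinstance(child, dict):
            yield from walk(child, path + "/")


def glob_full_regex(tree, pattern):
    """Filter every walked path with a regex for the entire pattern."""
    rx = compile_segments(pattern.split("/"))
    return sorted(p for p in walk(tree) if rx.fullmatch(p))


def glob_tail_regex(tree, pattern):
    """Match leading segments per directory; regex only from '**' onwards."""
    segments = pattern.split("/")
    idx = segments.index("**")
    selected = [("", tree)]
    for seg in segments[:idx]:
        rx = re.compile(segment_to_regex(seg))
        selected = [
            (prefix + name + "/", child)
            for prefix, node in selected
            for name, child in node.items()
            if isinstance(child, dict) and rx.fullmatch(name)
        ]
    tail = compile_segments(segments[idx:])
    return sorted(
        prefix + rel
        for prefix, node in selected
        for rel in walk(node)
        if tail.fullmatch(rel)
    )


full = glob_full_regex(TREE, "dir*/**/file*")
tail_only = glob_tail_regex(TREE, "dir*/**/file*")
assert full == tail_only
print(tail_only)
```

In the second function `dir*` is tested once per top-level directory name, and paths below a selected directory are only ever compared against the shorter `**/file*` regex.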
Linked PRs
- GH-115060: Speed up `pathlib.Path.glob()` by removing redundant regex matching #115061
- GH-115060: Speed up `pathlib.Path.glob()` by skipping directory scanning #116152
- GH-115060: Speed up `pathlib.Path.glob()` by not scanning literal parts #117732
- GH-115060: Speed up `pathlib.Path.glob()` by omitting initial `stat()` #117831