Description
Running is_dirty() on my repo takes 5 minutes because it's a large repo, has text conversion enabled for diffs, and is_dirty() is outputting a full diff. is_dirty() should be a relatively simple operation, but since it uses git diff instead of a plumbing command like git diff-files, it incurs the cost of displaying nice output for users.
The diff.astextplain.textconv git option converts pdf files to text before diffing. This option appears to come with msys git. It's useful when diffing interactively, but a lot of overhead when just checking for dirty state.
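To check whether this conversion is configured on a given machine, the option can be queried directly. This is a minimal sketch, assuming git is on PATH; the helper name textconv_command and its injectable run parameter are hypothetical, added only so the logic can be exercised without a real repository.

```python
import subprocess


def textconv_command(repo_dir=".", driver="astextplain", run=subprocess.run):
    """Return the textconv command configured for a diff driver, or None.

    `git config --get` exits with a non-zero status when the key is not set.
    `run` is an assumption of this sketch: any callable with the same
    interface as subprocess.run can be substituted for testing.
    """
    result = run(
        ["git", "config", "--get", f"diff.{driver}.textconv"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip() or None
```

A return value of None suggests the conversion is not configured for that driver, in which case the slowdown described here would have to come from somewhere else.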
GitPython doesn't look at the output of the diff, it just checks that it's not empty:
Lines 957 to 977 in 3470fb3

    # Start from the one which is fastest to evaluate.
    default_args = ["--abbrev=40", "--full-index", "--raw"]
    if not submodules:
        default_args.append("--ignore-submodules")
    if path:
        default_args.extend(["--", str(path)])
    if index:
        # diff index against HEAD.
        if osp.isfile(self.index.path) and len(self.git.diff("--cached", *default_args)):
            return True
    # END index handling
    if working_tree:
        # diff index against working tree.
        if len(self.git.diff(*default_args)):
            return True
    # END working tree handling
    if untracked_files:
        if len(self._get_untracked_files(path, ignore_submodules=not submodules)):
            return True
    # END untracked files
    return False
If we switch from diff to diff-index, we can see that it's comparable in speed to turning off the text conversions:
    $ time (git diff --abbrev=40 --full-index --raw | cat)
    ...snip...
    :100644 100644 66130ffa9aa8b4bc98a1918a946919f94c9a819d 0000000000000000000000000000000000000000 M src/thread.cpp
    real 5m14.842s
    user 1m6.922s
    sys 4m20.000s

    $ time (git diff-index --quiet HEAD -- && echo "clean" || echo "dirty")
    real 0m4.517s
    user 0m0.203s
    sys 0m32.281s

    $ echo "*.pdf -diff=astextplain" > .gitattributes
    $ time (git diff --abbrev=40 --full-index --raw | cat)
    ...snip...
    :100644 100644 66130ffa9aa8b4bc98a1918a946919f94c9a819d 0000000000000000000000000000000000000000 M src/thread.cpp
    real 0m4.394s
    user 0m0.109s
    sys 0m32.250s
(These timings are all after running these commands several times. When I first ran git diff it took 26 minutes!)
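For anyone who wants to repeat the comparison from Python rather than the shell, a rough timing helper might look like the following. It is only a sketch: time_command is a hypothetical name, and it measures wall-clock time only, not the user/sys split that the shell's time reports.

```python
import subprocess
import time


def time_command(args, cwd=None):
    """Run a command, discard its output, and return (seconds, exit_code)."""
    start = time.perf_counter()
    completed = subprocess.run(
        args,
        cwd=cwd,
        stdout=subprocess.DEVNULL,  # output is not needed, only the duration
        stderr=subprocess.DEVNULL,
    )
    return time.perf_counter() - start, completed.returncode


# Example usage inside a repository (requires git on PATH):
#   time_command(["git", "diff", "--abbrev=40", "--full-index", "--raw"])
#   time_command(["git", "diff-index", "--quiet", "HEAD", "--"])
```

As with the numbers above, it is worth running each command a few times, since the first run can be much slower than later ones.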
Workaround
Add a .gitattributes that disables the text conversion:
    *.pdf -diff=astextplain
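If the workaround needs to be applied from a script, something like the sketch below could add the rule without clobbering an existing .gitattributes. The function name disable_pdf_textconv is hypothetical, and the rule is written with a space between the pattern and the attribute, which the gitattributes format requires.

```python
from pathlib import Path


def disable_pdf_textconv(repo_dir, rule="*.pdf -diff=astextplain"):
    """Append the rule to <repo_dir>/.gitattributes unless already present.

    Returns True if the file was modified, False if the rule was already there.
    """
    attrs = Path(repo_dir) / ".gitattributes"
    existing = attrs.read_text(encoding="utf-8") if attrs.exists() else ""
    if rule in (line.strip() for line in existing.splitlines()):
        return False
    # Make sure the new rule starts on its own line.
    separator = "" if existing == "" or existing.endswith("\n") else "\n"
    attrs.write_text(existing + separator + rule + "\n", encoding="utf-8")
    return True
```

Unlike redirecting echo with >, this appends to the file, so any attributes already defined for the repo are kept.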
Solution
I think is_dirty should instead use diff-files and diff-index. This answer looks like a good explanation of how they work.
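To illustrate how the two plumbing commands divide the work, here is a standalone sketch that does not use GitPython. With --quiet (which implies --exit-code), both commands exit with 0 when there are no differences and 1 when there are. The function names and the injectable run parameter are assumptions of this sketch, not part of any library.

```python
import subprocess


def has_changes(args, repo_dir=".", run=subprocess.run):
    """Interpret the exit status of a `--quiet` diff plumbing command.

    0 means no differences and 1 means differences; any other status is
    treated as a real failure instead of being reported as "dirty".
    """
    result = run(args, cwd=repo_dir, capture_output=True, text=True)
    if result.returncode == 0:
        return False
    if result.returncode == 1:
        return True
    raise RuntimeError(f"{' '.join(args)} failed: {result.stderr.strip()}")


def index_differs_from_head(repo_dir=".", run=subprocess.run):
    # Staged changes: compare the index against HEAD.
    cmd = ["git", "diff-index", "--quiet", "--cached", "HEAD", "--"]
    return has_changes(cmd, repo_dir, run)


def worktree_differs_from_index(repo_dir=".", run=subprocess.run):
    # Unstaged changes: compare the working tree against the index.
    cmd = ["git", "diff-files", "--quiet", "--"]
    return has_changes(cmd, repo_dir, run)
```

Separating status 1 from other non-zero statuses means that a broken repository or a missing HEAD raises an error rather than silently looking dirty. One caveat to keep in mind: diff-files relies on cached stat information, so a file that was touched but not changed may be reported until the index is refreshed.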
Here's my rough replacement for is_dirty:
    import git


    def is_dirty(
        repo,
        index: bool = True,
        working_tree: bool = True,
        untracked_files: bool = False,
        submodules: bool = True,
        path=None,
    ):
        default_args = []
        # Only skip submodules when the caller asked to leave them out.
        if not submodules:
            default_args.append("--ignore-submodules")
        if index:
            try:
                # Always want to end with -- (even without path).
                args = default_args + ["--quiet", "--cached", "HEAD", "--", path]
                # --quiet implies --exit-code: a non-zero exit raises GitCommandError.
                repo.git.diff_index(*args)
            except git.exc.GitCommandError:
                return True
        if working_tree:
            try:
                args = default_args + ["--quiet", "--", path]
                repo.git.diff_files(*args)
            except git.exc.GitCommandError:
                return True
        if untracked_files:
            if len(repo._get_untracked_files(path, ignore_submodules=not submodules)):
                return True
        return False


    # Open the existing repository rather than re-initializing it.
    repo = git.Repo("c:/code/project")
    print("is_dirty", is_dirty(repo))
I'll try to find time to make a proper patch if that sounds good.