is_dirty() is very slow when using diff.astextplain.textconv #1962

Open
@idbrii


Running is_dirty() on my repo takes 5 minutes because it's a large repo, has text conversion enabled for diffs, and is_dirty() is outputting a full diff. is_dirty() should be a relatively simple operation, but since it uses git diff instead of a plumbing command like git diff-files, it incurs the cost of displaying nice output for users.

The diff.astextplain.textconv git option converts PDF files to text before diffing. This option appears to come with msys git. It's useful when diffing interactively, but adds a lot of overhead when just checking for dirty state.

GitPython doesn't look at the output of the diff, it just checks that it's not empty:

    # Start from the one which is fastest to evaluate.
    default_args = ["--abbrev=40", "--full-index", "--raw"]
    if not submodules:
        default_args.append("--ignore-submodules")
    if path:
        default_args.extend(["--", str(path)])
    if index:
        # diff index against HEAD.
        if osp.isfile(self.index.path) and len(self.git.diff("--cached", *default_args)):
            return True
    # END index handling
    if working_tree:
        # diff index against working tree.
        if len(self.git.diff(*default_args)):
            return True
    # END working tree handling
    if untracked_files:
        if len(self._get_untracked_files(path, ignore_submodules=not submodules)):
            return True
    # END untracked files
    return False

If we switch from diff to diff-index, we can see that it's comparable in speed to turning off the text conversions:

    $ time (git diff --abbrev=40 --full-index --raw | cat)
    ...snip...
    :100644 100644 66130ffa9aa8b4bc98a1918a946919f94c9a819d 0000000000000000000000000000000000000000 M      src/thread.cpp

    real    5m14.842s
    user    1m6.922s
    sys     4m20.000s

    $ time (git diff-index --quiet HEAD -- && echo "clean" || echo "dirty")

    real    0m4.517s
    user    0m0.203s
    sys     0m32.281s

    $ echo "*.pdf -diff=astextplain" > .gitattributes
    $ time (git diff --abbrev=40 --full-index --raw | cat)
    ...snip...
    :100644 100644 66130ffa9aa8b4bc98a1918a946919f94c9a819d 0000000000000000000000000000000000000000 M      src/thread.cpp

    real    0m4.394s
    user    0m0.109s
    sys     0m32.250s

(These timings are all after running these commands several times. When I first ran git diff it took 26 minutes!)
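
For anyone who wants to repeat this kind of comparison from Python rather than the shell, a small timing helper is enough. This is only a sketch: time_command is a hypothetical helper name, not part of GitPython, and it measures wall-clock time only (the "real" figure above), not user or sys time.

```python
import subprocess
import time


def time_command(cmd, cwd=None):
    """Run cmd, discard its output, and return (exit_code, elapsed_seconds)."""
    start = time.perf_counter()
    # Output is discarded: only the exit code and the wall-clock time matter here.
    completed = subprocess.run(
        cmd,
        cwd=cwd,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        check=False,
    )
    return completed.returncode, time.perf_counter() - start
```

Calling time_command(["git", "diff", "--abbrev=40", "--full-index", "--raw"], cwd=repo_dir) and time_command(["git", "diff-index", "--quiet", "HEAD", "--"], cwd=repo_dir) in turn, where repo_dir is your checkout, should show the same gap if text conversion is the bottleneck. As with the shell timings, run each command a few times first so caching does not skew the result.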

Workaround

Add a .gitattributes that disables the text conversion:

    *.pdf -diff=astextplain
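
If the workaround has to be applied from a script, something like the following could add the entry without clobbering an existing .gitattributes (the echo redirect in the session above overwrites the file). This is a sketch under assumptions: ensure_attribute_line is a hypothetical helper, and the attribute line is passed in by the caller, written with whitespace between the pattern and the attribute as gitattributes syntax requires.

```python
from pathlib import Path


def ensure_attribute_line(repo_dir, line):
    """Append line to <repo_dir>/.gitattributes unless it is already there.

    Returns True if the file was modified.
    """
    attributes = Path(repo_dir) / ".gitattributes"
    existing = attributes.read_text(encoding="utf-8") if attributes.exists() else ""
    if line in (entry.strip() for entry in existing.splitlines()):
        return False  # Already present; leave the file untouched.
    # Keep whatever is there and make sure the new entry starts on its own line.
    separator = "" if existing == "" or existing.endswith("\n") else "\n"
    attributes.write_text(existing + separator + line + "\n", encoding="utf-8")
    return True
```

For example, ensure_attribute_line("c:/code/project", "*.pdf -diff=astextplain") adds the entry on the first call and does nothing on later calls.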

Solution

I think is_dirty should instead use diff-files and diff-index. This answer looks like a good explanation of how they work.

Here's my rough replacement for is_dirty:

    import git


    def is_dirty(
        repo,
        index: bool = True,
        working_tree: bool = True,
        untracked_files: bool = False,
        submodules: bool = True,
        path=None,
    ):
        default_args = []
        # Ignore submodules only when they are excluded, as the current is_dirty() does.
        if not submodules:
            default_args.append("--ignore-submodules")
        if index:
            try:
                # Always want to end with -- (even without path).
                args = default_args + ["--quiet", "--cached", "HEAD", "--", path]
                # --quiet exits non-zero on differences, which raises GitCommandError.
                repo.git.diff_index(*args)
            except git.exc.GitCommandError:
                return True
        if working_tree:
            try:
                args = default_args + ["--quiet", "--", path]
                repo.git.diff_files(*args)
            except git.exc.GitCommandError:
                return True
        if untracked_files:
            if len(repo._get_untracked_files(path, ignore_submodules=not submodules)):
                return True
        return False


    repo = git.Repo.init("c:/code/project")
    print("is_dirty", is_dirty(repo))

I'll try to find time to make a proper patch if that sounds good.
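
One detail a proper patch may want to handle: the rough version above treats any GitCommandError as "dirty", so a genuine git failure would also report True. Below is a minimal sketch of the exit-code handling using plain subprocess instead of GitPython. The helper names (dirty_check_commands, exit_code_means_dirty) are hypothetical, the untracked-files check is left out, and git is assumed to be on PATH.

```python
import subprocess


def dirty_check_commands(index=True, working_tree=True, submodules=True, path=None):
    """Build the plumbing commands for a dirty check (hypothetical helper)."""
    common = ["--quiet"]
    if not submodules:
        common.append("--ignore-submodules")
    # Always end with "--" so a path can never be mistaken for a revision.
    tail = ["--"] + ([str(path)] if path else [])
    commands = []
    if index:
        # Staged changes: index compared against HEAD.
        commands.append(["git", "diff-index", *common, "--cached", "HEAD", *tail])
    if working_tree:
        # Unstaged changes: working tree compared against the index.
        commands.append(["git", "diff-files", *common, *tail])
    return commands


def exit_code_means_dirty(returncode):
    """Interpret a --quiet exit code: 0 is clean, 1 is dirty, anything else is an error."""
    if returncode == 0:
        return False
    if returncode == 1:
        return True
    raise RuntimeError(f"git exited with unexpected status {returncode}")


def is_dirty(repo_dir, **options):
    """Return True as soon as one of the checks reports a difference."""
    for command in dirty_check_commands(**options):
        result = subprocess.run(command, cwd=repo_dir, check=False)
        if exit_code_means_dirty(result.returncode):
            return True
    return False
```

With this split, a repository that has no HEAD yet surfaces as an error rather than as "dirty"; whether that is the right behaviour for GitPython is a design choice for the patch. Note also that the plumbing commands do not refresh the index, so files whose stat information changed without a content change may be reported as dirty unless git update-index -q --refresh is run first.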
