renamestimeout tokill_after_timeout and set default toNone instead of10 s
Surfaceskill_after_timeout to user facing functions (remote. pull/push/fetch)
refactorAutoInterrupt so we can._terminate() the underlying process and grab status codes from terminated processes
Add a class variableAutoInterrupt._status_code_if_terminate; if non-zero it overrides any status code from the process in_terminate. This can be used to prevent race conditions in tests,as described in this comment

Some history of the commits (while trying a different strategy):

aafb300 changes the default timeout to 60 s
1d26515 allowspush,pull andfetch keywordtimeout that will be propagated down
febd4fe alters some tests to use the old 10 s timeout forfetch andpull
4113d01 also alters a test call topull to add a timeout of 10s
c55a8e3 fixes the test suite
7df33f3 sets the default timeout toNone

if the current default is too lose, users can now give their own timeout

sroet added4 commits

September 10, 2021 13:36

change default fetch timeout to 60 s

aafb300

allow for timeout propagation

1d26515

add test timeout with the old 10 s timeout

febd4fe

also test a call to 'push' with 10s timeout

4113d01

sroet changed the title~~Update fetch timeout~~[WIP] Update fetch timeout

Sep 10, 2021

propagate kwargs in do_test_fetch

c55a8e3

Copy link

ContributorAuthor

sroet commentedSep 10, 2021

alright, the test suite runs locally and gives the following coverage:

---------- coverage: platform linux, python 3.7.11-final-0 -----------Name                                Stmts   Miss  Cover-------------------------------------------------------git/__init__.py                        42      8    81%git/cmd.py                            525     98    81%git/compat.py                          46     15    67%git/config.py                         439     48    89%git/db.py                              27      1    96%git/diff.py                           268     10    96%git/exc.py                             63      3    95%git/index/__init__.py                   2      0   100%git/index/base.py                     498     32    94%git/index/fun.py                      179      7    96%git/index/typ.py                       68      3    96%git/index/util.py                      50      5    90%git/objects/__init__.py                13      0   100%git/objects/base.py                    91     10    89%git/objects/blob.py                    17      0   100%git/objects/commit.py                 276     28    90%git/objects/fun.py                    101      5    95%git/objects/submodule/__init__.py       0      0   100%git/objects/submodule/base.py         583     46    92%git/objects/submodule/root.py         137     12    91%git/objects/submodule/util.py          51      9    82%git/objects/tag.py                     47     12    74%git/objects/tree.py                   181     73    60%git/objects/util.py                   232     32    86%git/refs/__init__.py                    6      0   100%git/refs/head.py                       98      6    94%git/refs/log.py                       150     10    93%git/refs/reference.py                  55      3    95%git/refs/remote.py                     32      2    94%git/refs/symbolic.py                  338     18    95%git/refs/tag.py                        47      6    87%git/remote.py                         427     55    87%git/repo/__init__.py                    1      0   100%git/repo/base.py                      552     36    93%git/repo/fun.py                       201     20    90%git/types.py                           44     10    77%git/util.py                           539    121    78%-------------------------------------------------------TOTAL                                6426    744    88%Results (64.31s):     378 passed      42 skipped

sroet changed the title~~[WIP] Update fetch timeout~~Update fetch timeout

Sep 10, 2021

sroet changed the title~~Update fetch timeout~~Surface fetch/pull/push timeout and set default to 60 s

Sep 10, 2021

sroet mentioned this pull request

Sep 10, 2021

Temporary pin for GitPythonopenpathsampling/openpathsampling#1069

Closed

dwhswenson mentioned this pull request

Sep 10, 2021

[TODO] Unpin GitPythonopenpathsampling/openpathsampling#1070

Closed

Byron mentioned this pull request

Sep 10, 2021

default thread timeout too short for CI#1339

Closed

Byron requested a review fromYobmod

September 10, 2021 14:32

Byron added this to thev3.1.24 - Bugfixes milestone

Sep 10, 2021

Copy link

Member

Byron commentedSep 10, 2021

Thanks a lot for the initiative. With the offending release yanked I believe we win some time to wait for@Yobmod to chime in on this.

My preference, if I understand what's happening correctly, is to set the default timeout toNone and not alter the previous behaviour, as the code as is can't tell if it's just a long-running process or a stalled one. Thus I would prefer offering this up to callers only.

Copy link

Contributor

Yobmod commentedSep 10, 2021•
edited
Loading

Hi, I think setting the timeout default to None works better, rather than a larger number.
Thread.join(None) would then wait as long as needed (I think?), but if people want a timeout, it's still there to use.

Or it could be removed altogether.
It got added when the tests started hanging in a 3.10 beta. I couldn't find the problem, but it doesn't happen with release candidates any more.

Copy link

dwhswenson commentedSep 10, 2021

FWIW, the one concern I have about surfacing the argument here is that I think there might be confusion betweencmd.execute_kwargs['kill_after_timeout'] and thistimeout. As a user, without a careful read of the docs I would expect the behavior ofremote.fetch(timeout=60) to be that of'kill_after_timeout'.

reset default timeout to None

7df33f3

sroet changed the title~~Surface fetch/pull/push timeout and set default to 60 s~~Surface fetch/pull/push timeout and reset default to None

Sep 10, 2021

Copy link

ContributorAuthor

sroet commentedSep 10, 2021

@Yobmod,@dwhswenson ; I have reverted the default to None, please let me know what you think.

@Byron, I will not be able to push to this branch for the next ~60 hour, so feel free to push to it if you want this to be finished up before monday

update docstring

b555764

Copy link

Member

Byron commentedSep 11, 2021

@sroet Thanks for your help with this and the incredibly quick turnaround time, and for everyone to share their thoughts to make this better.

FWIW, the one concern I have about surfacing the argument here is that I think there might be confusion betweencmd.execute_kwargs['kill_after_timeout'] and thistimeout. As a user, without a careful read of the docs I would expect the behavior ofremote.fetch(timeout=60) to be that of'kill_after_timeout'.

To me it seems thatkill_after_timeout could have been used to handle this all along, as killing the process the threads are reading from should break their pipes and thus stop the read-loops, allowing them to stop naturally. Since we have tests for this maybe you could test this hypothesis. Indeed it would be great to avoid two timeout parameters who have the same intention, but only one of them seems to be solving the issue at its root.

The actual process terminationhappens here and it looks like it will succeed on non-windows but might still fail to kill stalled processes on windows. To my mind, that's acceptable and I'd rather wait for users to report issues due to this than to keep thetimeout parameter around just for the case the git process isn't responding to signals.

sroet added3 commits

September 13, 2021 17:05

reuse kill_after_timeout kwarg

6b358f9

update tests and add a comment about different behaviour of 'push' vs…

3a81850

… 'fetch'

go for pytest.raises and test that the functions run

d41d537

Copy link

ContributorAuthor

sroet commentedSep 13, 2021

This took a bit more work than expected, but should now work with the keywordkill_after_timeout for the user facing functions

The actual process termination happens here and it looks like it will succeed on non-windows but might still fail to kill stalled processes on windows. To my mind, that's acceptable and I'd rather wait for users to report issues due to this than to keep the timeout parameter around just for the case the git process isn't responding to signals.

That did not really work out, as most user facing functions (fetch,pull, andpush) setas_process=True so that code path is never hit, insteadthis if statement is hit which returns anAutoInterrupt object. Looking atthat class it actually knows how to kill underlying processes (in the__del__ method) so I decided to factor that code out to a private function_terminate (and fixingwait ifself.proc=None).

To me it seems that kill_after_timeout could have been used to handle this all along, as killing the process the threads are reading from should break their pipes and thus stop the read-loops, allowing them to stop naturally.

The reading threads had no way of handling the closing of the streams, so added a check to the exception.

It should now raises like this:

>>>a=git.Repo('.')>>>a.remotes.origin.fetch(kill_after_timeout=0.01)Traceback (mostrecentcalllast):File"<stdin>",line1,in<module>File"/home/sander/github_files/GitPython/git/remote.py",line868,infetchkill_after_timeout=kill_after_timeout)File"/home/sander/github_files/GitPython/git/remote.py",line732,in_get_fetch_info_from_stderrproc.wait(stderr=stderr_text)File"/home/sander/github_files/GitPython/git/cmd.py",line490,inwaitraiseGitCommandError(remove_password_if_present(self.args),status,errstr)git.exc.GitCommandError:Cmd('git')faileddueto:exitcode(-15)cmdline:gitfetch-voriginstderr:'error: process killed because it timed out. kill_after_timeout=0.01 seconds'

When writing the tests, I saw that the error handling code ofpush does something different than forfetch andpull

fetch/pull always error out if there is any error lines instd_err(it does not care if there is output as that is piped toNone)
push only errors out if there are error lines instd_err andno output tostd_out

@Byron Is this mismatch intentional/wanted?

I will update the initial post to indicate the current status

sroet changed the title~~Surface fetch/pull/push timeout and reset default to None~~Surface fetch/pull/push kill_after_timeout and reset default to None

Sep 13, 2021

Copy link

Member

Byron commentedSep 13, 2021

Thanks a million for all the effort you are putting in!

@Byron Is this mismatch intentional/wanted?

It's based on the observation on how these processes usually work and communicate errors. However, if it feels out-of-place or wrong there is no reason not to improve it while you are here. It wouldn't have been the first time that there are long-standing issues in the code-base that somehow never surfaced. While I encourage you for these fixes I also warn that the test-suite might not catch all the subtle behaviour either, so it's probably better to leave as much unchanged as possible while fixing the issue at hand.

Note that CI is failing because of the linter and white-space related errors. It's strange that it asks me to approve running CI on this PR over and over again and I hope it will just run when the fix is pushed.

Copy link

dwhswenson commentedSep 14, 2021

It's strange that it asks me to approve running CI on this PR over and over again and I hope it will just run when the fix is pushed.

Seehttps://github.blog/2021-04-22-github-actions-update-helping-maintainers-combat-bad-actors/, in particular:

Once a collaborator with write access has reviewed the changes, they can approve the workflows to run for the current commit only. If the pull request is updated to a new commit, a new approval will be required.

IMO, that last is a bad choice by GitHub. I understand the desire to avoid abuse by crypto-miners, but there should be an option for the approval to be for the duration of the PR.

Copy link

Member

Byron commentedSep 14, 2021

Thanks for the heads-up, I wasn't aware of this change.

[…] but there should be an option for the approval to be for the duration of the PR.

Couldn't agree more. Here it seems they are over-steering towards abuse-avoidance and trade convenience of using their platform. Instead, if legitimate PRs turn out to become abusive later, it's something I'd happily report - after it happened.
Maybe additional options will be implemented some day, I do have hope.

sroet added2 commits

September 14, 2021 12:27

make flake8 and mypy happy

5531323

fix typo's

da68bb1

Copy link

ContributorAuthor

sroet commentedSep 14, 2021

@bryon, sorry, I should have checked mypy and flake8 localy. they now return:

(gitpy) ***@***:~/github_files/GitPython$ flake80(gitpy) ***@***:~/github_files/GitPython$ mypy -p gitSuccess: no issues found in 37 source files

sroet added3 commits

September 14, 2021 13:30

make test timeout stricter

9678226

fetch is also to quick on CI, only test pull

a9e43e5

two spaces before comments

5081881

Copy link

ContributorAuthor

sroet commentedSep 14, 2021

Looking at the test failure for9678226, it seems like the CI environment is too quickfetching after doing apull to reliably triggering the case, only doing a pull should still hit most codepaths...

set timeout to a non-zero value

083039a

Copy link

ContributorAuthor

sroet commentedSep 14, 2021

Hmm, it started failing locally with timeout=0 , so reset it to a small value. Hopefully this should be small enough to at least let CI fail out, otherwise I would like a reviewer to see if the test also fails on their local system...

Copy link

ContributorAuthor

sroet commentedSep 14, 2021•
edited
Loading

A bit of info on why the error is not raised but still shows up in the log as:

WARNING  git.remote:remote.py:734 Error lines received while fetching: error: process killed because it timed out. kill_after_timeout=0 seconds

This is because there is a race condition: in the time it takes between the main thread triggering the timeout (which gives the log) and telling the process to terminate (which should trigger the error by setting the status to not 0), the process can complete correctly. In this case, I decided of letting the code recover instead of erroring out.

Copy link

Member

Byron commentedSep 14, 2021

It's interesting as CI still fails even though the timeout is small. Can there be a 'too-small' due to same race elsewhere?

Maybe the CI is onto something here and indicates some fixable issue, and for now I'd be inclined to believe that.

Copy link

ContributorAuthor

sroet commentedSep 14, 2021•
edited
Loading

@Byron the issue lies in the fact that the CI python thread is too slow compared to the git process, I can force CI behavior locally by altering the lines:

fortinthreads:t.join(timeout=kill_after_timeout)ift.is_alive():ifisinstance(process,Git.AutoInterrupt):process._terminate()

fortinthreads:t.join(timeout=kill_after_timeout)ift.is_alive():ifisinstance(process,Git.AutoInterrupt):#TODO debug codeimporttimetime.sleep(1)process._terminate()

What happens is that inside theAutoInterrupt._terminate() function the following lines exist:

# did the process finish already so we have a return code ?try:ifproc.poll()isnotNone:self.status=proc.poll()returnNone

which allows the process status to be0 even if the timeout is triggered initially.

and even later it still does:

            # try to kill it                                                                try:                                                                                proc.terminate()                                                                self.status = proc.wait()    # ensure process goes away

Which also allows the process to terminate gracefully if the kill signal is not fast enough.

Now, I can force setself.status to-15 inside the_terminate function which make sure we always trigger the error whenever_terminate() has been called, even if the process completed between python triggering a timeout and getting around to actually kill the process.

@Byron is that (not allowing the process to recover) the preferred solution here?

[EDIT]: addition; The GitCommandError is only raised (insidewait()) ifAutoInterrupt.status != 0

Copy link

Member

Byron commentedSep 14, 2021

@Byron is that (not allowing the process to recover) the preferred solution here?

A robust implementation is preferred any time, and it seems like a system that can recover to states that are changing is the better option.

Ultimately CI should work and ideally work for the next few years without spurious failures due to races.

Add a way to force status codes inside AutoInterrupt._terminate, and …

e058c4c

…let tests use it

Copy link

ContributorAuthor

sroet commentedSep 15, 2021

Ultimately CI should work and ideally work for the next few years without spurious failures due to races.

Added a class variable toAutoInterrupt that sets the statuscode inside_terminate if it non-zero, and updated the tests to use it. It makes the test pass locally, with thesleep(1). So I am pretty confident it catches the race condition in tests and still allows for the robust implementation, please have another look

Byron self-requested a review

September 15, 2021 23:41

Byron approved these changes

Sep 15, 2021

View reviewed changes

Copy link

Member

Byron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Fantastic, it looks like this PR is ready to be merged. I'd love to have another reviewer though, maybe@Yobmod could take a final look?

I'd be happy to try a new release when merged, too.

Byron mentioned this pull request

Sep 18, 2021

gitpython 3.1.20 not on pypi.org#1345

Closed

Byron merged commitaa50766 intogitpython-developers:main

Sep 18, 2021

Copy link

per1234 commentedSep 20, 2021

Maybe additional options will be implemented some day, I do have hope.

@Byron they already have been:
https://github.blog/changelog/2021-07-01-github-actions-new-settings-for-maintainers/
Switching to the "Require approval for first-time contributors who are new to GitHub" setting has almost completely eliminated the need to approve workflows in the repositories I maintain.

Copy link

Member

Byron commentedSep 20, 2021

Thank you sooo much! That drove me crazy 🤦‍♂️

sroet deleted the update_fetch_timeout branch

September 20, 2021 12:31

Copy link

ContributorAuthor

sroet commentedSep 20, 2021•
edited
Loading

Thank you sooo much! That drove me crazy

Can imagine, thanks for dealing with the CI on this one!

Labels

None yet

5 participants

Movatterモバイル変換

Uh oh!

Surface fetch/pull/push kill_after_timeout and reset default to None#1340

Surface fetch/pull/push kill_after_timeout and reset default to None#1340

Uh oh!

Conversation

sroet commentedSep 10, 2021• editedLoading Uh oh!There was an error while loading.Please reload this page.

Uh oh!

Uh oh!

sroet commentedSep 10, 2021

Uh oh!

Byron commentedSep 10, 2021

Uh oh!

Yobmod commentedSep 10, 2021• editedLoading Uh oh!There was an error while loading.Please reload this page.

Uh oh!

Uh oh!

dwhswenson commentedSep 10, 2021

Uh oh!

sroet commentedSep 10, 2021

Uh oh!

Byron commentedSep 11, 2021

Uh oh!

sroet commentedSep 13, 2021

Uh oh!

Byron commentedSep 13, 2021

Uh oh!

dwhswenson commentedSep 14, 2021

Uh oh!

Byron commentedSep 14, 2021

Uh oh!

sroet commentedSep 14, 2021

Uh oh!

sroet commentedSep 14, 2021

Uh oh!

sroet commentedSep 14, 2021

Uh oh!

sroet commentedSep 14, 2021• editedLoading Uh oh!There was an error while loading.Please reload this page.

Uh oh!

Uh oh!

Byron commentedSep 14, 2021

Uh oh!

sroet commentedSep 14, 2021• editedLoading Uh oh!There was an error while loading.Please reload this page.

Uh oh!

Uh oh!

Byron commentedSep 14, 2021

Uh oh!

sroet commentedSep 15, 2021

Uh oh!

Byron left a comment

Choose a reason for hiding this comment

Uh oh!

per1234 commentedSep 20, 2021

Uh oh!

Byron commentedSep 20, 2021

Uh oh!

sroet commentedSep 20, 2021• editedLoading Uh oh!There was an error while loading.Please reload this page.

Uh oh!

Uh oh!

Uh oh!

sroet commentedSep 10, 2021•
edited
Loading

Yobmod commentedSep 10, 2021•
edited
Loading

sroet commentedSep 14, 2021•
edited
Loading

sroet commentedSep 14, 2021•
edited
Loading

sroet commentedSep 20, 2021•
edited
Loading