Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Git.execute's kill_after_timeout callback assumes procps #1756

Open
@EliahKagan

Description

@EliahKagan

Background

CallingGit.execute—whether directly, or indirectly by calling the dynamic attributes of aGit instance—and passingkill_after_timeout with a non-None value, creates a timer on a separate thread that calls the localkill_process function. This callback function usesos.kill to kill the process. Before killing the process, it enumerates the process'sdirect children. If sending the first signal succeeds (basically, if the parent process still existed), it also attempts to kill the child processes.

The children are enumerated withps --ppid:

GitPython/git/cmd.py

Lines 1010 to 1014 infe082ad

p=Popen(
["ps","--ppid",str(pid)],
stdout=PIPE,
creationflags=PROC_CREATIONFLAGS,
)

The problem

The--ppid option isnot POSIX. Most GNU/Linux systems haveprocps, whoseps implementation supports--ppid. I am unsure ifany other implementations ofps support it. The procps tools generally run only on Linux-based systems, because they use the/proc filesystem (and assume it is laid out as in Linux). Although they can run on any such system, Alpine Linux and some minimal GNU/Linux environments do not ship them, defaulting tops frombusybox instead.

As demonstrated below,macOSps does not support--ppid. Nor doFreeBSD,NetBSD,OpenBSD, orDragonFly.AIX does not have--ppid.illumos does not have--ppid; nor doesSolaris, though-ppid (with one-) can be used in 11.4.27 or higher. Although Cygwin mimics Linux where feasible, its/proc filesystem is different, and itsps does not support--ppid either (nor even some important POSIX options like-o).

The callback parses stdout from thatps command, but does not examine the exit status or stderr. The effect is that an error message on a system without procps (or anotherps supporting--ppid, if there is one) is printed, and the parent process is still sentSIGKILL, but its children are never found or sent signals.

As detailed below, although thefetch,pull, andpush methods of theRemote class accept akill_after_timeout argument, they do not useGit.execute, so they are unaffected by this bug.

Steps to reproduce

On macOS 13 (on a GitHub Actions CI runner withtmate), I created this script in a directory in$PATH, named itgit-sleep, and marked it executable:

#!/bin/shsleep"$@"

Then I calledsleep on aGit instance with akill_after_timeout argument specifying a shorter duration than the sleep:

bash-3.2$ pythonPython 3.12.0 (v3.12.0:0fb18b02c8, Oct  2 2023, 09:45:56) [Clang 13.0.0 (clang-1300.0.29.30)] on darwinType "help", "copyright", "credits" or "license" for more information.>>> from git import Git>>> Git().sleep(10, kill_after_timeout=5)ps: illegal option -- -usage: ps [-AaCcEefhjlMmrSTvwXx] [-O fmt | -o fmt] [-G gid[,gid...]]          [-g grp[,grp...]] [-u [uid,uid...]]          [-p pid[,pid...]] [-t tty[,tty...]] [-U user[,user...]]       ps [-L]Traceback (most recent call last):  File "<stdin>", line 1, in <module>  File "/Users/runner/work/GitPython/GitPython/git/cmd.py", line 741, in <lambda>    return lambda *args, **kwargs: self._call_process(name, *args, **kwargs)                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^  File "/Users/runner/work/GitPython/GitPython/git/cmd.py", line 1320, in _call_process    return self.execute(call, **exec_kwargs)           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^  File "/Users/runner/work/GitPython/GitPython/git/cmd.py", line 1117, in execute    raise GitCommandError(redacted_command, status, stderr_value, stdout_value)git.exc.GitCommandError: Cmd('git') failed due to: exit code(-9)  cmdline: git sleep 10  stderr: 'Timeout: the command "git sleep 10" did not complete in 5 secs.'>>>

Impact

1. Theotherkill_after_timeout is unaffected

There are two callables defined ingit/cmd.py that accept an optionalkill_after_timeout argument: the "internal" top-levelhandle_process_output function that is not listed in__all__ but is used throughout GitPython, and the publicGit.execute method (also used when dynamicGit methods are called). The meaning of this argument is subtly different, and the associated implementations completely different.

This bug affects only the one in theGit class. Thus it does not affect common uses of timeouts in interacting with remotes: theRemote.fetch,Remote.push, andRemote.pull methods acceptkill_after_timeout arguments, but they forward them tohandle_process_output.

2. But this oneshould work on all Unix-like systems

From context, I think it is unintended not to support common Unix-like systems such as macOS. TheGit.execute docstring says "This feature is not supported on Windows" and makes no other claims about compatibility, from which I think readers will reasonably infer that other platforms are believed supported. When called on a native Windows system (not Cygwin) with a non-None value forkill_after_timeout, it raises aGitCommandError. Other systems, including Cygwin, raise no exception and register thekill_process callback.kill_after_timeout is thusin effect documented to work on all systems except native Windows.

3. What happens if the child processes aren't sentSIGKILL?

I don't know how much of a problem it is forSIGKILL to be sent only to the parent and not to its direct children. I am not confident I know why that is being done, as opposed to killing only the parent process, or attempting to kill its entire process tree. Myguess is that this is because manygit commands use a subprocess to do their work. If so, then it may in practice be important—in situations where people passkill_after_timeout—that the child processes are killed as well.

However,git subprocesses do sometimes use their own subprocesses:

ek@Glub:~$ pstree -ainit(Ubuntu)  ├─SessionLeader  │   └─Relay(9)  │       ├─bash  │       │   └─git clone https://github.com/huggingface/transformers.git  │       │       └─git remote-https origin https://github.com/huggingface/transformers.git  │       │           └─git-remote-http origin https://github.com/huggingface/transformers.git...

In that example, thegit-remote-http process may not receiveSIGKILL. I am unsure how much this matters, but if it is a problem, then the more severe it is, theless severe this bug is, because the intended behavior wouldn't help anyway. Likewise, in situations where killing the parent process is sufficient, this bug also does not cause a problem.

That lower descendants are not killed has been reported as#895. That was observed in GitPython 2.0.2, which had the current approach ofkilling just the direct child processes.

A minor race condition…

One thing I'm a little worried about is a race condition that is currently present, and that I think may not be possible to fix,but that I worry finding child processes in a more portable way may exacerbate. Unless it can be solved or mitigated more deeply, it is a reason, unrelated to performance, to prefer that a portable substitute for the existing use ofps --ppidnot be too much slower than the current way. (I likewise worry that if the approach were changed to kill all descendants, then the added time to traverse the whole subtree might exacerbate this race condition.)

Suppose we plan to kill a process P and all its direct child processes including Q, and we find the PID of Q, but before killing Q, all the following happen:

  1. Q dies.
  2. Q is reaped. That is, it iswait(2)ed by its parent--which is either its original parent P or, if P has died, theninit--causing its entry in the process table to be removed and its PID to be available for use by a future process.
  3. A new process, R, is created and assigned Q's old PID.

Then when when we try to kill Q, we kill R.

This situation is rare, because in practice the time between when a process is reaped and when a new process is given its PID is only shortwhen the process table is nearly full so the kernel has no less recently relinquished PIDs to give out. But I think it would be best to avoid increasing the risk of it.

There may be other related race conditions, but this is the one that seems it could be worsened by replacing the existing unportable use ofps --ppid with some other technique, if that other technique is markedly slower.

Finding/killing the the subprocesses portably

I am unsure if this should be done, because it is not clear to me that killing the parent process and its direct child processes, as is currently attempted (generally successfully on GNU/Linux and unsuccessfully elsewhere), is necessarily what should happen. Doing anything else might risk incompatibility for some existing use cases on some systems, so I would want to be cautious about doing something altogether different, but I think it should still be considered before proceeding.

However, assuming the current approach of killing the child processes should be preserved, I think there are three cases:

  1. Systems withpgrep/pkill.
  2. Systems whoseps is POSIX-compliant, or at least supports-A and-o.
  3. Cygwin. (And the like, e.g., MSYS2. Butsys.platform == "cygwin" still covers that.)

Case 1 could be folded into case 2 if a speed regression is acceptable (but see above on the race condition), or if testing reveals usingpgrep orpkill is not significantly faster. Case 3 could be dropped in lieu of modifying the docstring to document thatkill_after_timeout is less effective on Cygwin, if reducing complexity is regarded as more important than covering it.

Whether to cover case 3 or not is more a matter of code complexity than time to write and review the code. With or without it, I think most of the time and effort would be on thetests. Currently none cover passingkill_after_timeout toGit.execute or to a dynamic method of aGit object. Only theotherkill_after_timeout—ofhandle_process_output—has test coverage. Because this project has CI on Cygwin, I don't think the tests have to do much to accommodate it—its challenges are ready-made.

(An alternative to dealing with these details is to usepsutil, but I'm unsure if the impact of this issue is sufficient to justify adding it as a dependency. It doesn't support all systems, but systems it doesn't support are rare. I think it could be made conditional on the systems it is installable on, and the features that use it be documented as unavailable on other systems. I think this is probably not worth doing just for this, but if it turns out it would help in various other places, and increasing rather than decreasing OS compatibility—as it would here—then it might make sense to consider it. On the other hand, one benefit of GitPython is that it has very few dependencies.)

1. If we havepgrep/pkill

pgrep andpkill are not POSIX, but they are available on many more systems thanps --ppid. Furthermore, it is likely that all systems that supportps --ppid also havepgrep andpkill, because not only are they very common, but procps (which provides the onlyps with--ppid I can find, as discussed above)includes an implementation of them. Of course, it's possible (odd, but possible) for a distribution to use procps forps but not includepgrep andpkill. Whetherpkill can be used to consolidate the steps, orpgrep must be used together with something what is already there, is a design decision that should be influenced by a decision about the best order for sendingSIGKILL.

If it's acceptable to sendSIGKILL to the child processes first, then instead of running["ps", "--ppid", str(pid)]and most of what comes after it, one option is to run["pkill", "-P", str(pid)] and then:

  • If it succeeds, immediately callos.kill(pid, signal.SIGKILL) andkill_check.set().
  • If it fails due topkill not existing—or if it was checked first and found absent—proceed to case 2 (usingps).

If it's not acceptable to sendSIGKILL to the child processes first, or if either order is acceptable but it is desirable to share more code with the fallback case 2, then instead of running["ps", "--ppid", str(pid)], run["pgrep", "-P", str(pid)], then:

  • If it failed due topgrep not existing—or if it was checked first and found absent—proceed to case 2.
  • Treat an exit status of 0 or 1 as success; 1 is when there were no children. (This seems to hold across different implementations, but I'll want to look into it further, since these tools are not POSIX orXSI.)
  • Parse the output—each line is just a PID, with no headers, no other columns—intochild_pids.
  • Continue with the rest of thekill_process function as it already exists.

2. Ifps supports-A and-o

This is almost every Unix-like system used today;POSIX requires these options.

Instead of running["ps", "--ppid", str(pid)], run["ps", "-A", "-o", "pid,ppid"], then:

  • Check that the first row isPID andPPID to safeguard against unexpectedly nonstandardps.
  • Each remaining row should have child and parent process IDs. Filter for the rows where the parent (second column) is what we passed, and populatechild_pids with the PIDs from the first column.
  • Continue with the rest of thekill_process function as it already exists.

If using this is as fast apkill/pgrep, or slower but not by a lot, or code simplicity is considered more important than the small worsening of the rare race condition, then this could be used on all systems except Cygwin. The truth is that it is only out of fear of worsening things in weird situations on GNU/Linux systems with procps that I have even proposed case 1. This is the portable way to do it (except Cygwin).

It may be possible to optimize this with-U to filter the real user ID toos.getuid(), or-u to filter the effective user ID toos.geteuid(), though-u seems to be an XSI extension. I don't know if this would actually make things faster. I don't know if the added complexity, though modest, would be worthwhile even if it does. When doing this,-A would not also be passed.

The reason not to simply omit-A without replacing it, which gets processes that share the caller's EUID, is that it also only shows processes with the same controlling terminal. The reason not to use-a instead is that it doesn't show processes not associated with any terminal. The reason I prefer-A to its synonym-e is that-e seems to be an XSI extension.

3. Cygwin

Runningps on Cygwin gives output that looks like:

      PID    PPID    PGID     WINPID   TTY         UID    STIME COMMAND     1641       1    1641      27868  ?         197609   Nov 14 /usr/bin/ssh-agent     2201       1    2201      27112  ?         197609 02:35:26 /usr/bin/mintty     2202    2201    2202      41336  pty0      197609 02:35:26 /usr/bin/bash     2304    2202    2304      47276  pty0      197609 14:47:12 /usr/bin/ps

This can be modified by various options, but-o is not supported. (-A is not supported either, but it is not needed.)

Instead of running["ps", "--ppid", str(pid)], we can run["ps"], then:

  • Check the first row headers, or at least the leadingPID andPPID headers that we are going to use, to safeguard against unexpectedly non-Cygwinps or future changes to Cygwinps.
  • Continue as in case 2 after the check, making sure to use only the first two fields.

Perspective

I think thewhat is more important than thehow, because:

Test coverage

Unlike the otherkill_after_timeout (inhandle_process_output), the code path whereGit.execute is passedkill_after_timeout has no test coverage. It would be good to test it even if this bug is not fixed. But at the root of both is figuring out if killing the parent process and its direct child processes are what is wanted.

Maintainability

Thekill_process callback is never called on (native) Windows, where callingGit.execute with a non-None value forkill_after_timeoutraisesGitCommandError. But it contains what seem to be the remains of an attempt to support Windows: itpassesPROC_CREATIONFLAGS (which is 0except on Windows) when runningps, and itfalls back tosignal.SIGTERM whensignal.SIGKILL is absent (which it is on Windows).

I discovered this whole issue because I want to remove that code, which I think could lead to future bugs, and I was looking into whether there is any reason not to. A possible reason not to is ifkill_process can be easily modified to support Windows—which it could, if it is acceptable to kill either only the parent process, or the whole process tree, though whether itshould is another question. Because figuring out what to do about this issue entails figuring that out too, it would openkill_process up to that improvement—dropping its vestigial Windows code if it is not going to support Windows—and possibly others.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions


      [8]ページ先頭

      ©2009-2025 Movatter.jp