- Notifications
You must be signed in to change notification settings - Fork66
Potential Race Condition Fix - OS Rename & Chmod - PermissionError#115
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Uh oh!
There was an error while loading.Please reload this page.
Conversation
DEKHTIARJonathan commentedDec 24, 2024 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
After further investigation - my assumption was correct. After the There's multiple ways to fix it.
I think solution 1 is a little cleaner |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Thanks for tackling this.
It's strange that GitPython is still using this backend for storing files, I thought it's all done by the Git (CLI) based backend.
The python implementation is pretty slow and I definitely wouldn't want to trust it, so it makes me sad it's still in useand gets to meddle with your work.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
DEKHTIARJonathan commentedDec 24, 2024 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
@Byron yeah I'm not so sure TBH. Do you mind publishing a I'm honestly not sure why they use Changes requested applied. Sorry I'm not able to produce a repro case. It's actually very difficult to reproduce (as most race conditions are) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Thanks! There are a couple of nits that should probably be fixed before merging.
It comes to mind that ingitoxide
tempfiles are created with the correct mode right away. This is also how Git does it, and a way to completely avoid the issue here. In Python that doesn't seem quite as straightforward, but I wanted to mention it in case you know how.
Uh oh!
There was an error while loading.Please reload this page.
DEKHTIARJonathan commentedDec 26, 2024 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
Yeap honestly I don't think I know of a way to do this. Question 1: Do you actually need to set the permissions for This case: obj_path=self.db_path(self.object_path(hexsha))obj_dir=dirname(obj_path)os.makedirs(obj_dir,exist_ok=True)# END handle destination directory# rename onto existing doesn't work on NTFSifisfile(obj_path):remove(tmp_path) If not then setting the permission only in the "rename" case and before the actual rename will entirely fix the problem. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Yes, great catch, that's true! It would be better to do that only in the rename case.
Besides that, and beyond this PR, the logic there could use a cleanup, as it really does object writes in the slowest possible way. In order to not slow to a crawl objects need to be written and hashed to memory. Then one won't write the object at all if it already exists in the database.
Uh oh!
There was an error while loading.Please reload this page.
Co-authored-by: Sebastian Thiel <sebastian.thiel@icloud.com>
@Byron Happy new year ;) I'm not 100% sure of your comment. Do you confirm the So far the implementation we have is pretty "risk-free" it's just a retry loop. If I only do it in the rename case, we are now changing the logic and I dont have the knowledge/expertize to know if it's safe. So you tell me |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Independently of what might be right, this change seems to be the smallest possible change to fix this particular issue you have been encountering, so I think it's good to merge.
I will see to create a new release, too.
38b5c38
intogitpython-developers:masterUh oh!
There was an error while loading.Please reload this page.
Amazing ! Thanks a lot@Byron |
EliahKagan left a comment• edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
(This is only posted as a review because that lets me include a comment on a specific part of the changed code. This pull request has already been merged, and rightly so. You are under no obligation to reply!)
After the
rename(tmp_path, obj_path)
operation, the file atobj_path
might not be fully committed to disk immediately. If chmod is called before the system fully registers obj_path as available, a PermissionError may occur.
I'm curious if this is specific to bind mounts to the outside filesystem in Docker Desktop on macOS, or if it is broader. Is this related todocker/for-mac#6812? I ask because:
Not being able to operate on the target of
os.rename
immediately after it returns successfully is unintuitive, and if this is not a bug in Docker, then maybe it should be documented somewhere. If it is in practice specific toos.rename
, then it could be documented for that. If it is already documented, we could add a link tothe comment here.Maybe the delays could be made specific to that situation, so as to avoid waiting before reporting other
PermissionError
s in most circumstances.If creating a file with the desired permissions initially, as suggested in#115 (review), would work, then I think it would be worthwhile. But I worry that some ways of doing this mightalso be limited by whatever limitations apply to the filesystem when using containers on macOS.
If the file cannot be created with the desired permissions initially, I wonder if it might still be possible to create it in a way that would allow the desired permissions to be set without delay. The operation that was not permitted includes accessincreases:
msktemp
creates files with 0600 permissions, but they are then usually changed to 0444. If permissions are changed in a way that onlydecreases access, by usually creating the file with 0644 permissions and then changing them to 0444, would the change still have to wait? It seems odd to think those operations might have different constraints, butdocker/for-mac#6812 makes me think that might be so.Even if we need to avoid path collisions before the file is moved into place as robustly as
mkstemp
does, we can still create the file initially with less restrictive permissions, by usingmkdtemp
orTemporaryDirectory
to make a temporary directory, and then creating the file with a predictable name inside that directory.
My motivation in bringing up (2), (3), and (4) is that I worry that, for some callers, the delays could accumulate. Callers should probably not callstore
in a loop without stopping on errors, so ideally the added delay shouldn't exceed one second. But if a program makes independent attempts to store many objects and then reports errors, then there could be a long delay ifPermissionError
is being raised for other reasons such as the user not having enough access to the repository. In principle this can already happen on Windows because of the retry logic forremove
, but as far as I know it has not been expected on other systems.
# Ensure rename is actually done and file is stable | ||
# Retry up to 14 times - exponential wait & retry in ms. | ||
# The total maximum wait time is 1000ms, which should be vastly enough for the | ||
# OS to return and commit the file to disk. | ||
for exp_backoff_ms in [1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 181]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
These timings seem reasonable, but I believe this is quadratic backoff rather than exponential backoff. (In exponential backoff, delays are a fixed base, often 2, raised to the power of a number that increases by one with each attempt. Here, the number that increases by one with each attempt is the base, raised to a fixed power of 2.) I mention this only in case true exponential backoff is intended or required.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
That is very true! Since I copied these values fromgitoxide
whereI implemented that based on what Git does. So most definitely it was me misnaming this, as Gitcorrectly calls it quadratic.
A PR would definitely be welcome that fixes the name ingitoxide
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I've openedGitoxideLabs/gitoxide#1815 to fix that ingitoxide
. While I was at it, I also opened#119 to make the much more minor correction here.
Byron commentedJan 5, 2025 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
This wold indeed be very interesting to find out!@DEKHTIARJonathan would be the one uniquely positioned to do that if time permits. |
The sequence of backoff wait times used in `gitdb.db.loose` isquadratic rather than exponential, as discussed in:gitpython-developers#115 (comment)This corrects the variable name by making it more general, and thecomment by having it explicitly describe the backoff as quadratic.This is conceptually related toGitoxideLabs/gitoxide#1815, butthis is a non-breaking change, as no interfaces are affected: onlya local variable and comment.
Uh oh!
There was an error while loading.Please reload this page.
There is a really flaky and hard to debug bug when using
gitdb
at least on MacOS (M-Processor) using Docker (docker.io/python:3.13
image).I have only been able to reproduce this error within
pytest
on my laptop (Apple M4 laptop).I completely randomly get:
When you inspect a little bit that
obj_path
- everything looks absolutely fine as far as I can tell. I have no idea what could trigger thisPermissionError
. The script is executing under the userdev-user:dev-user
of UID/GID:1000:1000
So clearly something is not going well.
Looking a little bit more in detail, turns out the error is always happening in scenario B - but not systematically in scenario B:
There might be an interference between
tempfile.mkstemp
,os.rename
andos.chmod
. Not sure which one.It seems like a retry loop with a small temporization seems to mitigate the problem.
One supposition is that
os.rename
might not entirely complete by the time it returns. Or some file descriptors might be corrupted. Unfortunately, it's really really hard for me to debug and I'm not sure even what to look for.