Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Potential Race Condition Fix - OS Rename & Chmod - PermissionError#115

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
Byron merged 2 commits intogitpython-developers:masterfromDEKHTIARJonathan:patch-1
Jan 2, 2025
Merged

Conversation

DEKHTIARJonathan
Copy link
Contributor

@DEKHTIARJonathanDEKHTIARJonathan commentedDec 23, 2024
edited
Loading

There is a really flaky and hard to debug bug when usinggitdb at least on MacOS (M-Processor) using Docker (docker.io/python:3.13 image).

I have only been able to reproduce this error withinpytest on my laptop (Apple M4 laptop).

I completely randomly get:

repo=Repo(........)# Repo Created using `GitPython`[...]repo.index.add("*")~~~~~~~~~~~~~~^^^^^File"/usr/local/lib/python3.13/site-packages/git/index/base.py",line885,inaddentries_added.extend(self._entries_for_paths(paths,path_rewriter,fprogress,entries))~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^File"/usr/local/lib/python3.13/site-packages/git/util.py",line176,inwrapperreturnfunc(self,*args,**kwargs)File"/usr/local/lib/python3.13/site-packages/git/index/util.py",line111,inset_git_working_dirreturnfunc(self,*args,**kwargs)File"/usr/local/lib/python3.13/site-packages/git/index/base.py",line745,in_entries_for_pathsentries_added.append(self._store_path(filepath,fprogress))~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^File"/usr/local/lib/python3.13/site-packages/git/index/base.py",line698,in_store_pathistream=self.repo.odb.store(IStream(Blob.type,st.st_size,stream))File"/usr/local/lib/python3.13/site-packages/gitdb/db/loose.py",line233,instorechmod(obj_path,self.new_objects_mode)~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^PermissionError: [Errno1]Operationnotpermitted:'/git_projects/<GIT_PROJECT_NAME>/.git/objects/9d/961f54d04e15cef0e525d1cb3d937bd7b82b04'

When you inspect a little bit thatobj_path - everything looks absolutely fine as far as I can tell. I have no idea what could trigger thisPermissionError. The script is executing under the userdev-user:dev-user of UID/GID:1000:1000

os.getuid()=1000pwd.getpwuid(os.getuid())=pwd.struct_passwd(pw_name='dev-user',pw_passwd='x',pw_uid=1000,pw_gid=1000,pw_gecos='',pw_dir='/home/dev-user',pw_shell='/bin/bash')os.stat(obj_path)=os.stat_result(st_mode=33152,st_ino=3713822,st_dev=65025,st_nlink=1,st_uid=1000,st_gid=1000,st_size=473,st_atime=1734989354,st_mtime=1734989354,st_ctime=1734989354)Path(obj_path).exists()=TruePath(obj_path).owner()='dev-user'Path(obj_path).group()='dev-user'

So clearly something is not going well.

Looking a little bit more in detail, turns out the error is always happening in scenario B - but not systematically in scenario B:

ifisfile(obj_path):remove(tmp_path)print("SCENARIO A !!!!!!!!!!!!!!!")else:rename(tmp_path,obj_path)print("SCENARIO B !!!!!!!!!!!!!!!")

There might be an interference betweentempfile.mkstemp,os.rename andos.chmod. Not sure which one.

It seems like a retry loop with a small temporization seems to mitigate the problem.

One supposition is thatos.rename might not entirely complete by the time it returns. Or some file descriptors might be corrupted. Unfortunately, it's really really hard for me to debug and I'm not sure even what to look for.

EliahKagan reacted with thumbs up emoji
@DEKHTIARJonathan
Copy link
ContributorAuthor

DEKHTIARJonathan commentedDec 24, 2024
edited
Loading

After further investigation - my assumption was correct.

After therename(tmp_path, obj_path) operation, the file atobj_path might not be fully committed to disk immediately. If chmod is called before the system fully registers obj_path as available, a PermissionError may occur.

There's multiple ways to fix it.

  1. A small loop as I did with a sleep() around the chmod.
  2. A small loop calling onos.stat(obj_path) which will trigger aOSError if not yet ready => sleep and re-loop.

I think solution 1 is a little cleaner

@DEKHTIARJonathanDEKHTIARJonathan changed the title[Mac OS] PermissionError FixPotential Race Condition Fix - OS Rename & Chmod - PermissionErrorDec 24, 2024
Copy link
Member

@ByronByron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Thanks for tackling this.

It's strange that GitPython is still using this backend for storing files, I thought it's all done by the Git (CLI) based backend.

The python implementation is pretty slow and I definitely wouldn't want to trust it, so it makes me sad it's still in useand gets to meddle with your work.

@DEKHTIARJonathan
Copy link
ContributorAuthor

DEKHTIARJonathan commentedDec 24, 2024
edited
Loading

@Byron yeah I'm not so sure TBH.

Do you mind publishing apatch level release of gitdb once you merge this PR ? That would be helpful so that I don't need to instal from github separately.

I'm honestly not sure why they usegitdb, this bug has been quite difficult to identify (fix was trivial though).

Changes requested applied. Sorry I'm not able to produce a repro case. It's actually very difficult to reproduce (as most race conditions are)

Copy link
Member

@ByronByron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Thanks! There are a couple of nits that should probably be fixed before merging.

It comes to mind that ingitoxide tempfiles are created with the correct mode right away. This is also how Git does it, and a way to completely avoid the issue here. In Python that doesn't seem quite as straightforward, but I wanted to mention it in case you know how.

DEKHTIARJonathan reacted with thumbs up emoji
@DEKHTIARJonathan
Copy link
ContributorAuthor

DEKHTIARJonathan commentedDec 26, 2024
edited
Loading

Thanks! There are a couple of nits that should probably be fixed before merging.

It comes to mind that ingitoxide tempfiles are created with the correct mode right away. This is also how Git does it, and a way to completely avoid the issue here. In Python that doesn't seem quite as straightforward, but I wanted to mention it in case you know how.

Yeap honestly I don't think I know of a way to do this.

Question 1: Do you actually need to set the permissions forobj_path in the case it's indeed a file ?

This case:

obj_path=self.db_path(self.object_path(hexsha))obj_dir=dirname(obj_path)os.makedirs(obj_dir,exist_ok=True)# END handle destination directory# rename onto existing doesn't work on NTFSifisfile(obj_path):remove(tmp_path)

If not then setting the permission only in the "rename" case and before the actual rename will entirely fix the problem.

Copy link
Member

@ByronByron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Yes, great catch, that's true! It would be better to do that only in the rename case.

Besides that, and beyond this PR, the logic there could use a cleanup, as it really does object writes in the slowest possible way. In order to not slow to a crawl objects need to be written and hashed to memory. Then one won't write the object at all if it already exists in the database.

Co-authored-by: Sebastian Thiel <sebastian.thiel@icloud.com>
@DEKHTIARJonathan
Copy link
ContributorAuthor

@Byron Happy new year ;)

I'm not 100% sure of your comment. Do you confirm thechmod is only necessary in the rename case. Are you 100% sure of it ?

So far the implementation we have is pretty "risk-free" it's just a retry loop. If I only do it in the rename case, we are now changing the logic and I dont have the knowledge/expertize to know if it's safe. So you tell me

Copy link
Member

@ByronByron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Independently of what might be right, this change seems to be the smallest possible change to fix this particular issue you have been encountering, so I think it's good to merge.

I will see to create a new release, too.

@ByronByron merged commit38b5c38 intogitpython-developers:masterJan 2, 2025
7 checks passed
@DEKHTIARJonathan
Copy link
ContributorAuthor

Amazing ! Thanks a lot@Byron

Copy link
Member

@EliahKaganEliahKagan left a comment
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

(This is only posted as a review because that lets me include a comment on a specific part of the changed code. This pull request has already been merged, and rightly so. You are under no obligation to reply!)

After therename(tmp_path, obj_path) operation, the file atobj_path might not be fully committed to disk immediately. If chmod is called before the system fully registers obj_path as available, a PermissionError may occur.

I'm curious if this is specific to bind mounts to the outside filesystem in Docker Desktop on macOS, or if it is broader. Is this related todocker/for-mac#6812? I ask because:

  1. Not being able to operate on the target ofos.rename immediately after it returns successfully is unintuitive, and if this is not a bug in Docker, then maybe it should be documented somewhere. If it is in practice specific toos.rename, then it could be documented for that. If it is already documented, we could add a link tothe comment here.

  2. Maybe the delays could be made specific to that situation, so as to avoid waiting before reporting otherPermissionErrors in most circumstances.

  3. If creating a file with the desired permissions initially, as suggested in#115 (review), would work, then I think it would be worthwhile. But I worry that some ways of doing this mightalso be limited by whatever limitations apply to the filesystem when using containers on macOS.

  4. If the file cannot be created with the desired permissions initially, I wonder if it might still be possible to create it in a way that would allow the desired permissions to be set without delay. The operation that was not permitted includes accessincreases:msktemp creates files with 0600 permissions, but they are then usually changed to 0444. If permissions are changed in a way that onlydecreases access, by usually creating the file with 0644 permissions and then changing them to 0444, would the change still have to wait? It seems odd to think those operations might have different constraints, butdocker/for-mac#6812 makes me think that might be so.

    Even if we need to avoid path collisions before the file is moved into place as robustly asmkstemp does, we can still create the file initially with less restrictive permissions, by usingmkdtemp orTemporaryDirectory to make a temporary directory, and then creating the file with a predictable name inside that directory.

My motivation in bringing up (2), (3), and (4) is that I worry that, for some callers, the delays could accumulate. Callers should probably not callstore in a loop without stopping on errors, so ideally the added delay shouldn't exceed one second. But if a program makes independent attempts to store many objects and then reports errors, then there could be a long delay ifPermissionError is being raised for other reasons such as the user not having enough access to the repository. In principle this can already happen on Windows because of the retry logic forremove, but as far as I know it has not been expected on other systems.

Comment on lines +232 to +236
# Ensure rename is actually done and file is stable
# Retry up to 14 times - exponential wait & retry in ms.
# The total maximum wait time is 1000ms, which should be vastly enough for the
# OS to return and commit the file to disk.
for exp_backoff_ms in [1, 4, 9, 16, 25, 36, 49, 64, 81, 100, 121, 144, 169, 181]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

These timings seem reasonable, but I believe this is quadratic backoff rather than exponential backoff. (In exponential backoff, delays are a fixed base, often 2, raised to the power of a number that increases by one with each attempt. Here, the number that increases by one with each attempt is the base, raised to a fixed power of 2.) I mention this only in case true exponential backoff is intended or required.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

That is very true! Since I copied these values fromgitoxide whereI implemented that based on what Git does. So most definitely it was me misnaming this, as Gitcorrectly calls it quadratic.

A PR would definitely be welcome that fixes the name ingitoxide.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I've openedGitoxideLabs/gitoxide#1815 to fix that ingitoxide. While I was at it, I also opened#119 to make the much more minor correction here.

Byron reacted with thumbs up emoji
@Byron
Copy link
Member

Byron commentedJan 5, 2025
edited
Loading

4. Even if we need to avoid path collisions before the file is moved into place as robustly asmkstemp does, we can still create the file initially with less restrictive permissions, by usingmkdtemp orTemporaryDirectory to make a temporary directory, then and creating the file with a predictable name inside that directory.

This wold indeed be very interesting to find out!@DEKHTIARJonathan would be the one uniquely positioned to do that if time permits.

EliahKagan added a commit to EliahKagan/gitdb that referenced this pull requestJan 26, 2025
The sequence of backoff wait times used in `gitdb.db.loose` isquadratic rather than exponential, as discussed in:gitpython-developers#115 (comment)This corrects the variable name by making it more general, and thecomment by having it explicitly describe the backoff as quadratic.This is conceptually related toGitoxideLabs/gitoxide#1815, butthis is a non-breaking change, as no interfaces are affected: onlya local variable and comment.
Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@ByronByronByron approved these changes

@EliahKaganEliahKaganEliahKagan left review comments

Assignees
No one assigned
Labels
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

3 participants
@DEKHTIARJonathan@Byron@EliahKagan

[8]ページ先頭

©2009-2025 Movatter.jp