Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Fix mkdir race condition in LooseObjectDB.store#91

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
Byron merged 1 commit intogitpython-developers:masterfromEliahKagan:toctou
Sep 11, 2023

Conversation

EliahKagan
Copy link
Member

Fixes#85

This replaces the conditional call toos.mkdir that raises an unintendedFileExistsError if the directory is created between the check and theos.mkdir call, using a singleos.makedirs call instead, withexist_ok=True.

This way, we attempt creation in a way that produces no error if the directory is already present, while still raisingFileExistsError if a non-directory filesystem entry (such as a regular file) is present where we want the directory to be. This is the advantage of this approach over the approach of swallowingFileExistError as suggested in#85.

Note, however, thatos.makedirs behaves likemkdir -p: it attempts to create parent directories (and their parents, etc.) if they do not already exist. So it should only be used if that is acceptable in this case. I am not aware of a reason it wouldn't be, but I am not very familiar with gitdb.

So that aspect of the situation deserves special consideration in reviewing this PR. I'd be pleased to change the approach ifos.makdirs is judged not suitable here. I think the approach suggested in#85 is reasonable, and it can be made more robust by checking that the directory exists after the creation attempt (or in other ways).

The code was under test: that line is exercised inTestExamples.test_base,TestGitDB.test_writing,TestLooseDB.test_basics, andTestObjDBPerformance.test_large_data_streaming. However, no test catches the race condition this fixes, andI have not added one.

Testing that the race condition does not occur in the specific way as before by accessing and calling the same functions as before in the same order would be easy, but it would be more of an illusion of a regression test than a useful test. Testing by trying to brute-force a race condition, without modifying the operation of the code for the test, would work but the tests would take a very long time to run. Testing it in a way that is fairly robust against new ways of reintroducing the race condition and that is not too slow should bepossible, but I don't know of a good way to do it; everything I've thought of would be complicated, and possibly make running the test in a debugger likepdb infeasible. So I have not added a regression test for this bug. However, if it is considered important to have one, then I can consider the matter further.

This replaces the conditional call to os.mkdir that raises anunintended FileExistsError if the directory is created between thecheck and the os.mkdir call, using a single os.makedirs callinstead, with exist_ok=True.This way, we attempt creation in a way that produces no error ifthe directory is already present, while still raisingFileExistsError if a non-directory filesystem entry (such as aregular file) is present where we want the directory to be.
@EliahKaganEliahKagan marked this pull request as ready for reviewSeptember 11, 2023 08:11
Copy link
Member

@ByronByron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Thanks a lot!

Please do that I would prefer if GitDB could be removed from GitPython entirely, which is even worth a major version change to me. I dare to say that not too many people use it, and if they do, they should rather use an implementation that is more trustworthy (like the nativegit one). In hindsight, I do consider it a mistake to have tried to write an ODB in python.

With that said, it might be that parts of GitPython rely on GitDB, maybe for writing, and I'd hope these could be ported over to GitPython ease maintenance - I firmly believe that most of the code in GitDB isn't actually used by GitPython if the defaultgit ODB backend is used.

EliahKagan reacted with thumbs up emoji
@ByronByron merged commit295a55b intogitpython-developers:masterSep 11, 2023
@EliahKagan
Copy link
MemberAuthor

Thanks--if I contemplate significant changes to, or related to, gitdb, I'll being by looking into whether I can remove GitPython's dependence on it or vendor the specific parts that GitPython needs.

In this case, I just noticed that I was aware of a way to fix the race condition that might be acceptable. so I figured I'd open a PR. Although there are some other changes I may want to propose to the CI workflow in this repository, I'm not sure I really know enough to fix the most important gitdb-related issues, in GitPython or in gitdb itself. As my familiarity with the GitPython codebase increases, that might change.

Byron reacted with heart emoji

@EliahKaganEliahKagan deleted the toctou branchSeptember 11, 2023 08:55
@Byron
Copy link
Member

Although there are some other changes I may want to propose to the CI workflow in this repository, I'm not sure I really know enough to fix the most important gitdb-related issues, in GitPython or in gitdb itself. As my familiarity with the GitPython codebase increases, that might change.

I would absolutelylove if I could direct your incredible skill and energy away from the GitDB CI and towards making it unnecessary entirely :). It's probably more of a refactoring task, albeit a complex one.

As my familiarity with the GitPython codebase increases, that might change.

It's a great gift to have you contribute to this project, and I love the idea to have more of that. The project, to my mind, has significant issues with quality and some limitations stem from these. The greatest problems come from incorrect handling of encodings, both for paths and for data, along with many naive implementations of somegit data-structures (seeIndex).
With overhalf a billion downloads per year GitPython is more influential than I feel comfortable with given its current state, and your work will be majorly impactful.
I also hope your contribution experience is rewarding and pleasant so you can continue this awesome work.

@EliahKagan
Copy link
MemberAuthor

Thanks! Based on this, the only further gitdb CI pull request I will open now is one to re-add Python 3.7 support, as you indicated should be done in a few of the comments ingitpython-developers/GitPython#1654. (That will include CI, but also changing the lower bound back to 3.7 insetup.py. It's very simple, because it's just a revert of one commit.)

Byron reacted with thumbs up emoji

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment
Reviewers

@ByronByronByron approved these changes

Assignees
No one assigned
Labels
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

Race condition writing objects in LooseObjectDB
2 participants
@EliahKagan@Byron

[8]ページ先頭

©2009-2025 Movatter.jp