Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork938
Add more checks for the validity of refnames#1672
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Uh oh!
There was an error while loading.Please reload this page.
Changes fromall commits
File filter
Filter by extension
Conversations
Uh oh!
There was an error while loading.Please reload this page.
Jump to
Uh oh!
There was an error while loading.Please reload this page.
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
@@ -161,15 +161,61 @@ def dereference_recursive(cls, repo: "Repo", ref_path: Union[PathLike, None]) -> | ||||||||||||||||||||||||||||||
return hexsha | ||||||||||||||||||||||||||||||
# END recursive dereferencing | ||||||||||||||||||||||||||||||
@staticmethod | ||||||||||||||||||||||||||||||
def _check_ref_name_valid(ref_path: PathLike) -> None: | ||||||||||||||||||||||||||||||
# Based on the rules described in https://git-scm.com/docs/git-check-ref-format/#_description | ||||||||||||||||||||||||||||||
previous: Union[str, None] = None | ||||||||||||||||||||||||||||||
one_before_previous: Union[str, None] = None | ||||||||||||||||||||||||||||||
for c in str(ref_path): | ||||||||||||||||||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others.Learn more. Is there a way to avoid converting to Member
|
if".."instr(ref_path): | |
raiseValueError(f"Invalid reference '{ref_path}'") |
But actually even that neither introduced nor exacerbated the problem. From the commit prior to#1644 being merged:
GitPython/git/refs/symbolic.py
Lines 164 to 174 in830025b
@classmethod | |
def_get_ref_info_helper( | |
cls,repo:"Repo",ref_path:Union[PathLike,None] | |
)->Union[Tuple[str,None],Tuple[None,str]]: | |
"""Return: (str(sha), str(target_ref_path)) if available, the sha the file at | |
rela_path points to, or None. target_ref_path is the reference we | |
point to, or None""" | |
tokens:Union[None,List[str],Tuple[str,str]]=None | |
repodir=_git_dir(repo,ref_path) | |
try: | |
withopen(os.path.join(repodir,str(ref_path)),"rt",encoding="UTF-8")asfp: |
Note howstr(ref_path)
was passed toos.path.join
, which when givenstr
s returns astr
, thus astr
was being passed toopen
. Note also that, while thisstr
call was actually redundant (os.path.join
acceptspath-like objects since Python 3.6), evenit was not the cause ofstr
and notbytes
being used. The annotation onref_path
isUnion[PathLike, None]
, wherePathLike
is:
Line 43 in830025b
PathLike=Union[str,"os.PathLike[str]"] |
Where both alternatives--str
andos.PathLike[str]
--represent text that has already been decoded.
So unless I'm missing something--which I admit I could be--I don't think it makes conceptual sense to do anything about that in this pull request. Furthermore, unless the judgment thatCVE-2023-41040 was a security vulnerability was mistaken, or something about the variation explicated in#1644 (comment) is less exploitable, it seems to me that this pull request is fixing a vulnerability. Assuming that is the case, then I think this should avoid attempting to make far-reaching changes beyond those that pertain to the vulnerability, and that although reviewing these changes for correctnessshouldnot be rushed, other kinds of delays should be mostly avoided. With good regression tests included, as seems to be the case, the code could be improved on later in other ways.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Thanks a lot for the thorough assessment, I wholeheartedly agree.
The 'how to handle paths correctly' issue is definitely one of the big breaking points in GitPython, but maybe, for other reasons, this wasn't ever a problem here.
Knowing this is on your radar, maybe one day there will be a solution to it.gitoxide
already solves this problem, but it's easier when you have an actual type system and a standard library that makes you aware every step of the way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Is the ultimate goal to support bothstr
-based andbytes
-based ref names and paths?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
The goal is correctness, and it's vital that one doesn't try to decode paths to fit some interpreter-controlled encoding. Paths are paths, and if you are lucky, they can be turned into bytes. On Unix, that's always possible and a no-op, but on windows it may require a conversion. It's just the question how these things are supposed to work in python.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Does this relate (conceptually, I mean) to the issue inrust-lang/rust#12056?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
A great find :) - yes, that's absolutely related.gitoxide
internally handles git-paths as bundles of bytes without known encoding, and just likegit
, it assumes at least ASCII. Conversions do happen but they all go throughgix-path
to have a central place for it.
Doing something like it would be needed here as well, even though I argue that before that happens universally, there should be some clear definition of what GitPython is supposed to be.
When I took it over by contributing massively, just like you do now, I needed more control for the use-case I had in mind, and started implementing all these sloppy pure-python components that don't even get the basics right. With that I turned GitPython into some strange hybrid which I think didn't do it any good besides maybe being a little faster forsome usecases. After all, manipulating an index in memory has advantages, but there are also other ways to do it while relying ongit
entirely.
Maybe this is thinking a step too far, but I strongly believe that the true benefit of GitPython is to be able to callgit
in a simple manner and to be compliant naturally due to usinggit
directly. Thisshould be its identity.
But then again, it's worth recognizing that changing the various pure-python implementations to usgit
under the hood probably isn't possible in a non-breaking way.
Another avenue would be to try and get the APIs to use types that don't suffer from encoding/decoding issues related to Paths, and then one day make the jump to replacing the pure-python implementations with the python bindings ofgitoxide
.