Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Add more checks for the validity of refnames#1672

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
Merged
Show file tree
Hide file tree
Changes fromall commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
50 changes: 48 additions & 2 deletionsgit/refs/symbolic.py
View file
Open in desktop
Original file line numberDiff line numberDiff line change
Expand Up@@ -161,15 +161,61 @@ def dereference_recursive(cls, repo: "Repo", ref_path: Union[PathLike, None]) ->
return hexsha
# END recursive dereferencing

@staticmethod
def _check_ref_name_valid(ref_path: PathLike) -> None:
# Based on the rules described in https://git-scm.com/docs/git-check-ref-format/#_description
previous: Union[str, None] = None
one_before_previous: Union[str, None] = None
for c in str(ref_path):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Is there a way to avoid converting tostr? I assume this tries to decoderef_path with the current string encoding, which changes depending on the interpreter or user configuration and generally causes a lot of trouble.

Copy link
Member

@EliahKaganEliahKaganSep 22, 2023
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Unless this PR worsens that problem in some way, which I believe it does not, I would recommend it be fixed separately and later. The code this is replacing already had:

if".."instr(ref_path):
raiseValueError(f"Invalid reference '{ref_path}'")

But actually even that neither introduced nor exacerbated the problem. From the commit prior to#1644 being merged:

@classmethod
def_get_ref_info_helper(
cls,repo:"Repo",ref_path:Union[PathLike,None]
)->Union[Tuple[str,None],Tuple[None,str]]:
"""Return: (str(sha), str(target_ref_path)) if available, the sha the file at
rela_path points to, or None. target_ref_path is the reference we
point to, or None"""
tokens:Union[None,List[str],Tuple[str,str]]=None
repodir=_git_dir(repo,ref_path)
try:
withopen(os.path.join(repodir,str(ref_path)),"rt",encoding="UTF-8")asfp:

Note howstr(ref_path) was passed toos.path.join, which when givenstrs returns astr, thus astr was being passed toopen. Note also that, while thisstr call was actually redundant (os.path.join acceptspath-like objects since Python 3.6), evenit was not the cause ofstr and notbytes being used. The annotation onref_path isUnion[PathLike, None], wherePathLike is:

PathLike=Union[str,"os.PathLike[str]"]

Where both alternatives--str andos.PathLike[str]--represent text that has already been decoded.

So unless I'm missing something--which I admit I could be--I don't think it makes conceptual sense to do anything about that in this pull request. Furthermore, unless the judgment thatCVE-2023-41040 was a security vulnerability was mistaken, or something about the variation explicated in#1644 (comment) is less exploitable, it seems to me that this pull request is fixing a vulnerability. Assuming that is the case, then I think this should avoid attempting to make far-reaching changes beyond those that pertain to the vulnerability, and that although reviewing these changes for correctnessshouldnot be rushed, other kinds of delays should be mostly avoided. With good regression tests included, as seems to be the case, the code could be improved on later in other ways.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Thanks a lot for the thorough assessment, I wholeheartedly agree.

The 'how to handle paths correctly' issue is definitely one of the big breaking points in GitPython, but maybe, for other reasons, this wasn't ever a problem here.

Knowing this is on your radar, maybe one day there will be a solution to it.gitoxide already solves this problem, but it's easier when you have an actual type system and a standard library that makes you aware every step of the way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Is the ultimate goal to support bothstr-based andbytes-based ref names and paths?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

The goal is correctness, and it's vital that one doesn't try to decode paths to fit some interpreter-controlled encoding. Paths are paths, and if you are lucky, they can be turned into bytes. On Unix, that's always possible and a no-op, but on windows it may require a conversion. It's just the question how these things are supposed to work in python.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Does this relate (conceptually, I mean) to the issue inrust-lang/rust#12056?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

A great find :) - yes, that's absolutely related.gitoxide internally handles git-paths as bundles of bytes without known encoding, and just likegit, it assumes at least ASCII. Conversions do happen but they all go throughgix-path to have a central place for it.

Doing something like it would be needed here as well, even though I argue that before that happens universally, there should be some clear definition of what GitPython is supposed to be.

When I took it over by contributing massively, just like you do now, I needed more control for the use-case I had in mind, and started implementing all these sloppy pure-python components that don't even get the basics right. With that I turned GitPython into some strange hybrid which I think didn't do it any good besides maybe being a little faster forsome usecases. After all, manipulating an index in memory has advantages, but there are also other ways to do it while relying ongit entirely.

Maybe this is thinking a step too far, but I strongly believe that the true benefit of GitPython is to be able to callgit in a simple manner and to be compliant naturally due to usinggit directly. Thisshould be its identity.

But then again, it's worth recognizing that changing the various pure-python implementations to usgit under the hood probably isn't possible in a non-breaking way.

Another avenue would be to try and get the APIs to use types that don't suffer from encoding/decoding issues related to Paths, and then one day make the jump to replacing the pure-python implementations with the python bindings ofgitoxide.

if c in " ~^:?*[\\":
raise ValueError(
f"Invalid reference '{ref_path}': references cannot contain spaces, tildes (~), carets (^),"
f" colons (:), question marks (?), asterisks (*), open brackets ([) or backslashes (\\)"
)
elif c == ".":
if previous is None or previous == "/":
raise ValueError(
f"Invalid reference '{ref_path}': references cannot start with a period (.) or contain '/.'"
)
elif previous == ".":
raise ValueError(f"Invalid reference '{ref_path}': references cannot contain '..'")
elif c == "/":
if previous == "/":
raise ValueError(f"Invalid reference '{ref_path}': references cannot contain '//'")
elif previous is None:
raise ValueError(
f"Invalid reference '{ref_path}': references cannot start with forward slashes '/'"
)
elif c == "{" and previous == "@":
raise ValueError(f"Invalid reference '{ref_path}': references cannot contain '@{{'")
elif ord(c) < 32 or ord(c) == 127:
raise ValueError(f"Invalid reference '{ref_path}': references cannot contain ASCII control characters")

one_before_previous = previous
previous = c

if previous == ".":
raise ValueError(f"Invalid reference '{ref_path}': references cannot end with a period (.)")
elif previous == "/":
raise ValueError(f"Invalid reference '{ref_path}': references cannot end with a forward slash (/)")
elif previous == "@" and one_before_previous is None:
raise ValueError(f"Invalid reference '{ref_path}': references cannot be '@'")
elif any([component.endswith(".lock") for component in str(ref_path).split("/")]):
raise ValueError(
f"Invalid reference '{ref_path}': references cannot have slash-separated components that end with"
f" '.lock'"
)

@classmethod
def _get_ref_info_helper(
cls, repo: "Repo", ref_path: Union[PathLike, None]
) -> Union[Tuple[str, None], Tuple[None, str]]:
"""Return: (str(sha), str(target_ref_path)) if available, the sha the file at
rela_path points to, or None. target_ref_path is the reference we
point to, or None"""
if ".." in str(ref_path):
raise ValueError(f"Invalid reference '{ref_path}'")
if ref_path:
cls._check_ref_name_valid(ref_path)

tokens: Union[None, List[str], Tuple[str, str]] = None
repodir = _git_dir(repo, ref_path)
try:
Expand Down
36 changes: 36 additions & 0 deletionstest/test_refs.py
View file
Open in desktop
Original file line numberDiff line numberDiff line change
Expand Up@@ -631,3 +631,39 @@ def test_refs_outside_repo(self):
ref_file.flush()
ref_file_name = Path(ref_file.name).name
self.assertRaises(BadName, self.rorepo.commit, f"../../{ref_file_name}")

def test_validity_ref_names(self):
check_ref = SymbolicReference._check_ref_name_valid
# Based on the rules specified in https://git-scm.com/docs/git-check-ref-format/#_description
# Rule 1
self.assertRaises(ValueError, check_ref, ".ref/begins/with/dot")
self.assertRaises(ValueError, check_ref, "ref/component/.begins/with/dot")
self.assertRaises(ValueError, check_ref, "ref/ends/with/a.lock")
self.assertRaises(ValueError, check_ref, "ref/component/ends.lock/with/period_lock")
# Rule 2
check_ref("valid_one_level_refname")
# Rule 3
self.assertRaises(ValueError, check_ref, "ref/contains/../double/period")
# Rule 4
for c in " ~^:":
self.assertRaises(ValueError, check_ref, f"ref/contains/invalid{c}/character")
for code in range(0, 32):
self.assertRaises(ValueError, check_ref, f"ref/contains/invalid{chr(code)}/ASCII/control_character")
self.assertRaises(ValueError, check_ref, f"ref/contains/invalid{chr(127)}/ASCII/control_character")
# Rule 5
for c in "*?[":
self.assertRaises(ValueError, check_ref, f"ref/contains/invalid{c}/character")
# Rule 6
self.assertRaises(ValueError, check_ref, "/ref/begins/with/slash")
self.assertRaises(ValueError, check_ref, "ref/ends/with/slash/")
self.assertRaises(ValueError, check_ref, "ref/contains//double/slash/")
# Rule 7
self.assertRaises(ValueError, check_ref, "ref/ends/with/dot.")
# Rule 8
self.assertRaises(ValueError, check_ref, "ref/contains@{/at_brace")
# Rule 9
self.assertRaises(ValueError, check_ref, "@")
# Rule 10
self.assertRaises(ValueError, check_ref, "ref/contain\\s/backslash")
# Valid reference name should not raise
check_ref("valid/ref/name")

[8]ページ先頭

©2009-2025 Movatter.jp