The packed-refs file can contain ref names that are not valid UTF-8 (e.g., Latin-1 encoded tag names created by older Git versions or systems with different locale settings). Previously, GitPython would fail with UnicodeDecodeError when reading such files.

Reproduction

As described in #2064:

git clone https://github.com/ACRA/acra
cd acra
python -c 'import git; print(git.Repo(".").tags)'

Before fix:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc3 in position 6216: invalid continuation byte

After fix: Successfully reads all 101 tags.
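The failure can also be sketched without cloning a repository. The snippet below is only an illustration under stated assumptions: the packed-refs content, the SHA, and the helper names `read_strict` and `demo` are invented for this example and are not taken from GitPython or the ACRA repository.

```python
import os
import tempfile

# Hypothetical packed-refs content: the tag name ends in the single Latin-1
# byte 0xE9, which is not a valid UTF-8 sequence on its own.
PACKED_REFS = (
    b"# pack-refs with: peeled fully-peeled sorted \n"
    b"0123456789abcdef0123456789abcdef01234567 refs/tags/caf\xe9\n"
)


def read_strict(path):
    """Read the file with a strict UTF-8 text-mode open()."""
    with open(path, "rt", encoding="UTF-8") as fp:
        return fp.read()


def demo():
    """Return the name of the exception raised by a strict read, or None."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as fp:
            fp.write(PACKED_REFS)
        try:
            read_strict(path)
        except UnicodeDecodeError as exc:
            return type(exc).__name__
        return None
    finally:
        os.remove(path)


print(demo())  # prints "UnicodeDecodeError"
```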

Changes

Add errors='surrogateescape' to the open() call in _iter_packed_refs()
This allows reading files with arbitrary byte sequences while preserving valid UTF-8 as text
Add test that verifies non-UTF8 packed-refs can be read successfully
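A minimal sketch of the idea behind the change, assuming a simplified packed-refs parser. The function `iter_packed_refs_sketch`, the sample SHAs, and the tag names are hypothetical and are not GitPython's actual implementation; the only part taken from this PR is passing errors='surrogateescape' to open().

```python
import os
import tempfile


def iter_packed_refs_sketch(path):
    """Yield (sha, ref_name) pairs from a packed-refs style file.

    Hypothetical helper for illustration only. Undecodable bytes are kept
    as lone surrogates instead of raising UnicodeDecodeError.
    """
    with open(path, "rt", encoding="UTF-8", errors="surrogateescape") as fp:
        for line in fp:
            line = line.strip()
            # Skip blank lines, the header comment, and peeled-tag lines ("^<sha>").
            if not line or line.startswith("#") or line.startswith("^"):
                continue
            sha, name = line.split(" ", 1)
            yield sha, name


def demo():
    """Write a sample file containing one Latin-1 tag name and parse it."""
    data = (
        b"# pack-refs with: peeled fully-peeled sorted \n"
        b"0123456789abcdef0123456789abcdef01234567 refs/tags/v1.0\n"
        b"89abcdef0123456789abcdef0123456789abcdef refs/tags/caf\xe9\n"
    )
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as fp:
            fp.write(data)
        return list(iter_packed_refs_sketch(path))
    finally:
        os.remove(path)


refs = demo()
print(len(refs))  # prints 2
# Re-encoding with the same handler recovers the original bytes.
print(refs[1][1].encode("UTF-8", errors="surrogateescape"))
```

Both refs are returned, and the name containing the invalid byte can still be turned back into its original byte sequence when it needs to be passed to Git or the filesystem.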

Technical Details

The surrogateescape error handler is Python's standard approach for handling potentially non-UTF8 data in filesystem operations. It:

Passes through valid UTF-8 unchanged
Converts invalid byte sequences to Unicode surrogate characters (\uDC80-\uDCFF)
Preserves the original bytes in a reversible way (can be re-encoded back to original bytes)

This is the same approach used by Python's os.fsdecode() on POSIX systems and is recommended for filesystem operations where encoding may be unknown or mixed.
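The three properties listed above can be checked directly on a byte string. This is a small sketch with a made-up ref name; the function names `decode_ref`, `encode_ref`, and `strict_encode_fails` are assumptions introduced for the example.

```python
def decode_ref(raw: bytes) -> str:
    """Decode bytes as UTF-8, mapping invalid bytes to lone surrogates."""
    return raw.decode("utf-8", errors="surrogateescape")


def encode_ref(text: str) -> bytes:
    """Reverse decode_ref(): surrogates become the original bytes again."""
    return text.encode("utf-8", errors="surrogateescape")


def strict_encode_fails(text: str) -> bool:
    """Report whether a plain UTF-8 encode rejects the string."""
    try:
        text.encode("utf-8")
    except UnicodeEncodeError:
        return True
    return False


# Hypothetical ref name ending in the Latin-1 byte for "é" (invalid as UTF-8).
raw = b"refs/tags/caf\xe9"
text = decode_ref(raw)

# Valid UTF-8 passes through unchanged.
valid = "refs/tags/café".encode("utf-8")
assert decode_ref(valid) == "refs/tags/café"

# The invalid byte 0xE9 becomes the lone surrogate U+DCE9.
assert text == "refs/tags/caf\udce9"
assert "\udc80" <= text[-1] <= "\udcff"

# The mapping is reversible.
assert encode_ref(text) == raw

# A strict encode refuses the surrogate, so callers must use the same handler
# when converting such a name back to bytes.
assert strict_encode_fails(text)
```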

Fix UnicodeDecodeError when reading packed-refs with non-UTF8 characters

40af3b3

Fixes gitpython-developers#2064

The packed-refs file can contain ref names that are not valid UTF-8 (e.g., Latin-1 encoded tag names created by older Git versions or non-UTF8 systems). Previously, opening the file with encoding='UTF-8' would raise UnicodeDecodeError.

Changes:
- Add errors='surrogateescape' to the open() call in _iter_packed_refs()
- This allows reading files with arbitrary byte sequences while still treating valid UTF-8 as text
- Add test that verifies non-UTF8 packed-refs can be read successfully

The 'surrogateescape' error handler is the standard Python approach for handling potentially non-UTF8 data in filesystem operations, as it preserves the original bytes in a reversible way.

Byron requested a review from Copilot

December 7, 2025 15:52

Copilot started reviewing on behalf of Byron

December 7, 2025 15:53

Byron marked this pull request as draft

December 7, 2025 15:53

Copilot AI reviewed

Dec 7, 2025

Copilot AI left a comment

Pull request overview

This PR fixes a UnicodeDecodeError that occurred when GitPython attempted to read packed-refs files containing ref names encoded with non-UTF-8 character encodings (e.g., Latin-1 encoded tag names from older Git versions). The fix uses Python's surrogateescape error handler, which is the standard approach for handling filesystem operations with potentially mixed or unknown encodings.

Key changes:

Adds errors='surrogateescape' parameter to file reading in _iter_packed_refs() method
Adds comprehensive test that reproduces and verifies the fix for the Unicode decoding issue

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File	Description
git/refs/symbolic.py	Adds `errors='surrogateescape'` to the packed-refs file reader to handle non-UTF8 encoded ref names gracefully
test/test_refs.py	Adds test case that creates a packed-refs file with Latin-1 encoded ref name and verifies it can be read without errors

MirrorDNA-Reflection-Protocol added 2 commits

December 8, 2025 13:58

Fix ruff lint: remove whitespace from blank lines

963604b

Fix codespell: rename test ref to avoid 'caf' typo detection

5e5e1c1

Fix UnicodeDecodeError when reading packed-refs with non-UTF8 characters #2091

MirrorDNA-Reflection-Protocol commented Dec 7, 2025
