gh-106939, gh-145261: Fix ShareableList data corruption #145488
Open
jakelodwick wants to merge 1 commit into python:main from
Store actual byte lengths in format metadata instead of allocated slot sizes, so retrieval extracts exact data without relying on null-termination. Use byte count instead of character count for str slot allocation to prevent multi-byte UTF-8 overflow.
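The difference between character count and byte count can be sketched with plain `struct`, independent of `ShareableList` itself. The `slot_size` helper and the 8-byte `ALIGNMENT` below are assumptions made for illustration, not code taken from this PR.

```python
import struct

ALIGNMENT = 8  # assumed slot granularity for this sketch


def slot_size(n: int) -> int:
    """Return the smallest multiple of ALIGNMENT strictly greater than n."""
    return ALIGNMENT * (n // ALIGNMENT + 1)


text = "中" * 8                      # 8 characters, 3 bytes each in UTF-8
encoded = text.encode("utf-8")       # 24 bytes

by_chars = slot_size(len(text))      # sized from the character count
by_bytes = slot_size(len(encoded))   # sized from the byte count

# A slot sized from the character count cannot hold the encoded data:
# struct truncates input that is longer than the "Ns" field.
truncated = struct.pack(f"{by_chars}s", encoded)
fits = struct.pack(f"{by_bytes}s", encoded)

print(by_chars, by_bytes, len(encoded))
```

With the character-based size the encoded string loses its tail, while the byte-based size leaves room for all 24 bytes plus padding.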
The following commit authors need to sign the Contributor License Agreement: |
`ShareableList` has two data corruption bugs, both rooted in the same design flaw: C-style null-terminated storage semantics applied to length-delimited Python types.

Bug 1, UTF-8 underallocation (#145261, #88336): Slot allocation uses `len(item)` (character count) instead of byte count for `str` items. Multi-byte UTF-8 strings overflow their allocated slot and corrupt adjacent data.

Bug 2, null stripping (#106939, #96779): The back-transform lambdas call `rstrip(b'\x00')` to remove struct padding, but this also strips legitimate trailing null bytes from user data.

Approach
Store actual byte lengths in the format metadata (separate from the allocated slot sizes used for `struct.pack_into`), and use those exact lengths during retrieval instead of relying on null-termination. This makes both bugs go away with a single conceptual change.

Specifically:
- `__init__`: compute slot allocation using `len(item.encode('utf-8'))` for `str` items
- `__init__`: build `_stored_formats` list with actual byte lengths, write those to packing metadata
- `__setitem__`: separate `pack_format` (allocated slot size) from `new_format` (actual byte length in metadata)
- `_back_transforms_mapping`: remove `rstrip(b'\x00')`

Cross-version compatibility
- [...] `8s` for a 5-byte string in an 8-byte slot). New code reads that `8s` format and returns 8 bytes including padding nulls. This is the same behavior as the current release, so there is no regression.
- [...] `5s`). Old code reads `5s`, gets 5 bytes, then applies `rstrip(b'\x00')`. This is harmless unless the data actually ends in nulls, which is the existing bug.

Tests
- `test_shared_memory_ShareableList_trailing_nulls`: bytes with trailing nulls, str with trailing nulls, all-null bytes, empty bytes, no-null bytes, cross-process read via `name=`
- `test_shared_memory_ShareableList_multibyte_utf8`: 1-byte (ASCII), 2-byte (é), 3-byte (中), and 4-byte (𐀀) UTF-8 sequences with cross-process verification
- [...] `sl.format` assertions to reflect actual byte lengths

Prior work
This PR consolidates the approaches from #144559 (@aisk, null-stripping fix) and #145266 (@zetzschest, both fixes). Both PRs are open with zero reviews. The fixes belong together because they share the same root cause and the same solution mechanism (stored byte lengths in format metadata).
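The shared mechanism can be illustrated with a minimal sketch that uses only `struct` on a plain `bytearray`. It is not the `ShareableList` implementation; the 8-byte `slot` size and the variable names are assumptions chosen for the example.

```python
import struct

payload = b"abc\x00\x00"   # user data that legitimately ends in null bytes
slot = 8                   # assumed allocated slot size for this sketch
buf = bytearray(slot)

# Packing with the slot-size format pads the payload to 8 bytes with nulls.
struct.pack_into(f"{slot}s", buf, 0, payload)

# Null-termination style: padding nulls and data nulls are indistinguishable,
# so stripping removes part of the payload.
stripped = struct.unpack_from(f"{slot}s", buf, 0)[0].rstrip(b"\x00")

# Length-delimited style: keep the real byte length as metadata and read
# back exactly that many bytes.
stored_format = f"{len(payload)}s"   # "5s"
exact = struct.unpack_from(stored_format, buf, 0)[0]

print(stripped, exact)
```

Reading with the stored length returns the payload unchanged, while the strip-based read drops the two trailing nulls that belonged to the data.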
📚 Documentation preview 📚: https://cpython-previews--145488.org.readthedocs.build/