Uh oh!
There was an error while loading.Please reload this page.
- Notifications
You must be signed in to change notification settings - Fork34k
gh-140557: Force alignment of emptybytearray andarray.array buffers#140559
gh-140557: Force alignment of emptybytearray andarray.array buffers#140559encukou merged 11 commits intopython:mainfrom
bytearray andarray.array buffers#140559Conversation
This ensures the buffers used by the empty `bytearray` and `array.array`are aligned the same as a pointer returned by the allocator. This is amore convenient default for interop with other languages that havestricter requirements of type-safe buffers (e.g. Rust's `&[T]` type)even when empty.
python-cla-botbot commentedOct 24, 2025 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
serhiy-storchaka left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
LGTM. 👍
cmaloney commentedOct 26, 2025 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
I'm 👎 on this for see:gh-139871 for ways having a |
cmaloney commentedOct 26, 2025
re: |
jakelishman commentedOct 26, 2025 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
This doesn't enforce that every buffer always has aligned access - you can easily take out a view of a bytes or bytearray that you offset by a byte - so other languages have to still handle the case of an unaligned pointer. It just makes the default empty object aligned at zero cost, and that object is common. The empty |
jakelishman commentedOct 26, 2025
We do still have to handle unaligned access everywhere, I agree there's no escaping it. The goal here is only around making unaligned objects less common - |
cmaloney commentedOct 26, 2025
I'm still concerned here: Both
|
jakelishman commentedOct 26, 2025
I'm not aiming to make this a guarantee at all, just that by default the zero allocation case is already aligned. In Rust and other languages we do still have to handle unaligned buffers, just like Numpy alread has to in pure C.
Right, this is where other languages have stronger guarantees - I mentioned it ingh-140557 as the motivation that in Rust, creating a |
jakelishman commentedOct 26, 2025
About rarity of appearance: both of these pointers show through the buffer protocol like you mentioned, and that's where I come across them in language interop - that's the defined way to get zero-copy access to a data buffer owned by Python space. We can't have zero-copy access to misaligned buffers, but in practice, the vast majority of buffers thatare back by an allocation end up aligned anyway, so requiring copies is rare (since you have to deliberately offset an allocated pointer by a sub-unit amount). We can add special handling to produce an empty slice in Rust that doesn't refer to the same pointer we actually get from the Python buffer protocol, if it's misaligned as an extra optimisation to avoid a copy, so we don'trequire CPython support, but if the default is actually aligned, then there's less point propagating this hypothetical optimising special-handling code through a lot of downstream packages and just using the slower "copy to force alignment" paths that (should) already exist for it. When I wrote this patch, it was zero cost to CPython to achieve that for the defaults. If you have additional work that would seriously raise the cost to CPython then the calculus is different, though even reliably having the empty buffer aligned on an 8, like the empty |
cmaloney commentedOct 26, 2025
Can you link to the code or provide a rust sample which needs to special-case handling a zero-length bytearray? I think that would help me understand here. |
serhiy-storchaka commentedOct 27, 2025
It is not guaranteed that the start of the bytearray buffer has some alignment. This is a CPython implementation detail. But some code depends on this, and it may not work on non-x86 platforms if this is not aligned. There are exceptions: if there was a deletion from the beginning of the bytearray (we cannot do anything with this, but the user can guarantee that this did not happen), and when the bytearray is empty. The latter case is easy to fix for us, it costs nothing. |
jakelishman commentedOct 27, 2025
cmaloney: Let's say I've got FFI got that wraps the structPyBuffer{buf:*mut(),itemsize:isize,ndim:::std::ffi::c_int,shape:*constisize,strides:*constisize,}implPyBuffer{/// Is the slice contiguous in memory?fnis_contiguous(&self) ->bool{/* ... */}/// How many bytes can be read from it?fnlen_bytes(&self) ->usize{/* ... */}} Let's say I've got one of these structs that I then initialised with enumSliceError{Unaligned,Noncontiguous,Nullptr,}fnslice_from_buffer<T>(buf:&PyBuffer) ->Result<&[T],SliceError>{if buf.buf.is_null(){returnErr(SliceError::Nullptr);}if !buf.is_contiguous(){returnErr(SliceError::Noncontiguous);}if !buf.buf.is_aligned::<T>(){returnErr(SliceError::Unaligned);}// SAFETY: pointer is non-null, aligned, and valid for this many contiguous reads:Ok(unsafe{ std::slice::from_raw_parts( buf.buf, buf.len_bytes() / std::mem::size_of::<T>()})} I have to do the alignment and contiguous checks before I call A Rust caller of this function is still responsible for handling the case of an unaligned pointer, which might cause them to do something like matchslice_from_buffer::<u64>(&buf){Ok(slice) =>use_slice(slice),Err(SliceError::Noncontiguous |SliceError::Unaligned) =>{let aligned_contiguous =/* copy buffer to somewhere aligned */;use_slice(aligned_contiguous.as_slice())}Err(SliceError::Nullptr) =>panic!(),} The idea of this PR is just to make it so that slightly more stuff by default gets to go down the happy path, in a way that doesn't cost CPython anything. It's still the Rust user's responsibility to handle the unhappy path since that's totally valid Python code still, and the Rust library's responsibility to make sure everything is safe for FFI use in Rust2. This PR isn't intending to add any restrictions on what CPython is allowed to do or what other Python implementations may do. Your#139871 looks to me to also achieve the same goal I was going for here, just as a side effect (since the empty Footnotes
|
cmaloney commentedOct 27, 2025 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
I'm 👍 for doing
|
cmaloney left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Reduce scope to justarray.array + add a test; empty buffer + non-emptyarray.array storage being aligned I think is a nice improvement
serhiy-storchaka commentedOct 29, 2025
I do not think that bytearray should be excluded. If not a special optimization of using NULL for empty array, it would have standard alignment (as returned by |
cmaloney commentedOct 29, 2025
serhiy-storchaka commentedOct 29, 2025
Then I do not see reasons to object this change. |
cmaloney commentedOct 29, 2025
I have slight nit in wording of the NEWS entry; I think it would be good to mention Rust and to make it more concise. |
serhiy-storchaka commentedOct 29, 2025
I think there is a good chance that the code that relies on this already exist. It works only because the Intel plathform is tolerable to unaligned access (and there is no actually access to memory here, because it is an empty buffer). But on other platforms pointers of different type can use different registers. This is not limited to Rust. In C, casting a pointer with wrong alignment can be an undefined behavior. If bytesarray is used instead of array as a collection of 16-, 32- or 64-bit integers, there may be problems. |
719e0e2 todd9e50bCompare
serhiy-storchaka left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I think the cost of the tests exceeds their value. They are complex and fragile.
If you insist on adding tests, they can be implemented by adding a simple helper in_testcapi. The tests should be decorated as CPython-only. They should also test non-empty arrays of different types.
jakelishman commentedJan 6, 2026
Thanks for the ping, and happy holidays everybody. I reverted the old fragile tests and added (my best attempt at) the new |
jakelishman commentedJan 6, 2026
Ah, the test failure here is because the handling of the Shall I attempt to modify the base |
jakelishman commentedJan 6, 2026
I don't think it's practical/possible to force alignment of the empty I've changed the test case to explicit check for |
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Misc/NEWS.d/next/Core_and_Builtins/2025-10-24-17-30-51.gh-issue-140557.X2GETk.rst OutdatedShow resolvedHide resolved
Uh oh!
There was an error while loading.Please reload this page.
Misc/NEWS.d/next/Core_and_Builtins/2025-10-24-17-30-51.gh-issue-140557.X2GETk.rstShow resolvedHide resolved
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
bedevere-bot commentedJan 8, 2026
🤖 New build scheduled with the buildbot fleet by@cmaloney for commit92a5fff 🤖 Results will be shown at: https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F140559%2Fmerge If you want to schedule another build, you need to add the🔨 test-with-buildbots label again. |
Lib/test/test_buffer.py Outdated
| def test_array_alignment(self): | ||
| # gh-140557: pointer alignment of buffers including empty allocation | ||
| # should be at least to `size_t`. | ||
| align = struct.calcsize("N") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Chatted with a Rust export on the "Rust for CPython Discord". Came to the conclusion with them that we should alignarray.array to 8 bytes since it can store 8 byte things (typecodes: "qQd",https://docs.python.org/3/library/array.html#array.array). On 32 bit platformssize_t will only require 4 byte alignment.
Rather thanstruct.calcsize("N") I think we can just use the constant 8 here + have a comment pointing to the item size table in thearray.arraydoc?
joshtriplettJan 8, 2026 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Chiming in to confirm that we had this discussion. 👍 for aligning to 8 bytes on all platforms; it seems simpler and less surprising than trying to marginally optimize for thesubset of 4-byte platforms wherelong long anddouble are 4-byte aligned. Might also have benefits for other purposes, such as SIMD algorithms.
Perhttps://en.wikipedia.org/wiki/Data_structure_alignment:
when compiling for 32-bit x86:
[...]
A double (eight bytes) will be 8-byte aligned on Windows and 4-byte aligned on Linux (8-byte with -malign-double compile time option).
A long long (eight bytes) will be 8-byte aligned on Windows and 4-byte aligned on Linux (8-byte with -malign-double compile time option).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Oh yeah thanks - I was overly focused on how the buffer insidePyBytes ends up aligned and not thinking properly. I changed the test ofarray.array to take the max item size from all available formats for the platform, though I can change it to be hard-coded 8 if that's preferred.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Max ofarray.array's elements I like / means both new element adding or buffer allocation changes would result in the test needing an intentional update / helps ensure changes are intentional
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
I haven't fully paged in to this (awesome!) workstream yet, but I wanted to note that I'mincredibly inspired by the unstable rustcore::simd feature, and I maintain both thehttps://docs rs/re2 andhttps://docs.rs/vectrorscan-async (hyperscan) rust wrapper crates.
With regards to SIMD, I have two ideas:
- Reflecting the FFI constraints into SIMD-compatible memory for rust
core::simdcould be a great investigation for rust's external types proposal:Tracking issue for RFC 1861: Extern types rust-lang/rust#43467 (comment) - I have begun work that searches for SIMD instructions in the cpython configure script:https://discuss.python.org/t/add-zero-copy-conversion-of-bytearray-to-bytes-by-providing-bytes/79164/70
- I would be very interested to discuss this sort of thing further. In that post I mention two specific use cases (one of them being url quoting in pip) which would produce very noticeable real-world speedups in foundational python tooling like pip.
I'm also right now doing a cleanroom implementation of zstd in rust (I can explain my concerns with ruzstd) and making use of alignment for SIMD to improve over the zstd C implementation (currently working on a ring buffer component), which I very much intend to plug into a python native module.
This comment is in full support of this PR and this workstream. Please let me know if I can follow up and learn more about this! Thanks! ^_^
jakelishman commentedJan 13, 2026
I think I've addressed all the extant comments in this PR, by the way. Happy to make more changes if there's more needed to land it. |
cosmicexplorer commentedJan 15, 2026
very non-blocking nitpick@jakelishman re#140559 (comment):
I have been handling this precise situation in my rust code with Is that approach not applicable here? Am I missing some reason why we know I would love to address this in a followup change if not! Thanks! |
bedevere-bot commentedJan 15, 2026
🤖 New build scheduled with the buildbot fleet by@encukou for commitd3ca57f 🤖 Results will be shown at: https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F140559%2Fmerge If you want to schedule another build, you need to add the🔨 test-with-buildbots label again. |
vstinner commentedJan 16, 2026
A single buildbot failed: test_urllib2 and test_urllibnet failed on buildbot/AMD64 Android PR, unrelated error. |
vstinner left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
LGTM
19de10d intopython:mainUh oh!
There was an error while loading.Please reload this page.
cmaloney commentedJan 26, 2026
Merged, thanks for the PR@jakelishman ! |
Uh oh!
There was an error while loading.Please reload this page.
This ensures the buffers used by the empty
bytearrayandarray.arrayare aligned the same as a pointer returned by the allocator. This is a more convenient default for interop with other languages that have stricter requirements of type-safe buffers (e.g. Rust's&[T]type) even when empty.I tried to do the same for
bytes, but I think its default buffer is only forcibly aligned on an 8 because of theuint64_tmember inPyBytesObject, and it ends up dependent on wherebytes_emptygets laid out. If that's desirable too, I might need some help figuring out a strategy for it.I'm not sure where's appropriate to put a test for this, or if it can/should be documented as reliable.
Issue:#140557
bytearrayandarray.array#140557