Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

gh-140557: Force alignment of emptybytearray andarray.array buffers#140559

Merged
encukou merged 11 commits intopython:mainfrom
jakelishman:max-align-buffers
Jan 26, 2026
Merged

gh-140557: Force alignment of emptybytearray andarray.array buffers#140559
encukou merged 11 commits intopython:mainfrom
jakelishman:max-align-buffers

Conversation

@jakelishman
Copy link
Contributor

@jakelishmanjakelishman commentedOct 24, 2025
edited by bedevere-appbot
Loading

This ensures the buffers used by the emptybytearray andarray.array are aligned the same as a pointer returned by the allocator. This is a more convenient default for interop with other languages that have stricter requirements of type-safe buffers (e.g. Rust's&[T] type) even when empty.

I tried to do the same forbytes, but I think its default buffer is only forcibly aligned on an 8 because of theuint64_t member inPyBytesObject, and it ends up dependent on wherebytes_empty gets laid out. If that's desirable too, I might need some help figuring out a strategy for it.

I'm not sure where's appropriate to put a test for this, or if it can/should be documented as reliable.

Issue:#140557

This ensures the buffers used by the empty `bytearray` and `array.array`are aligned the same as a pointer returned by the allocator.  This is amore convenient default for interop with other languages that havestricter requirements of type-safe buffers (e.g. Rust's `&[T]` type)even when empty.
@python-cla-bot
Copy link

python-cla-botbot commentedOct 24, 2025
edited
Loading

All commit authors signed the Contributor License Agreement.

CLA signed

Copy link
Member

@serhiy-storchakaserhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

LGTM. 👍

@cmaloney
Copy link
Contributor

cmaloney commentedOct 26, 2025
edited
Loading

I'm 👎 on this forbytearray; I've been looking at making the emptybytearray point to an emptybytes object where this wouldn't hold true.

see:gh-139871 for ways having abytes inside can make things faster (less copies of data often)

@cmaloney
Copy link
Contributor

re:bytearray, it also supports a "fast" start-delete which moves the start offset (ob_start) inside the allocated space by an arbitrary count of bytes which, to me, implies that unaligned data access from other languages needs to be supported in accessing/manipulating its internal buffers/data.

@jakelishman
Copy link
ContributorAuthor

jakelishman commentedOct 26, 2025
edited
Loading

This doesn't enforce that every buffer always has aligned access - you can easily take out a view of a bytes or bytearray that you offset by a byte - so other languages have to still handle the case of an unaligned pointer. It just makes the default empty object aligned at zero cost, and that object is common.

The emptybytes internal pointer still ends up aligned on an 8, which would in practice still make the pointer aligned formost data types you might be casting to, so swapping to that would still be an improvement over the status quo.

@jakelishman
Copy link
ContributorAuthor

We do still have to handle unaligned access everywhere, I agree there's no escaping it. The goal here is only around making unaligned objects less common -bytearray being aligned on a 1 (and actually turning up on a 1) is a lot more commonly visible now in Python 3.14 that pickle 5 uses it in the data stream to representPickleBuffer in band, so more libraries (e.g. Numpy) want to zero-copt view onto the recreated buffer.

@cmaloney
Copy link
Contributor

I'm still concerned here:

Botharray andbytearray externalize their item storage. ThePyObject* that is thearray object (orbytearray) is a separate allocation from the storage buffer for the elements. As an optimization, CPython doesn't allocate space if the length is 0, and these two values are used to handle a couple otherwise hard to handle return cases. Thearray one is only used inarray_buffer_getbuf so thatmemoryview() / buffer protocol works on anarray with no storage allocated itself. Thebytearray one shows up in the C APIPyByteArray_AS_STRING only whenbytearray has no internal storage.

  1. These are just placeholders to fill corner cases which seem like they shouldn't be common in code (to use the returned pointer without doing out of bounds access you'd need to look at thelen() / size).
  2. This adds a new CPython "guarantee" that it sounds like want to depend on in code but no test that would break if changes effect it. It should be possible to write one for alignment withhttps://docs.python.org/3/library/stdtypes.html#memoryview
  3. The default "buffer" forbytearray (andarray.array) are both immutable and shouldn't really be indexed into, read from, or written to; they're 0 bytes long, code can't set / modify data in them, and there is no data to read from them.

@jakelishman
Copy link
ContributorAuthor

I'm not aiming to make this a guarantee at all, just that by default the zero allocation case is already aligned. In Rust and other languages we do still have to handle unaligned buffers, just like Numpy alread has to in pure C.

The default "buffer" for bytearray (and array.array) are both immutable and shouldn't really be indexed into, read from, or written to; they're 0 bytes long, code can't set / modify data in them, and there is no data to read from them.

Right, this is where other languages have stronger guarantees - I mentioned it ingh-140557 as the motivation that in Rust, creating a&[T] primitive slice requires an aligned non-null pointer even if it's invalid for any reads. We can't convert every buffer to a slice with zero copies, so we already have alternative handling, but this patch makes the emptybytearray object go down a common path rather than a colder path by default. For similar fast-path alignment reasons, Numpy internally sets its "aligned" flag on empty buffers even if the pointer isn't actually aligned for the data type it purportedly points to, to avoid triggering copying code / extra handling in ufuncs that require alignment.

encukou reacted with thumbs up emoji

@jakelishman
Copy link
ContributorAuthor

About rarity of appearance: both of these pointers show through the buffer protocol like you mentioned, and that's where I come across them in language interop - that's the defined way to get zero-copy access to a data buffer owned by Python space.

We can't have zero-copy access to misaligned buffers, but in practice, the vast majority of buffers thatare back by an allocation end up aligned anyway, so requiring copies is rare (since you have to deliberately offset an allocated pointer by a sub-unit amount). We can add special handling to produce an empty slice in Rust that doesn't refer to the same pointer we actually get from the Python buffer protocol, if it's misaligned as an extra optimisation to avoid a copy, so we don'trequire CPython support, but if the default is actually aligned, then there's less point propagating this hypothetical optimising special-handling code through a lot of downstream packages and just using the slower "copy to force alignment" paths that (should) already exist for it.

When I wrote this patch, it was zero cost to CPython to achieve that for the defaults. If you have additional work that would seriously raise the cost to CPython then the calculus is different, though even reliably having the empty buffer aligned on an 8, like the emptybytes buffer is in practice, would be enough for the majority of cases I care about.

@cmaloney
Copy link
Contributor

Can you link to the code or provide a rust sample which needs to special-case handling a zero-length bytearray? I think that would help me understand here.

@serhiy-storchaka
Copy link
Member

It is not guaranteed that the start of the bytearray buffer has some alignment. This is a CPython implementation detail. But some code depends on this, and it may not work on non-x86 platforms if this is not aligned. There are exceptions: if there was a deletion from the beginning of the bytearray (we cannot do anything with this, but the user can guarantee that this did not happen), and when the bytearray is empty. The latter case is easy to fix for us, it costs nothing.

@jakelishman
Copy link
ContributorAuthor

cmaloney: Let's say I've got FFI got that wraps thePy_buffer interface, first by making (not precise - I'm just including the illustrative stuff):

structPyBuffer{buf:*mut(),itemsize:isize,ndim:::std::ffi::c_int,shape:*constisize,strides:*constisize,}implPyBuffer{/// Is the slice contiguous in memory?fnis_contiguous(&self) ->bool{/* ... */}/// How many bytes can be read from it?fnlen_bytes(&self) ->usize{/* ... */}}

Let's say I've got one of these structs that I then initialised withPyObject_GetBuffer, and now I want to expose a Rust-native slice view for a specific Rust type, which might not match the "native" type of the buffer, since I might have been given an array created from abytearray object whoseitemsize is 1 but actually represents storage ofuint64_t (u64 in Rust). I return anResult here to signify to the caller that the function is fallible1:

enumSliceError{Unaligned,Noncontiguous,Nullptr,}fnslice_from_buffer<T>(buf:&PyBuffer) ->Result<&[T],SliceError>{if buf.buf.is_null(){returnErr(SliceError::Nullptr);}if !buf.is_contiguous(){returnErr(SliceError::Noncontiguous);}if !buf.buf.is_aligned::<T>(){returnErr(SliceError::Unaligned);}// SAFETY: pointer is non-null, aligned, and valid for this many contiguous reads:Ok(unsafe{ std::slice::from_raw_parts(    buf.buf,    buf.len_bytes() / std::mem::size_of::<T>()})}

I have to do the alignment and contiguous checks before I callstd::slice::from_raw_parts, because it's undefined behaviour in Rust to create a slice backed by an unaligned or null pointer, even if the length is zero. (The reason is to enable specific type-niche optimisations in the compiler to save space in compound types that contain&[T].)

A Rust caller of this function is still responsible for handling the case of an unaligned pointer, which might cause them to do something like

matchslice_from_buffer::<u64>(&buf){Ok(slice) =>use_slice(slice),Err(SliceError::Noncontiguous |SliceError::Unaligned) =>{let aligned_contiguous =/* copy buffer to somewhere aligned */;use_slice(aligned_contiguous.as_slice())}Err(SliceError::Nullptr) =>panic!(),}

The idea of this PR is just to make it so that slightly more stuff by default gets to go down the happy path, in a way that doesn't cost CPython anything. It's still the Rust user's responsibility to handle the unhappy path since that's totally valid Python code still, and the Rust library's responsibility to make sure everything is safe for FFI use in Rust2. This PR isn't intending to add any restrictions on what CPython is allowed to do or what other Python implementations may do.


Your#139871 looks to me to also achieve the same goal I was going for here, just as a side effect (since the emptybytes buffer happens to be 8-byte aligned in CPython), so if that merged, it'd improve thebytearray situation as well. This PR isslightly stronger forbytearray, but that's largely immaterial (it most likely only affects SIMD types).

Footnotes

  1. this code is usually in Python interface libraries likePyO3, whose implementation is here, but I wrote the example manually because PyO3's has additional complications and spreads the required checks into a few separate places. Also, the same sort of code comes up in lots of specialised places too, likerust-numpy that provides FFI access to Numpy arrays, etc.

  2. there are safe Rust-side optimisations that can be done around empty slices (which I'm planning to contribute in several Rust projects) such as creating a slice out of a dangling pointer magicked from thin air, which helps these cases as well,

cmaloney reacted with thumbs up emoji

@cmaloney
Copy link
Contributor

cmaloney commentedOct 27, 2025
edited
Loading

I'm 👍 for doingarray here as it supports larger than 1-byte element objects. Does still need a test so if the code is broken/removed accidentally CPython devs will notice. That should be implementable withhttps://docs.python.org/3/library/ctypes.html#ctypes.alignment.

bytearray should be covered by my CPython internals change which will help your case. For bytes-like and slicing in general across rust and other languages I foundhttps://davidben.net/2024/01/15/empty-slices.html really helpful to improve my knowledge. I don't think it makes sense for CPython to adopt the rust semantic for its "bytes-like" generally; and "buffer protocol" /memoryview across the ecosystem is based on C "bytes" for better and worse.

cmaloney
cmaloney previously requested changesOct 29, 2025
Copy link
Contributor

@cmaloneycmaloney left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Reduce scope to justarray.array + add a test; empty buffer + non-emptyarray.array storage being aligned I think is a nice improvement

@serhiy-storchaka
Copy link
Member

I do not think that bytearray should be excluded. If not a special optimization of using NULL for empty array, it would have standard alignment (as returned bymalloc()). The fact that currently it can have worse alignment is the result of optimization. Optimization should not cause regression.

@cmaloney
Copy link
Contributor

re:bytearraygh-140557 /GH-140128 makes it aligned as a side effect becausebytes is aligned (and removes usage of _PyByteArray_empty_string; ob_start is always set)

@serhiy-storchaka
Copy link
Member

Then I do not see reasons to object this change.

@cmaloney
Copy link
Contributor

  1. I don't like having implementation details relied on by external code/systems without a test / validation. To me that creates a potential future release blocker if the assertion gets broken. I don't see adding a test here as a really onerous requirement.
  2. Aligning_PyByteArray_empty_string is going to be a no op / just waste space shortly, so isn't a lot of value in doing in this PR as this will just show up in 3.15

I have slight nit in wording of the NEWS entry; I think it would be good to mention Rust and to make it more concise.

@serhiy-storchaka
Copy link
Member

I think there is a good chance that the code that relies on this already exist. It works only because the Intel plathform is tolerable to unaligned access (and there is no actually access to memory here, because it is an empty buffer). But on other platforms pointers of different type can use different registers.

This is not limited to Rust. In C, casting a pointer with wrong alignment can be an undefined behavior. If bytesarray is used instead of array as a collection of 16-, 32- or 64-bit integers, there may be problems.

Copy link
Member

@serhiy-storchakaserhiy-storchaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I think the cost of the tests exceeds their value. They are complex and fragile.

If you insist on adding tests, they can be implemented by adding a simple helper in_testcapi. The tests should be decorated as CPython-only. They should also test non-empty arrays of different types.

@jakelishman
Copy link
ContributorAuthor

Thanks for the ping, and happy holidays everybody. I reverted the old fragile tests and added (my best attempt at) the new_testcapi-based versions. I'm back at work properly now after the break, so should be more responsive again.

cmaloney reacted with heart emoji

@jakelishman
Copy link
ContributorAuthor

Ah, the test failure here is because the handling of thebytearray type aligns the empty array on 8s, likebytes does, whereas on some platforms, the max align is 16.

Shall I attempt to modify the basebytes/bytearray object to be aligned on the max alignment, or only commit as far as aligning on 8s?

@jakelishman
Copy link
ContributorAuthor

I don't think it's practical/possible to force alignment of the emptybytes buffer up to 16s - doing so would require the offset of.ob_sval inPyBytesObject to shift by 8 bytes on 16-byteMAX_ALIGN_T systems, which would just be wasted memory in the vast majority of cases, andbytesallocations already weren't reliably onMAX_ALIGN_T for the same reason, so there's no "optimisations shouldn't be observable" argument to make there.

I've changed the test case to explicit check forsize_t compatible alignment only, but happy to go a different approach if there's a preferred one.

cmaloney reacted with thumbs up emoji

@cmaloneycmaloney dismissed theirstale reviewJanuary 7, 2026 06:28

stale; new state addresses core concerns

@cmaloneycmaloney added the 🔨 test-with-buildbotsTest PR w/ buildbots; report in status section labelJan 8, 2026
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by@cmaloney for commit92a5fff 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F140559%2Fmerge

If you want to schedule another build, you need to add the🔨 test-with-buildbots label again.

@bedevere-botbedevere-bot removed the 🔨 test-with-buildbotsTest PR w/ buildbots; report in status section labelJan 8, 2026
def test_array_alignment(self):
# gh-140557: pointer alignment of buffers including empty allocation
# should be at least to `size_t`.
align = struct.calcsize("N")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Chatted with a Rust export on the "Rust for CPython Discord". Came to the conclusion with them that we should alignarray.array to 8 bytes since it can store 8 byte things (typecodes: "qQd",https://docs.python.org/3/library/array.html#array.array). On 32 bit platformssize_t will only require 4 byte alignment.

Rather thanstruct.calcsize("N") I think we can just use the constant 8 here + have a comment pointing to the item size table in thearray.arraydoc?

jakelishman reacted with thumbs up emoji
Copy link

@joshtriplettjoshtriplettJan 8, 2026
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Chiming in to confirm that we had this discussion. 👍 for aligning to 8 bytes on all platforms; it seems simpler and less surprising than trying to marginally optimize for thesubset of 4-byte platforms wherelong long anddouble are 4-byte aligned. Might also have benefits for other purposes, such as SIMD algorithms.

Perhttps://en.wikipedia.org/wiki/Data_structure_alignment:

when compiling for 32-bit x86:
[...]
A double (eight bytes) will be 8-byte aligned on Windows and 4-byte aligned on Linux (8-byte with -malign-double compile time option).
A long long (eight bytes) will be 8-byte aligned on Windows and 4-byte aligned on Linux (8-byte with -malign-double compile time option).

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Oh yeah thanks - I was overly focused on how the buffer insidePyBytes ends up aligned and not thinking properly. I changed the test ofarray.array to take the max item size from all available formats for the platform, though I can change it to be hard-coded 8 if that's preferred.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Max ofarray.array's elements I like / means both new element adding or buffer allocation changes would result in the test needing an intentional update / helps ensure changes are intentional

jakelishman and cosmicexplorer reacted with thumbs up emoji

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I haven't fully paged in to this (awesome!) workstream yet, but I wanted to note that I'mincredibly inspired by the unstable rustcore::simd feature, and I maintain both thehttps://docs rs/re2 andhttps://docs.rs/vectrorscan-async (hyperscan) rust wrapper crates.

With regards to SIMD, I have two ideas:

  1. Reflecting the FFI constraints into SIMD-compatible memory for rustcore::simd could be a great investigation for rust's external types proposal:Tracking issue for RFC 1861: Extern types rust-lang/rust#43467 (comment)
  2. I have begun work that searches for SIMD instructions in the cpython configure script:https://discuss.python.org/t/add-zero-copy-conversion-of-bytearray-to-bytes-by-providing-bytes/79164/70
    • I would be very interested to discuss this sort of thing further. In that post I mention two specific use cases (one of them being url quoting in pip) which would produce very noticeable real-world speedups in foundational python tooling like pip.

I'm also right now doing a cleanroom implementation of zstd in rust (I can explain my concerns with ruzstd) and making use of alignment for SIMD to improve over the zstd C implementation (currently working on a ring buffer component), which I very much intend to plug into a python native module.

This comment is in full support of this PR and this workstream. Please let me know if I can follow up and learn more about this! Thanks! ^_^

@jakelishman
Copy link
ContributorAuthor

I think I've addressed all the extant comments in this PR, by the way. Happy to make more changes if there's more needed to land it.

cmaloney and cosmicexplorer reacted with thumbs up emojicmaloney and cosmicexplorer reacted with heart emoji

@cosmicexplorer
Copy link

very non-blocking nitpick@jakelishman re#140559 (comment):

I have to do the alignment and contiguous checks before I callstd::slice::from_raw_parts, because it's undefined behaviour in Rust to create a slice backed by an unaligned or null pointer, even if the length is zero. (The reason is to enable specific type-niche optimisations in the compiler to save space in compound types that contain&[T].)

I have been handling this precise situation in my rust code withslice::align_to{,_mut}() instead of erroring out. That method is implemented usingpointer::align_offset() and then aconst! { ... } evaluation of the gcd in the stdlib:https://github.com/rust-lang/rust/blob/a6acf0f07f0ed1c12e26dc0db3b9bf1d0504a0bb/library/core/src/slice/mod.rs#L4220

Is that approach not applicable here? Am I missing some reason why we knowpointer::align_offset() would be out of range, for example?

I would love to address this in a followup change if not! Thanks!

@encukouencukou added the 🔨 test-with-buildbotsTest PR w/ buildbots; report in status section labelJan 15, 2026
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by@encukou for commitd3ca57f 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F140559%2Fmerge

If you want to schedule another build, you need to add the🔨 test-with-buildbots label again.

@bedevere-botbedevere-bot removed the 🔨 test-with-buildbotsTest PR w/ buildbots; report in status section labelJan 15, 2026
@vstinner
Copy link
Member

A single buildbot failed: test_urllib2 and test_urllibnet failed on buildbot/AMD64 Android PR, unrelated error.

Copy link
Member

@vstinnervstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

LGTM

@encukouencukou merged commit19de10d intopython:mainJan 26, 2026
117 of 118 checks passed
@jakelishmanjakelishman deleted the max-align-buffers branchJanuary 26, 2026 16:06
@cmaloney
Copy link
Contributor

Merged, thanks for the PR@jakelishman !

jakelishman reacted with heart emoji

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

Reviewers

@gpsheadgpsheadgpshead left review comments

@cmaloneycmaloneycmaloney approved these changes

@serhiy-storchakaserhiy-storchakaserhiy-storchaka approved these changes

@vstinnervstinnervstinner approved these changes

@encukouencukouAwaiting requested review from encukou

+2 more reviewers

@joshtriplettjoshtriplettjoshtriplett left review comments

@cosmicexplorercosmicexplorercosmicexplorer left review comments

Reviewers whose approvals may not affect merge requirements

Assignees

No one assigned

Labels

None yet

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

9 participants

@jakelishman@cmaloney@serhiy-storchaka@bedevere-bot@cosmicexplorer@vstinner@gpshead@joshtriplett@encukou

[8]ページ先頭

©2009-2026 Movatter.jp