- Notifications
You must be signed in to change notification settings - Fork587
Faster string reversal when source/dest buffers are distinct#23012
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Uh oh!
There was an error while loading.Please reload this page.
Conversation
Note: I have a separate branch tinkering with (1) minor variations on the current algorithms (2) the portable code mentioned in#20692. Likely there will be a subsequent PR for at least the byte-reversal algorithm. I don't know how to significantly improve the UTF8 algorithms - any suggestions welcome! |
f1b0c04
to54f0528
CompareI added a branch for the single byte UFT8 case while squashing bugs.
Compiling withclang-11, the "before" times are both about the same as with
|
thesamesam commentedFeb 23, 2025 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
Please do consider filing a bug with preprocessed sources to GCC for that (even better with a simple testcase; just a function or function(s) which clearly don't vectorise well compared to Clang in godbolt is fine. no need for it to be benchmarkable). Thanks. |
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
Uh oh!
There was an error while loading.Please reload this page.
54f0528
to1fbc4c4
Compare
https://godbolt.org/z/jz6fxKKhh probably captures this. A simple
I got the impression that alias checking under gcc is a known problem and is covered by a metabug, but I can't find the link to it right now. I didn't fancy wading though all the linked issues to check for duplicates at this time. Feel free to report this case though if you believe it to be useful to the gcc folks. I've got a follow-up commit planned that would give gcc and clang similar performance for string reversal in Perl. |
|
I gather that building for |
If those buffers are distinct, the |
|
It does in GCC trunk, and indeed, according to WIPGCC 15 release notes, it is supposed to vectorize more aggressively at -O2. |
I had that in originally, but it seemed to break g++ builds? |
thesamesam commentedMar 3, 2025
It should be okay to add it conditionally: either a |
Given where we are in the dev cycle, are there any objections to me merging this PR as-is to at least get this level of improvement in to v42? (Subject to any other review comments you have on it.) It seems like there are a bunch of things to look at and doing so - including smoking time - could run us into the hard freeze.
|
unaligned access is undefined behaviour from C, UBSAN will complain about it if nothing else. Under the hood the compiler can generate code that relies on the platform or special instructions that support unaligned access, just as it does when clang/gcc vectorizes the simple loop. |
pp.c Outdated
if (DO_UTF8(src_sv)) { | ||
SvUTF8_on(TARG); | ||
const U8* s = (U8*)src; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
This would be better as:
const U8* s = (const U8*)src;
which avoids casting away const.
Similarly below forsend
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Thanks, I'll make those two changes.
The main thing I'm hesitant about with this change is hoisting the SvPV_can_swipe_buf() out into a macro, and one where three of the parameters are simple fields of one of the other parameters. I think it could be a simple function in sv.c taking only the ssv and flags parameters. If this is defined before Perl_sv_setsv_flags it may even get inlined there. (other opinions welcome here) |
Without specifying any explicit optimization settings, gcc does not inline it, but clang does. Feeding I suppose the check could be pulled out into a macro in sv.c, then used by both |
Works for me |
…_flagsPerl_sv_setsv_flags contains the canonical logic for determining the bestmethod for assigning the string value from a source SV to a destination SV: * "Swipe" the string buffer from the source SV * COW the source SV's string buffer * Do a full copyThis commit extracts the "can the swipe the buffer" tests out into a newmacro (`S_SvPV_can_swipe_buf`) within sv.c. It has two users: * Perl_sv_setsv_flags - so that the logic remains inline in this hot code * Perl_sv_can_can_swipe_pv_buf - a new function`pp_reverse` will shortly make use of the new function to avoid unnecessarystring copies when doing a reversal in scalar context.
This could help detect implementation bugs in algorithms for reversecopying from a source buffer to a different destination buffer.
1fbc4c4
to532833c
Compare`pp_reverse()` has traditionally operated on a single string buffer,reversing it in place.When `pp_reverse()` takes a single SV argument, whose buffer cannot beswiped by `sv_setsv_flags()`, a straight copy of the buffer is takenand then that is reversed. In such cases, two complete passes over theentire string are needed to reverse a non-UTF8 string, and reversing aUTF8 string takes three complete passes.This commit enables `pp_reverse()` to apply the "swipe" test itselfand, for straightforward copy cases, to avoid calling `sv_setsv_flags()`.Instead, it does the work of preparing the `TARG` SV and reverse copiesthe string (UTF8 or otherwise) in a single pass.The performance improvement will vary by compiler and CPU. With gcc 12.2.0on Linux running on "znver3" hardware, performance for both UTF8 andnon-UTF8 strings approximately doubled. Using clang-11 instead on the samemachine gave a fivefold improvement for the non-UTF8 case.
532833c
tod8e448f
Compare
Now implemented like that. |
pp_reverse()
has traditionally operated on a single string buffer,reversing it in place.
When
pp_reverse()
takes a single SV argument, whose buffer cannot beswiped by
sv_setsv_flags()
, a straight copy of the buffer is takenand then that is reversed. In such cases, two complete passes over the
entire string are needed to reverse a non-UTF8 string, and reversing a
UTF8 string takes three complete passes.
This PR enables
pp_reverse()
to apply the "swipe" test itselfand, for straightforward copy cases, to avoid calling
sv_setsv_flags()
.Instead, it does the work of preparing the
TARG
SV and reverse copiesthe string (UTF8 or otherwise) in a single pass.
Non-UTF8 performance is more than doubled. UTF8 performance is improved
by ~40%. (This is using
gcc version 12.2.0
on Linux running on a midrange Zen3 CPU.)my $x = "X"x(1024*1000*10); my $y; for (0..1_000) { $y = reverse $x }
PATCHED
BLEAD
my $x = "\x{263A}\x{263A}x\x{263A}y\x{263A}"x(1024*1000); my $y; for (0..1_000) { $y = reverse $x }
PATCHED
BLEAD