- Notifications
You must be signed in to change notification settings - Fork1.2k
fix: add parse_string_if_needed function#2212
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
base:master
Are you sure you want to change the base?
Uh oh!
There was an error while loading.Please reload this page.
Conversation
CarlosEduR commentedJul 12, 2024 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
This is an excellent PR,@lemire! I am using a AMD64, I'll run the benckmarks locally. |
lemire commentedJul 12, 2024
It is not super difficult, thankfully... but I am somewhat disappointed so far that we don't see much of an effect.
That would be great. Please consider optimizing your own code (removing the |
CarlosEduR commentedJul 16, 2024
partial_tweets<simdjson_ondemand>Daniel's PR: Master Branch: Carlos' PR find_tweet<simdjson_ondemand>Master branch: Daniel's PR: Carlos' PR: I've not updated my code yet (removing the != '\0'), will do it and will share results. |
lemire commentedSep 18, 2024
Did you get around to it? |
CarlosEduR commentedSep 25, 2024
yes! |
lemire commentedSep 25, 2024
Hmmm, did the number of instructions go up in your find_tweet benchmark? Just so we are clear, the idea was to reduce the amount of unnecessary work in your PR (check for terminating null). This should improve the performance and reduce the instruction count? |
Uh oh!
There was an error while loading.Please reload this page.
This is a competing PR vs#2211 by@CarlosEduR
The idea is that we avoid copying strings to a string buffer when we do not need to.
@CarlosEduR uses a sensible approach that does not require 'deep' changes. This PR is somewhat deeper. Whether this PR is better than#2211 is an open question.
Note that@CarlosEduR checks for the null termination which is unnecessary work. Removing this check could change the story.
partial fix to#1470 Note that@jkeiser's idea is somewhat more involved than what we are doing currently.
Benchmarks on my ARM processor (Apple M2).
partial tweek
In this benchmark, we typically do not have escaped content in strings, but it does happen from time to time. I don't have an exact percentage but it is maybe 20% or 10% of the time that we have escaped content.
Using
./build/benchmark/bench_ondemand --benchmark_filter="partial_tweets<simdjson_ondemand>". Run with sudo to get performance counters.Main:
best_instructions_per_byte=8.68026 best_instructions_per_cycle=6.57819
@CarlosEduR's PR:
best_instructions_per_byte=9.31405 best_instructions_per_cycle=6.54882
This PR:
best_instructions_per_byte=8.66243 best_instructions_per_cycle=6.5704
find tweet
This is a lucky benchmark where we never have escaped content to worry about.
Using
./build/benchmark/bench_ondemand --benchmark_filter="find_tweet<simdjson_ondemand>". Run with sudo to get performance counters.Main
best_instructions_per_byte=4.71785 best_instructions_per_cycle=6.34339
@CarlosEduR's PR:
best_instructions_per_byte=4.7256 best_instructions_per_cycle=6.34974
This PR:
best_instructions_per_byte=4.71761 best_instructions_per_cycle=6.34714
Conclusion
It is too early to tell which direction this goes because (1) I only tested on one system and (2) only on two benchmarks.
ARM systems do not have to contend with runtime dispatches, so this is an advantage for this PR, compared to@CarlosEduR's PR. However,@CarlosEduR's PR could do better when runtime dispatching is needed.
@CarlosEduR's PR shows a regression which is possibly caused by the fact that it tries to avoid the copy, fails and then has to fall back on the current code. Even if it only happens one time out of 5 or 10, these unlucky cases could cost you.
@CarlosEduR's PR could be further optimized and the story might change.
Overall, my preliminary results suggest that on Apple Silicon, it is not worth avoiding a write on the string buffer.