Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

fix: add parse_string_if_needed function#2212

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Open
lemire wants to merge2 commits intomaster
base:master
Choose a base branch
Loading
fromparse_string_if_needed

Conversation

@lemire
Copy link
Member

@lemirelemire commentedJul 11, 2024
edited
Loading

This is a competing PR vs#2211 by@CarlosEduR

The idea is that we avoid copying strings to a string buffer when we do not need to.

@CarlosEduR uses a sensible approach that does not require 'deep' changes. This PR is somewhat deeper. Whether this PR is better than#2211 is an open question.

Note that@CarlosEduR checks for the null termination which is unnecessary work. Removing this check could change the story.

partial fix to#1470 Note that@jkeiser's idea is somewhat more involved than what we are doing currently.

Benchmarks on my ARM processor (Apple M2).

partial tweek

In this benchmark, we typically do not have escaped content in strings, but it does happen from time to time. I don't have an exact percentage but it is maybe 20% or 10% of the time that we have escaped content.

Using ./build/benchmark/bench_ondemand --benchmark_filter="partial_tweets<simdjson_ondemand>". Run with sudo to get performance counters.

Main:
best_instructions_per_byte=8.68026 best_instructions_per_cycle=6.57819

@CarlosEduR's PR:
best_instructions_per_byte=9.31405 best_instructions_per_cycle=6.54882

This PR:
best_instructions_per_byte=8.66243 best_instructions_per_cycle=6.5704

find tweet

This is a lucky benchmark where we never have escaped content to worry about.

Using ./build/benchmark/bench_ondemand --benchmark_filter="find_tweet<simdjson_ondemand>". Run with sudo to get performance counters.

Main
best_instructions_per_byte=4.71785 best_instructions_per_cycle=6.34339

@CarlosEduR's PR:
best_instructions_per_byte=4.7256 best_instructions_per_cycle=6.34974

This PR:
best_instructions_per_byte=4.71761 best_instructions_per_cycle=6.34714

Conclusion

It is too early to tell which direction this goes because (1) I only tested on one system and (2) only on two benchmarks.

ARM systems do not have to contend with runtime dispatches, so this is an advantage for this PR, compared to@CarlosEduR's PR. However,@CarlosEduR's PR could do better when runtime dispatching is needed.

@CarlosEduR's PR shows a regression which is possibly caused by the fact that it tries to avoid the copy, fails and then has to fall back on the current code. Even if it only happens one time out of 5 or 10, these unlucky cases could cost you.

@CarlosEduR's PR could be further optimized and the story might change.

Overall, my preliminary results suggest that on Apple Silicon, it is not worth avoiding a write on the string buffer.

CarlosEduR and Cirnoo reacted with heart emoji
@CarlosEduR
Copy link
Member

CarlosEduR commentedJul 12, 2024
edited
Loading

This is an excellent PR,@lemire!
I appreciate the dedication, really nicejab job!

I am using a AMD64, I'll run the benckmarks locally.

@lemire
Copy link
MemberAuthor

I appreciate the dedication

It is not super difficult, thankfully... but I am somewhat disappointed so far that we don't see much of an effect.

I am using a AMD64, I'll run the benckmarks locally.

That would be great. Please consider optimizing your own code (removing the!= '\0') as this could change the story.

@CarlosEduR
Copy link
Member

partial_tweets<simdjson_ondemand>

Daniel's PR:
best_instructions_per_byte=3.31552 best_instructions_per_cycle=3.22185

Master Branch:
best_instructions_per_byte=3.30848 best_instructions_per_cycle=3.27537

Carlos' PR
best_instructions_per_byte=3.6775 best_instructions_per_cycle=3.27884

find_tweet<simdjson_ondemand>

Master branch:
best_instructions_per_byte=2.28167 best_instructions_per_cycle=3.24331

Daniel's PR:
best_instructions_per_byte=2.28163 best_instructions_per_cycle=3.23015

Carlos' PR:
best_instructions_per_byte=2.28611 best_instructions_per_cycle=3.1972

I've not updated my code yet (removing the != '\0'), will do it and will share results.

@lemire
Copy link
MemberAuthor

@CarlosEduR

I've not updated my code yet (removing the != '\0'), will do it and will share results.

Did you get around to it?

@CarlosEduR
Copy link
Member

Did you get around to it?

yes!
partial_tweets and find_tweet benchmarks:

best_instructions_per_byte=3.58215 best_instructions_per_cycle=3.15066best_instructions_per_byte=2.28551 best_instructions_per_cycle=3.25264

@lemire
Copy link
MemberAuthor

@CarlosEduR

Hmmm, did the number of instructions go up in your find_tweet benchmark?

Just so we are clear, the idea was to reduce the amount of unnecessary work in your PR (check for terminating null). This should improve the performance and reduce the instruction count?

Sign up for freeto join this conversation on GitHub. Already have an account?Sign in to comment

Reviewers

No reviews

Assignees

No one assigned

Labels

None yet

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

3 participants

@lemire@CarlosEduR

[8]ページ先頭

©2009-2025 Movatter.jp