⚡️ Speed up function `_estimate_string_tokens` by 221% #2156


Open

misrasaurabh1 wants to merge 5 commits into pydantic:main from misrasaurabh1:codeflash/optimize-_estimate_string_tokens-mcs8yg4q

Conversation

misrasaurabh1 (Contributor)

📄 221% (2.21x) speedup for `_estimate_string_tokens` in `pydantic_ai_slim/pydantic_ai/models/function.py`

⏱️ Runtime: 5.29 milliseconds → 1.65 milliseconds (best of 89 runs)

📝 Explanation and details

Hotspots

- Most time is spent calling `re.split()` for every string or string-like object, which is expensive.
- `isinstance` is checked against the common types on each iteration.
- `tokens += 0` operations are no-ops and can be removed.
- The regex can be precompiled.
- `str.split()` with `None` as the delimiter is often much faster and covers most whitespace splitting, which is likely enough here.
- `content.strip()` is called redundantly for every string, which we can optimize.

Optimizations made:

- Precompiled the regex, so it's not recompiled per call.
- Removed all `tokens += 0` statements and unnecessary else branches (no effect).
- Minimized calls to `.strip()` and `.split()` to once per string instance.
- Dropped extraneous `isinstance` checks.
- Moved logic into clear branches, optimizing the common paths.

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 64 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 85.7% |
🌀 Generated Regression Tests and Runtime

```python
# --- first generated test module ---
import re
from collections.abc import Sequence  # imports

import pytest  # used for our unit tests

from pydantic_ai.models.function import _estimate_string_tokens


# Dummy classes to mimic pydantic_ai.messages for testing
class AudioUrl:
    def __init__(self, url):
        self.url = url


class ImageUrl:
    def __init__(self, url):
        self.url = url


class BinaryContent:
    def __init__(self, data):
        self.data = data


# unit tests

# ------------------------ BASIC TEST CASES ------------------------

def test_empty_string_returns_zero():
    # Test that empty string returns 0 tokens
    codeflash_output = _estimate_string_tokens("")  # 349ns -> 378ns (7.67% slower)

def test_simple_sentence():
    # Test a simple sentence
    codeflash_output = _estimate_string_tokens("Hello world")  # 3.63μs -> 2.42μs (49.8% faster)

def test_sentence_with_punctuation():
    # Test sentence with punctuation
    codeflash_output = _estimate_string_tokens("Hello, world.")  # 3.28μs -> 2.45μs (33.9% faster)

def test_sentence_with_multiple_spaces():
    # Test sentence with multiple spaces
    codeflash_output = _estimate_string_tokens("Hello    world")  # 3.57μs -> 2.39μs (49.4% faster)

def test_sentence_with_mixed_delimiters():
    # Test sentence with various delimiters
    codeflash_output = _estimate_string_tokens('Hello, world: "Python".')  # 3.83μs -> 2.73μs (40.3% faster)

def test_string_with_leading_and_trailing_spaces():
    # Leading/trailing whitespace should not affect token count
    codeflash_output = _estimate_string_tokens("   Hello world   ")  # 3.52μs -> 2.39μs (47.2% faster)

def test_string_with_only_delimiters():
    # Only delimiters should result in zero tokens
    codeflash_output = _estimate_string_tokens(" ,.:  ")  # 3.34μs -> 2.05μs (62.7% faster)

def test_string_with_newlines_and_tabs():
    # Newlines and tabs are whitespace and should be treated as delimiters
    codeflash_output = _estimate_string_tokens("Hello\nworld\tPython")  # 3.81μs -> 2.75μs (38.5% faster)

# ------------------------ EDGE TEST CASES ------------------------

def test_none_input_returns_zero():
    # None input should return 0 tokens
    codeflash_output = _estimate_string_tokens(None)  # 338ns -> 366ns (7.65% slower)

def test_empty_list_returns_zero():
    # Empty list should return 0 tokens
    codeflash_output = _estimate_string_tokens([])  # 358ns -> 358ns (0.000% faster)

def test_list_of_empty_strings():
    # List of empty strings should return 0 tokens
    codeflash_output = _estimate_string_tokens(["", "", ""])  # 5.70μs -> 1.97μs (190% faster)

def test_list_of_strings():
    # List of strings should sum token counts
    codeflash_output = _estimate_string_tokens(["Hello world", "Python is great"])  # 6.52μs -> 3.66μs (78.0% faster)

def test_list_with_string_and_empty_string():
    # List with a string and an empty string
    codeflash_output = _estimate_string_tokens(["Hello world", ""])  # 6.68μs -> 3.06μs (119% faster)

def test_list_with_audio_and_image_url():
    # AudioUrl and ImageUrl should contribute 0 tokens
    audio = AudioUrl("http://audio.url")
    image = ImageUrl("http://image.url")
    codeflash_output = _estimate_string_tokens([audio, image])  # 2.66μs -> 1.19μs (123% faster)

def test_list_with_string_and_audio_url():
    # String and AudioUrl, only string counts
    audio = AudioUrl("http://audio.url")
    codeflash_output = _estimate_string_tokens(["Hello world", audio])  # 5.87μs -> 3.18μs (84.8% faster)

def test_list_with_binary_content():
    # BinaryContent's token count is len(data)
    binary = BinaryContent(b"abcde")
    codeflash_output = _estimate_string_tokens([binary])  # 2.18μs -> 1.00μs (117% faster)

def test_list_with_string_and_binary_content():
    # Both string and BinaryContent contribute
    binary = BinaryContent(b"abcde")
    codeflash_output = _estimate_string_tokens(["Hello", binary])  # 5.65μs -> 2.60μs (117% faster)

def test_list_with_all_types():
    # All types together
    audio = AudioUrl("http://audio.url")
    image = ImageUrl("http://image.url")
    binary = BinaryContent(b"xyz")
    codeflash_output = _estimate_string_tokens(["Hi there", audio, image, binary, "Python."])  # 7.98μs -> 3.91μs (104% faster)

def test_list_with_unexpected_type():
    # Unexpected type should contribute 0 tokens
    class Dummy:
        pass
    codeflash_output = _estimate_string_tokens(["Hello", Dummy()])

def test_string_with_only_spaces():
    # String of only spaces returns 0
    codeflash_output = _estimate_string_tokens("     ")  # 2.44μs -> 1.40μs (74.8% faster)

def test_string_with_unicode_characters():
    # Unicode characters should be counted as tokens
    codeflash_output = _estimate_string_tokens("你好 世界")  # 5.61μs -> 4.50μs (24.6% faster)

def test_string_with_mixed_unicode_and_ascii():
    # Mixed unicode and ascii
    codeflash_output = _estimate_string_tokens("hello 世界")  # 4.64μs -> 3.56μs (30.4% faster)

def test_list_with_unicode_strings():
    # List with unicode strings
    codeflash_output = _estimate_string_tokens(["你好 世界", "hello"])  # 8.25μs -> 4.64μs (78.0% faster)

def test_binary_content_empty():
    # BinaryContent with empty data
    binary = BinaryContent(b"")
    codeflash_output = _estimate_string_tokens([binary])  # 1.96μs -> 1.02μs (91.7% faster)

def test_list_with_nested_empty_lists():
    # Nested empty lists are not supported, but should not crash
    codeflash_output = _estimate_string_tokens([[]])  # 2.03μs -> 905ns (124% faster)

# ------------------------ LARGE SCALE TEST CASES ------------------------

def test_long_string_1000_words():
    # Test a string with 1000 words
    long_str = "word " * 1000
    codeflash_output = _estimate_string_tokens(long_str.strip())  # 121μs -> 111μs (8.44% faster)

def test_list_of_1000_strings():
    # List of 1000 single-word strings
    string_list = ["word"] * 1000
    codeflash_output = _estimate_string_tokens(string_list)  # 701μs -> 191μs (266% faster)

def test_list_of_1000_empty_strings():
    # List of 1000 empty strings
    string_list = [""] * 1000
    codeflash_output = _estimate_string_tokens(string_list)  # 596μs -> 89.6μs (566% faster)

def test_list_of_500_strings_and_500_binary():
    # 500 strings, 500 BinaryContent (each with 2 bytes)
    string_list = ["hi"] * 500
    binary_list = [BinaryContent(b"ab")] * 500
    codeflash_output = _estimate_string_tokens(string_list + binary_list)  # 504μs -> 112μs (349% faster)

def test_list_of_1000_audio_and_image_urls():
    # 500 AudioUrl, 500 ImageUrl
    audio_list = [AudioUrl("a")] * 500
    image_list = [ImageUrl("b")] * 500
    codeflash_output = _estimate_string_tokens(audio_list + image_list)  # 278μs -> 45.4μs (513% faster)

def test_list_of_mixed_types_large():
    # 333 strings, 333 binary (3 bytes), 334 images
    string_list = ["hello world"] * 333  # 2 tokens each
    binary_list = [BinaryContent(b"xyz")] * 333  # 3 tokens each
    image_list = [ImageUrl("img")] * 334  # 0 tokens each
    expected = 333 * 2 + 333 * 3 + 334 * 0
    codeflash_output = _estimate_string_tokens(string_list + binary_list + image_list)  # 512μs -> 150μs (242% faster)

def test_performance_large_string():
    # Test that large string doesn't crash or hang (not a strict performance test)
    long_str = "a " * 999 + "a"
    codeflash_output = _estimate_string_tokens(long_str)  # 66.2μs -> 66.3μs (0.202% slower)

def test_performance_large_list():
    # Test that large list doesn't crash or hang
    string_list = ["a b c"] * 333  # 3 tokens each
    codeflash_output = _estimate_string_tokens(string_list)  # 295μs -> 108μs (172% faster)

# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
```

```python
# --- second generated test module ---
import re
from collections.abc import Sequence  # imports

import pytest  # used for our unit tests

from pydantic_ai.models.function import _estimate_string_tokens


# Dummy classes to mimic pydantic_ai.messages
class AudioUrl:
    def __init__(self, url):
        self.url = url


class ImageUrl:
    def __init__(self, url):
        self.url = url


class BinaryContent:
    def __init__(self, data: bytes):
        self.data = data


UserContent = (str, AudioUrl, ImageUrl, BinaryContent)

# unit tests

# 1. Basic Test Cases

def test_empty_string_returns_zero():
    # Should return 0 for empty string
    codeflash_output = _estimate_string_tokens("")  # 290ns -> 389ns (25.4% slower)

def test_simple_string_single_word():
    # Single word string
    codeflash_output = _estimate_string_tokens("hello")  # 2.94μs -> 1.91μs (53.9% faster)

def test_simple_string_multiple_words():
    # Multiple words separated by spaces
    codeflash_output = _estimate_string_tokens("hello world")  # 3.37μs -> 2.24μs (50.8% faster)

def test_string_with_punctuation():
    # String with punctuation that should be split
    codeflash_output = _estimate_string_tokens('hello, world.')  # 3.89μs -> 2.76μs (40.9% faster)

def test_string_with_multiple_separators():
    # String with multiple spaces and punctuation
    codeflash_output = _estimate_string_tokens('hello,  world.  foo:bar')  # 3.73μs -> 2.76μs (34.9% faster)

def test_string_with_leading_trailing_spaces():
    # Leading and trailing whitespace should not affect token count
    codeflash_output = _estimate_string_tokens('   hello world   ')  # 3.48μs -> 2.44μs (42.2% faster)

def test_string_with_only_separators():
    # String with only separators should result in 0 tokens
    codeflash_output = _estimate_string_tokens(' , . : ')  # 3.10μs -> 2.18μs (42.1% faster)

def test_sequence_of_strings():
    # Sequence of strings should sum their token counts
    content = ["hello world", "foo,bar"]
    codeflash_output = _estimate_string_tokens(content)  # 6.31μs -> 3.40μs (85.7% faster)

def test_sequence_of_strings_and_empty_string():
    # Sequence with empty string should not affect total
    content = ["hello world", "", "foo"]
    codeflash_output = _estimate_string_tokens(content)  # 7.43μs -> 3.56μs (109% faster)

def test_string_with_multiple_spaces_between_words():
    # Multiple spaces between words should not create empty tokens
    codeflash_output = _estimate_string_tokens("hello    world")  # 3.54μs -> 2.45μs (44.5% faster)

# 2. Edge Test Cases

def test_none_input_returns_zero():
    # None input should return 0
    codeflash_output = _estimate_string_tokens(None)  # 348ns -> 335ns (3.88% faster)

def test_sequence_empty_list_returns_zero():
    # Empty sequence should return 0
    codeflash_output = _estimate_string_tokens([])  # 326ns -> 351ns (7.12% slower)

def test_sequence_of_only_non_str_content():
    # Sequence of only AudioUrl/ImageUrl should return 0
    content = [AudioUrl("http://a.com"), ImageUrl("http://b.com")]
    codeflash_output = _estimate_string_tokens(content)  # 3.06μs -> 1.28μs (139% faster)

def test_sequence_with_binary_content():
    # BinaryContent should use length of bytes as tokens
    content = [BinaryContent(b"abcde")]
    codeflash_output = _estimate_string_tokens(content)  # 2.28μs -> 1.14μs (101% faster)

def test_sequence_mixed_types():
    # Mixed sequence: string, AudioUrl, BinaryContent, ImageUrl
    content = [
        "hello there",
        AudioUrl("http://audio.com"),
        BinaryContent(b"xyz"),
        ImageUrl("http://img.com"),
        "foo:bar",
    ]
    # "hello there" -> 2, BinaryContent -> 3, "foo:bar" -> 2
    codeflash_output = _estimate_string_tokens(content)  # 8.83μs -> 4.19μs (110% faster)

def test_sequence_with_empty_binary_content():
    # BinaryContent with empty bytes
    content = [BinaryContent(b"")]
    codeflash_output = _estimate_string_tokens(content)  # 2.15μs -> 1.09μs (97.2% faster)

def test_string_with_consecutive_separators():
    # Multiple consecutive separators should not create empty tokens
    codeflash_output = _estimate_string_tokens("hello,,,  world...foo")  # 3.65μs -> 2.67μs (36.6% faster)

def test_sequence_with_unexpected_type():
    # Sequence with an unexpected type should be ignored (added as 0)
    class Dummy:
        pass
    content = ["hello", Dummy()]
    codeflash_output = _estimate_string_tokens(content)

def test_string_with_unicode_and_non_ascii():
    # Unicode characters should be treated as part of tokens
    codeflash_output = _estimate_string_tokens("héllo wørld")  # 4.34μs -> 3.10μs (39.8% faster)

def test_string_with_newline_and_tab():
    # Newlines and tabs are whitespace and should be split
    codeflash_output = _estimate_string_tokens("hello\nworld\tfoo")  # 3.88μs -> 2.71μs (43.0% faster)

def test_string_with_only_whitespace():
    # String with only whitespace should return 0
    codeflash_output = _estimate_string_tokens("\t\n   ")  # 2.33μs -> 1.45μs (60.8% faster)

def test_sequence_of_strings_with_whitespace_only():
    # Sequence with whitespace-only strings
    content = ["   ", "\t", "\n"]
    codeflash_output = _estimate_string_tokens(content)  # 6.33μs -> 2.28μs (177% faster)

def test_string_with_colon_and_period():
    # Colons and periods are separators
    codeflash_output = _estimate_string_tokens("foo:bar.baz")  # 3.60μs -> 2.77μs (30.2% faster)

def test_string_with_quotes():
    # Quotes are separators
    codeflash_output = _estimate_string_tokens('foo "bar" baz')  # 3.52μs -> 2.58μs (36.5% faster)

def test_string_with_mixed_separators():
    # All separators together
    codeflash_output = _estimate_string_tokens('foo, bar: "baz".')  # 4.15μs -> 2.80μs (48.2% faster)

# 3. Large Scale Test Cases

def test_long_string():
    # Very long string with 1000 words
    long_str = "word " * 1000
    codeflash_output = _estimate_string_tokens(long_str.strip())  # 122μs -> 121μs (0.830% faster)

def test_large_sequence_of_strings():
    # Sequence of 1000 single-word strings
    content = ["hello"] * 1000
    codeflash_output = _estimate_string_tokens(content)  # 754μs -> 188μs (299% faster)

def test_large_sequence_of_mixed_content():
    # Sequence of 500 strings and 500 BinaryContent (each 2 bytes)
    content = ["foo bar"] * 500 + [BinaryContent(b"xy")] * 500
    # 500*2 tokens from strings + 500*2 from BinaryContent
    codeflash_output = _estimate_string_tokens(content)  # 599μs -> 174μs (243% faster)

def test_large_sequence_with_audio_and_image():
    # Sequence with 333 strings, 333 AudioUrl, 334 ImageUrl
    content = (
        ["foo bar"] * 333 +
        [AudioUrl("http://a.com")] * 333 +
        [ImageUrl("http://b.com")] * 334
    )
    # Only strings counted: 333*2 = 666
    codeflash_output = _estimate_string_tokens(content)  # 493μs -> 134μs (267% faster)

def test_large_binary_content():
    # Single BinaryContent with 999 bytes
    content = [BinaryContent(b"x" * 999)]
    codeflash_output = _estimate_string_tokens(content)  # 2.11μs -> 1.05μs (101% faster)

def test_large_mixed_string_with_all_separators():
    # Large string with all separators and 500 tokens
    s = ("foo, bar. baz: " * 125).strip()  # 4 tokens per repeat, 125*4=500
    codeflash_output = _estimate_string_tokens(s)  # 39.2μs -> 38.5μs (1.69% faster)

# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
```

To edit these changes, run `git checkout codeflash/optimize-_estimate_string_tokens-mcs8yg4q` and push.

Codeflash

codeflash-ai bot and others added 4 commits July 6, 2025 22:31
Certainly! Let's break down the profile first.

### Hotspots

- Most time is spent calling `re.split()` for every string or string-like object, which is expensive.
- Checking `isinstance` for common types on each iteration.
- `tokens += 0` operations are no-ops and can be removed.
- The regex can be precompiled.
- `str.split()` with `None` as delimiter is often *much* faster and covers most whitespace splitting, which is likely enough here.
- `content.strip()` is being called redundantly for every string, which we can optimize.

Here's a rewritten, **optimized** version.

### Optimizations made

- Precompiled the regex, so it's not recompiled per call.
- Removed all `tokens += 0` and unnecessary else branches (no effect).
- Minimized calls to `.strip()` and `.split()` to once per string instance.
- Dropped extraneous `isinstance` checks.
- Moved logic into clear branches, optimizing the common paths.

#### Further possible optimization, if you don't need exact punctuation splitting

If you're willing to change the token estimation (using whitespace instead of the full punctuation split), you can swap out `_TOKEN_SPLIT_RE.split(foo.strip())` for simply `foo.strip().split()` and drop all regex, which is **much** faster. But this does **relax** the original tokenization logic.

Let me know if you want it even **faster** with that change, or if you need to preserve the splitting on punctuation!
```python
        tokens += len(_TOKEN_SPLIT_RE.split(part.strip()))
    elif isinstance(part, BinaryContent):
        tokens += len(part.data)
    # We don't need explicit handling for AudioUrl or ImageUrl, since they add 0
```
Contributor

I agree we can skip the `tokens += 0`, but we should keep the original todo comment, as image/audio URL parts actually *do* add tokens, we just don't count them here.

Contributor (Author)

fixed

```python
    return tokens
```


```python
_TOKEN_SPLIT_RE = re.compile(r'[\s",.:]+')
```
Member

This will have some overhead at import time; it's small, but it'll add up if we do this with all regular expressions. Should we stick with `re.split(r'[\s",.:]+', part.strip())`, as it'll cache the regex the first time it's run?

Contributor (Author)

To validate the performance characteristics, I ran an experiment: I replaced the current suggestion with an inline `re.split` and timed it on the generated test set, so the only change is the global `re.compile` vs the inline `re.split`:

- global `re.compile`: 1.68ms
- inline `re.split`: 2.57ms

Yes, `re` does cache the compiled regex for future use, but the cache lookup has overhead that can be high, especially when the call sits in a loop. In my experience with optimizations discovered by Codeflash, I've seen `re.compile` be faster. In this case, since the regex is used multiple times and in a loop, I would recommend regex compilation, although it's your decision.
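The experiment described above can be reproduced with a sketch like this (absolute timings will vary by machine; the point is only that module-level `re.split` pays a per-call pattern-cache lookup that a precompiled pattern avoids):

```python
import re
import timeit

text = "foo, bar. baz: " * 50
compiled = re.compile(r'[\s",.:]+')

# Precompiled pattern: no per-call pattern lookup.
t_compiled = timeit.timeit(lambda: compiled.split(text), number=10_000)

# Module-level re.split: the compiled pattern is cached after the first
# call, but every call still goes through the cache-lookup machinery.
t_inline = timeit.timeit(lambda: re.split(r'[\s",.:]+', text), number=10_000)

print(f"compiled: {t_compiled:.4f}s  inline: {t_inline:.4f}s")

# Both produce identical results; only the per-call overhead differs.
assert compiled.split(text) == re.split(r'[\s",.:]+', text)
```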

Reviewers

@samuelcolvin — left review comments
@DouweM — awaiting requested review

Assignees

@DouweM

Projects
None yet

Milestone
No milestone

Development

Successfully merging this pull request may close these issues.

3 participants
@misrasaurabh1, @DouweM, @samuelcolvin
