⚡️ Speed up function `_estimate_string_tokens` by 221% #2156


Open

misrasaurabh1 wants to merge 5 commits into pydantic:main from misrasaurabh1:codeflash/optimize-_estimate_string_tokens-mcs8yg4q

Conversation

misrasaurabh1 (Contributor)

📄 221% (2.21x) speedup for `_estimate_string_tokens` in `pydantic_ai_slim/pydantic_ai/models/function.py`

⏱️ Runtime: 5.29 milliseconds → 1.65 milliseconds (best of 89 runs)

📝 Explanation and details

Hotspots

- Most time is spent calling `re.split()` for every string or string-like object, which is expensive.
- `isinstance` is checked against the common types on each iteration.
- `tokens += 0` operations are no-ops and can be removed.
- The regex can be precompiled.
- `str.split()` with `None` as the delimiter is often much faster and covers most whitespace splitting, which is likely enough here.
- `content.strip()` is called redundantly for every string, which we can optimize.

Optimizations made:

- Precompiled the regex, so it's not recompiled per call.
- Removed all `tokens += 0` statements and unnecessary else branches (no effect).
- Minimized calls to `.strip()` and `.split()` to once per string instance.
- Dropped extraneous `isinstance` checks.
- Moved logic into clear branches, optimizing the common paths.

Correctness verification report:

| Test | Status |
| --- | --- |
| ⚙️ Existing Unit Tests | 🔘 None Found |
| 🌀 Generated Regression Tests | 64 Passed |
| ⏪ Replay Tests | 🔘 None Found |
| 🔎 Concolic Coverage Tests | 🔘 None Found |
| 📊 Tests Coverage | 85.7% |
🌀 Generated Regression Tests and Runtime

```python
# --- first generated test module ---
import re
from collections.abc import Sequence  # imports

import pytest  # used for our unit tests

from pydantic_ai.models.function import _estimate_string_tokens


# Dummy classes to mimic pydantic_ai.messages for testing
class AudioUrl:
    def __init__(self, url):
        self.url = url


class ImageUrl:
    def __init__(self, url):
        self.url = url


class BinaryContent:
    def __init__(self, data):
        self.data = data


# unit tests

# ------------------------ BASIC TEST CASES ------------------------

def test_empty_string_returns_zero():
    # Test that empty string returns 0 tokens
    codeflash_output = _estimate_string_tokens("")  # 349ns -> 378ns (7.67% slower)

def test_simple_sentence():
    # Test a simple sentence
    codeflash_output = _estimate_string_tokens("Hello world")  # 3.63μs -> 2.42μs (49.8% faster)

def test_sentence_with_punctuation():
    # Test sentence with punctuation
    codeflash_output = _estimate_string_tokens("Hello, world.")  # 3.28μs -> 2.45μs (33.9% faster)

def test_sentence_with_multiple_spaces():
    # Test sentence with multiple spaces
    codeflash_output = _estimate_string_tokens("Hello    world")  # 3.57μs -> 2.39μs (49.4% faster)

def test_sentence_with_mixed_delimiters():
    # Test sentence with various delimiters
    codeflash_output = _estimate_string_tokens('Hello, world: "Python".')  # 3.83μs -> 2.73μs (40.3% faster)

def test_string_with_leading_and_trailing_spaces():
    # Leading/trailing whitespace should not affect token count
    codeflash_output = _estimate_string_tokens("   Hello world   ")  # 3.52μs -> 2.39μs (47.2% faster)

def test_string_with_only_delimiters():
    # Only delimiters should result in zero tokens
    codeflash_output = _estimate_string_tokens(" ,.:  ")  # 3.34μs -> 2.05μs (62.7% faster)

def test_string_with_newlines_and_tabs():
    # Newlines and tabs are whitespace and should be treated as delimiters
    codeflash_output = _estimate_string_tokens("Hello\nworld\tPython")  # 3.81μs -> 2.75μs (38.5% faster)

# ------------------------ EDGE TEST CASES ------------------------

def test_none_input_returns_zero():
    # None input should return 0 tokens
    codeflash_output = _estimate_string_tokens(None)  # 338ns -> 366ns (7.65% slower)

def test_empty_list_returns_zero():
    # Empty list should return 0 tokens
    codeflash_output = _estimate_string_tokens([])  # 358ns -> 358ns (0.000% faster)

def test_list_of_empty_strings():
    # List of empty strings should return 0 tokens
    codeflash_output = _estimate_string_tokens(["", "", ""])  # 5.70μs -> 1.97μs (190% faster)

def test_list_of_strings():
    # List of strings should sum token counts
    codeflash_output = _estimate_string_tokens(["Hello world", "Python is great"])  # 6.52μs -> 3.66μs (78.0% faster)

def test_list_with_string_and_empty_string():
    # List with a string and an empty string
    codeflash_output = _estimate_string_tokens(["Hello world", ""])  # 6.68μs -> 3.06μs (119% faster)

def test_list_with_audio_and_image_url():
    # AudioUrl and ImageUrl should contribute 0 tokens
    audio = AudioUrl("http://audio.url")
    image = ImageUrl("http://image.url")
    codeflash_output = _estimate_string_tokens([audio, image])  # 2.66μs -> 1.19μs (123% faster)

def test_list_with_string_and_audio_url():
    # String and AudioUrl, only string counts
    audio = AudioUrl("http://audio.url")
    codeflash_output = _estimate_string_tokens(["Hello world", audio])  # 5.87μs -> 3.18μs (84.8% faster)

def test_list_with_binary_content():
    # BinaryContent's token count is len(data)
    binary = BinaryContent(b"abcde")
    codeflash_output = _estimate_string_tokens([binary])  # 2.18μs -> 1.00μs (117% faster)

def test_list_with_string_and_binary_content():
    # Both string and BinaryContent contribute
    binary = BinaryContent(b"abcde")
    codeflash_output = _estimate_string_tokens(["Hello", binary])  # 5.65μs -> 2.60μs (117% faster)

def test_list_with_all_types():
    # All types together
    audio = AudioUrl("http://audio.url")
    image = ImageUrl("http://image.url")
    binary = BinaryContent(b"xyz")
    codeflash_output = _estimate_string_tokens(["Hi there", audio, image, binary, "Python."])  # 7.98μs -> 3.91μs (104% faster)

def test_list_with_unexpected_type():
    # Unexpected type should contribute 0 tokens
    class Dummy:
        pass
    codeflash_output = _estimate_string_tokens(["Hello", Dummy()])

def test_string_with_only_spaces():
    # String of only spaces returns 0
    codeflash_output = _estimate_string_tokens("     ")  # 2.44μs -> 1.40μs (74.8% faster)

def test_string_with_unicode_characters():
    # Unicode characters should be counted as tokens
    codeflash_output = _estimate_string_tokens("你好 世界")  # 5.61μs -> 4.50μs (24.6% faster)

def test_string_with_mixed_unicode_and_ascii():
    # Mixed unicode and ascii
    codeflash_output = _estimate_string_tokens("hello 世界")  # 4.64μs -> 3.56μs (30.4% faster)

def test_list_with_unicode_strings():
    # List with unicode strings
    codeflash_output = _estimate_string_tokens(["你好 世界", "hello"])  # 8.25μs -> 4.64μs (78.0% faster)

def test_binary_content_empty():
    # BinaryContent with empty data
    binary = BinaryContent(b"")
    codeflash_output = _estimate_string_tokens([binary])  # 1.96μs -> 1.02μs (91.7% faster)

def test_list_with_nested_empty_lists():
    # Nested empty lists are not supported, but should not crash
    codeflash_output = _estimate_string_tokens([[]])  # 2.03μs -> 905ns (124% faster)

# ------------------------ LARGE SCALE TEST CASES ------------------------

def test_long_string_1000_words():
    # Test a string with 1000 words
    long_str = "word " * 1000
    codeflash_output = _estimate_string_tokens(long_str.strip())  # 121μs -> 111μs (8.44% faster)

def test_list_of_1000_strings():
    # List of 1000 single-word strings
    string_list = ["word"] * 1000
    codeflash_output = _estimate_string_tokens(string_list)  # 701μs -> 191μs (266% faster)

def test_list_of_1000_empty_strings():
    # List of 1000 empty strings
    string_list = [""] * 1000
    codeflash_output = _estimate_string_tokens(string_list)  # 596μs -> 89.6μs (566% faster)

def test_list_of_500_strings_and_500_binary():
    # 500 strings, 500 BinaryContent (each with 2 bytes)
    string_list = ["hi"] * 500
    binary_list = [BinaryContent(b"ab")] * 500
    codeflash_output = _estimate_string_tokens(string_list + binary_list)  # 504μs -> 112μs (349% faster)

def test_list_of_1000_audio_and_image_urls():
    # 500 AudioUrl, 500 ImageUrl
    audio_list = [AudioUrl("a")] * 500
    image_list = [ImageUrl("b")] * 500
    codeflash_output = _estimate_string_tokens(audio_list + image_list)  # 278μs -> 45.4μs (513% faster)

def test_list_of_mixed_types_large():
    # 333 strings, 333 binary (3 bytes), 334 images
    string_list = ["hello world"] * 333  # 2 tokens each
    binary_list = [BinaryContent(b"xyz")] * 333  # 3 tokens each
    image_list = [ImageUrl("img")] * 334  # 0 tokens each
    expected = 333 * 2 + 333 * 3 + 334 * 0
    codeflash_output = _estimate_string_tokens(string_list + binary_list + image_list)  # 512μs -> 150μs (242% faster)

def test_performance_large_string():
    # Test that large string doesn't crash or hang (not a strict performance test)
    long_str = "a " * 999 + "a"
    codeflash_output = _estimate_string_tokens(long_str)  # 66.2μs -> 66.3μs (0.202% slower)

def test_performance_large_list():
    # Test that large list doesn't crash or hang
    string_list = ["a b c"] * 333  # 3 tokens each
    codeflash_output = _estimate_string_tokens(string_list)  # 295μs -> 108μs (172% faster)

# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
```

```python
# --- second generated test module ---
import re
from collections.abc import Sequence  # imports

import pytest  # used for our unit tests

from pydantic_ai.models.function import _estimate_string_tokens


# Dummy classes to mimic pydantic_ai.messages
class AudioUrl:
    def __init__(self, url):
        self.url = url


class ImageUrl:
    def __init__(self, url):
        self.url = url


class BinaryContent:
    def __init__(self, data: bytes):
        self.data = data


UserContent = (str, AudioUrl, ImageUrl, BinaryContent)

# unit tests

# 1. Basic Test Cases

def test_empty_string_returns_zero():
    # Should return 0 for empty string
    codeflash_output = _estimate_string_tokens("")  # 290ns -> 389ns (25.4% slower)

def test_simple_string_single_word():
    # Single word string
    codeflash_output = _estimate_string_tokens("hello")  # 2.94μs -> 1.91μs (53.9% faster)

def test_simple_string_multiple_words():
    # Multiple words separated by spaces
    codeflash_output = _estimate_string_tokens("hello world")  # 3.37μs -> 2.24μs (50.8% faster)

def test_string_with_punctuation():
    # String with punctuation that should be split
    codeflash_output = _estimate_string_tokens('hello, world.')  # 3.89μs -> 2.76μs (40.9% faster)

def test_string_with_multiple_separators():
    # String with multiple spaces and punctuation
    codeflash_output = _estimate_string_tokens('hello,  world.  foo:bar')  # 3.73μs -> 2.76μs (34.9% faster)

def test_string_with_leading_trailing_spaces():
    # Leading and trailing whitespace should not affect token count
    codeflash_output = _estimate_string_tokens('   hello world   ')  # 3.48μs -> 2.44μs (42.2% faster)

def test_string_with_only_separators():
    # String with only separators should result in 0 tokens
    codeflash_output = _estimate_string_tokens(' , . : ')  # 3.10μs -> 2.18μs (42.1% faster)

def test_sequence_of_strings():
    # Sequence of strings should sum their token counts
    content = ["hello world", "foo,bar"]
    codeflash_output = _estimate_string_tokens(content)  # 6.31μs -> 3.40μs (85.7% faster)

def test_sequence_of_strings_and_empty_string():
    # Sequence with empty string should not affect total
    content = ["hello world", "", "foo"]
    codeflash_output = _estimate_string_tokens(content)  # 7.43μs -> 3.56μs (109% faster)

def test_string_with_multiple_spaces_between_words():
    # Multiple spaces between words should not create empty tokens
    codeflash_output = _estimate_string_tokens("hello    world")  # 3.54μs -> 2.45μs (44.5% faster)

# 2. Edge Test Cases

def test_none_input_returns_zero():
    # None input should return 0
    codeflash_output = _estimate_string_tokens(None)  # 348ns -> 335ns (3.88% faster)

def test_sequence_empty_list_returns_zero():
    # Empty sequence should return 0
    codeflash_output = _estimate_string_tokens([])  # 326ns -> 351ns (7.12% slower)

def test_sequence_of_only_non_str_content():
    # Sequence of only AudioUrl/ImageUrl should return 0
    content = [AudioUrl("http://a.com"), ImageUrl("http://b.com")]
    codeflash_output = _estimate_string_tokens(content)  # 3.06μs -> 1.28μs (139% faster)

def test_sequence_with_binary_content():
    # BinaryContent should use length of bytes as tokens
    content = [BinaryContent(b"abcde")]
    codeflash_output = _estimate_string_tokens(content)  # 2.28μs -> 1.14μs (101% faster)

def test_sequence_mixed_types():
    # Mixed sequence: string, AudioUrl, BinaryContent, ImageUrl
    content = [
        "hello there",
        AudioUrl("http://audio.com"),
        BinaryContent(b"xyz"),
        ImageUrl("http://img.com"),
        "foo:bar",
    ]
    # "hello there" -> 2, BinaryContent -> 3, "foo:bar" -> 2
    codeflash_output = _estimate_string_tokens(content)  # 8.83μs -> 4.19μs (110% faster)

def test_sequence_with_empty_binary_content():
    # BinaryContent with empty bytes
    content = [BinaryContent(b"")]
    codeflash_output = _estimate_string_tokens(content)  # 2.15μs -> 1.09μs (97.2% faster)

def test_string_with_consecutive_separators():
    # Multiple consecutive separators should not create empty tokens
    codeflash_output = _estimate_string_tokens("hello,,,  world...foo")  # 3.65μs -> 2.67μs (36.6% faster)

def test_sequence_with_unexpected_type():
    # Sequence with an unexpected type should be ignored (added as 0)
    class Dummy:
        pass
    content = ["hello", Dummy()]
    codeflash_output = _estimate_string_tokens(content)

def test_string_with_unicode_and_non_ascii():
    # Unicode characters should be treated as part of tokens
    codeflash_output = _estimate_string_tokens("héllo wørld")  # 4.34μs -> 3.10μs (39.8% faster)

def test_string_with_newline_and_tab():
    # Newlines and tabs are whitespace and should be split
    codeflash_output = _estimate_string_tokens("hello\nworld\tfoo")  # 3.88μs -> 2.71μs (43.0% faster)

def test_string_with_only_whitespace():
    # String with only whitespace should return 0
    codeflash_output = _estimate_string_tokens("\t\n   ")  # 2.33μs -> 1.45μs (60.8% faster)

def test_sequence_of_strings_with_whitespace_only():
    # Sequence with whitespace-only strings
    content = ["   ", "\t", "\n"]
    codeflash_output = _estimate_string_tokens(content)  # 6.33μs -> 2.28μs (177% faster)

def test_string_with_colon_and_period():
    # Colons and periods are separators
    codeflash_output = _estimate_string_tokens("foo:bar.baz")  # 3.60μs -> 2.77μs (30.2% faster)

def test_string_with_quotes():
    # Quotes are separators
    codeflash_output = _estimate_string_tokens('foo "bar" baz')  # 3.52μs -> 2.58μs (36.5% faster)

def test_string_with_mixed_separators():
    # All separators together
    codeflash_output = _estimate_string_tokens('foo, bar: "baz".')  # 4.15μs -> 2.80μs (48.2% faster)

# 3. Large Scale Test Cases

def test_long_string():
    # Very long string with 1000 words
    long_str = "word " * 1000
    codeflash_output = _estimate_string_tokens(long_str.strip())  # 122μs -> 121μs (0.830% faster)

def test_large_sequence_of_strings():
    # Sequence of 1000 single-word strings
    content = ["hello"] * 1000
    codeflash_output = _estimate_string_tokens(content)  # 754μs -> 188μs (299% faster)

def test_large_sequence_of_mixed_content():
    # Sequence of 500 strings and 500 BinaryContent (each 2 bytes)
    content = ["foo bar"] * 500 + [BinaryContent(b"xy")] * 500
    # 500*2 tokens from strings + 500*2 from BinaryContent
    codeflash_output = _estimate_string_tokens(content)  # 599μs -> 174μs (243% faster)

def test_large_sequence_with_audio_and_image():
    # Sequence with 333 strings, 333 AudioUrl, 334 ImageUrl
    content = (
        ["foo bar"] * 333 +
        [AudioUrl("http://a.com")] * 333 +
        [ImageUrl("http://b.com")] * 334
    )
    # Only strings counted: 333*2 = 666
    codeflash_output = _estimate_string_tokens(content)  # 493μs -> 134μs (267% faster)

def test_large_binary_content():
    # Single BinaryContent with 999 bytes
    content = [BinaryContent(b"x" * 999)]
    codeflash_output = _estimate_string_tokens(content)  # 2.11μs -> 1.05μs (101% faster)

def test_large_mixed_string_with_all_separators():
    # Large string with all separators and 500 tokens
    s = ("foo, bar. baz: " * 125).strip()  # 4 tokens per repeat, 125*4=500
    codeflash_output = _estimate_string_tokens(s)  # 39.2μs -> 38.5μs (1.69% faster)

# codeflash_output is used to check that the output of the original code is the same as that of the optimized code.
```

To edit these changes, run `git checkout codeflash/optimize-_estimate_string_tokens-mcs8yg4q` and push.

Codeflash

codeflash-ai bot and others added 4 commits July 6, 2025 22:31
Certainly! Let's break down the profile first.

### Hotspots

- Most time is spent calling `re.split()` for every string or string-like object, which is expensive.
- Checking `isinstance` for common types on each iteration.
- `tokens += 0` operations are no-ops and can be removed.
- The regex can be precompiled.
- `str.split()` with `None` as delimiter is often *much* faster and covers most whitespace splitting, which is likely enough here.
- `content.strip()` is being called redundantly for every string, which we can optimize.

Here's a rewritten, **optimized** version.

### Optimizations made

- Precompiled the regex, so it's not recompiled per call.
- Removed all `tokens += 0` and unnecessary else branches (no effect).
- Minimized calls to `.strip()` and `.split()` to once per string instance.
- Dropped extraneous `isinstance` checks.
- Moved logic into clear branches, optimizing the common paths.

#### Further possible optimization, if you don't need exact punctuation splitting

If you're willing to change the token estimation (using whitespace instead of the full punctuation split), you can swap out `_TOKEN_SPLIT_RE.split(foo.strip())` for simply `foo.strip().split()` and drop all regex, which is **much** faster. But this does **relax** the original tokenization logic.

Let me know if you want it even **faster** with that change, or if you need to preserve the splitting on punctuation!
```python
        tokens += len(_TOKEN_SPLIT_RE.split(part.strip()))
    elif isinstance(part, BinaryContent):
        tokens += len(part.data)
    # We don't need explicit handling for AudioUrl or ImageUrl, since they add 0
```
Contributor

I agree we can skip the `tokens += 0`, but we should keep the original todo comment, as image/audio URL parts actually *do* add tokens, we just don't count them here.

Contributor (Author)

fixed

```python
    return tokens
```


```python
_TOKEN_SPLIT_RE = re.compile(r'[\s",.:]+')
```
Member

This will have some overhead at import time; it's small, but it'll add up if we do this with all regular expressions. Should we stick with `re.split(r'[\s",.:]+', part.strip())`, as it'll cache the regex the first time it's run?

Contributor (Author)

To validate the performance characteristics, I ran an experiment: I replaced the current suggestion with an inline `re.split` and timed it on the generated test set, so the only change is the global `re.compile` vs the inline `re.split`:

- global `re.compile`: 1.68ms
- inline `re.split`: 2.57ms

Yes, `re` does cache the compiled regex for future use, but the cache lookup has overhead that can be high, especially when the call sits in a loop. In my experience with optimizations discovered by Codeflash, I've seen `re.compile` be faster. In this case, since the regex is used multiple times and in a loop, I would recommend regex compilation, although it's your decision.
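The experiment described above can be reproduced with a sketch like this (absolute timings will vary by machine; the point is only that module-level `re.split` pays a per-call pattern-cache lookup that a precompiled pattern avoids):

```python
import re
import timeit

text = "foo, bar. baz: " * 50
compiled = re.compile(r'[\s",.:]+')

# Precompiled pattern: no per-call pattern lookup.
t_compiled = timeit.timeit(lambda: compiled.split(text), number=10_000)

# Module-level re.split: the compiled pattern is cached after the first
# call, but every call still goes through the cache-lookup machinery.
t_inline = timeit.timeit(lambda: re.split(r'[\s",.:]+', text), number=10_000)

print(f"compiled: {t_compiled:.4f}s  inline: {t_inline:.4f}s")

# Both produce identical results; only the per-call overhead differs.
assert compiled.split(text) == re.split(r'[\s",.:]+', text)
```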

Reviewers

@samuelcolvin — left review comments
@DouweM — awaiting requested review

Assignees

@DouweM

Projects
None yet

Milestone
No milestone

Development

Successfully merging this pull request may close these issues.

3 participants
@misrasaurabh1, @DouweM, @samuelcolvin
