Adding CountToken to Gemini #2137


Open

kauabh wants to merge 10 commits into pydantic:main from kauabh:patch-2
Conversation

kauabh
Contributor

Gemini provides an endpoint to count tokens: https://ai.google.dev/api/tokens#method:-models.counttokens.
I think it would be useful and would address some of the concerns in issue #1794 (at least for Gemini).

@DouweM Wanted to check whether this would be helpful. If so, and if the approach is right, I'd appreciate some pointers on adding it to usage_limits for Gemini. Happy to work on other models too if this one makes it through.
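For reference, a minimal sketch of calling that endpoint through the google-genai SDK (the model name is illustrative, and the client reads the API key from the environment):

```python
# Minimal sketch: count tokens for a prompt via the Gemini count-tokens endpoint
# using the google-genai SDK. The model name is illustrative; the API key is
# taken from the environment (GOOGLE_API_KEY / GEMINI_API_KEY).
from google import genai

client = genai.Client()
response = client.models.count_tokens(
    model='gemini-2.0-flash',
    contents='Why is the sky blue?',
)
print(response.total_tokens)  # number of input tokens this prompt would consume
```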

@DouweM
Contributor

@kauabh I agree that if a model API has a method to count tokens, it would be nice to expose that on the Model class.

But I don't think we should automatically use it when UsageLimits(request_tokens_limit=...) is used, as it adds an extra request and the overhead and latency that comes with that, unlike OpenAI's tiktoken, which was mentioned in #1794 and can be run locally. So if we'd like to give users the option to better enforce request_tokens_limit by doing a separate count-tokens request ahead of the actual LLM request, that should be opt-in with some flag on UsageLimits and appropriate warnings in the docs about the extra overhead.
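A hypothetical shape for that opt-in flag, trimmed to the relevant fields (the flag name is illustrative, not an existing pydantic-ai setting):

```python
# Hypothetical opt-in flag on UsageLimits; the flag name is illustrative and
# only the field relevant to this discussion is shown.
from dataclasses import dataclass


@dataclass
class UsageLimits:
    request_tokens_limit: int | None = None
    # When True, make a separate count-tokens request before the LLM call so
    # request_tokens_limit can be enforced up front, at the cost of an extra
    # round trip per request.
    count_tokens_before_request: bool = False
```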

That check would need to be implemented here, just before we call model.request, once we have the messages, model settings, and model request params ready:

```python
async def _make_request(
    self, ctx: GraphRunContext[GraphAgentState, GraphAgentDeps[DepsT, NodeRunEndT]]
) -> CallToolsNode[DepsT, NodeRunEndT]:
    if self._result is not None:
        return self._result  # pragma: no cover

    model_settings, model_request_parameters = await self._prepare_request(ctx)
    model_request_parameters = ctx.deps.model.customize_request_parameters(model_request_parameters)
    message_history = await _process_message_history(
        ctx.state.message_history, ctx.deps.history_processors, build_run_context(ctx)
    )
    model_response = await ctx.deps.model.request(message_history, model_settings, model_request_parameters)
    ctx.state.usage.incr(_usage.Usage())

    return self._finish_handling(ctx, model_response)
```

This would require a method that exists on every model, so it'd be implemented as an abstract method on the base Model class with a default implementation of raise NotImplementedError(...), and only models that have a count-tokens method would override it with a concrete implementation.
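A trimmed sketch of that base-class hook, with illustrative names and a simplified signature (not the actual pydantic-ai API):

```python
# Sketch of an abstract count-tokens hook on the base Model class; the method
# name and signature are illustrative.
from abc import ABC


class Model(ABC):
    async def count_tokens(self, messages: list[dict]) -> int:
        """Return the number of input tokens the messages would consume.

        Only models whose API exposes a count-tokens endpoint override this.
        """
        raise NotImplementedError(f'{self.__class__.__name__} does not support token counting')
```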

As for that concrete implementation, I recommend adding it to GoogleModel instead of GeminiModel, as you can directly use the google-genai library there, and reducing the duplication with the request-preparation logic in _generate_content as much as possible.
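Continuing the base-class sketch above, a concrete override on GoogleModel might look roughly like this, assuming the model holds a google-genai client and model name (attribute names and the message-to-contents mapping are simplified, not the real pydantic-ai internals):

```python
# Simplified GoogleModel override that delegates to google-genai's count-tokens
# call. Attribute names and the message mapping are illustrative.
from google import genai


class GoogleModel(Model):
    def __init__(self, model_name: str):
        self.model_name = model_name
        self.client = genai.Client()

    async def count_tokens(self, messages: list[dict]) -> int:
        # Real code would reuse the request-preparation logic shared with
        # _generate_content; here messages are reduced to plain text for brevity.
        contents = [m['content'] for m in messages]
        response = await self.client.aio.models.count_tokens(
            model=self.model_name,
            contents=contents,
        )
        return response.total_tokens or 0
```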

@kauabh
Contributor, Author

@DouweM Makes sense, let me rework this. Thanks for the detailed input, appreciate your time.

