Adding CountToken to Gemini #2137


Open

kauabh wants to merge 10 commits into pydantic:main from kauabh:patch-2
Conversation

kauabh
Contributor

Gemini provides an endpoint to count tokens: https://ai.google.dev/api/tokens#method:-models.counttokens.
I think it would be useful and would address some of the concerns in issue #1794 (at least for Gemini).

@DouweM Wanted to check whether this would be helpful. If so, and if the approach is right, I'd appreciate some pointers on adding it to usage_limits for Gemini. Happy to work on other models too if this one makes it through.
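For reference, a minimal sketch of calling that endpoint through the google-genai SDK (the model name is illustrative, and the client reads the API key from the environment):

```python
# Minimal sketch: count tokens for a prompt via the Gemini count-tokens endpoint
# using the google-genai SDK. The model name is illustrative; the API key is
# taken from the environment (GOOGLE_API_KEY / GEMINI_API_KEY).
from google import genai

client = genai.Client()
response = client.models.count_tokens(
    model='gemini-2.0-flash',
    contents='Why is the sky blue?',
)
print(response.total_tokens)  # number of input tokens this prompt would consume
```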

@DouweM
Contributor

@kauabh I agree that if a model API has a method to count tokens, it would be nice to expose that on the Model class.

But I don't think we should automatically use it when UsageLimits(request_tokens_limit=...) is used, as it adds an extra request and the overhead and latency that comes with that, unlike OpenAI's tiktoken, which was mentioned in #1794 and can be run locally. So if we'd like to give users the option to better enforce request_tokens_limit by doing a separate count-tokens request ahead of the actual LLM request, that should be opt-in with some flag on UsageLimits and appropriate warnings in the docs about the extra overhead.
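A hypothetical shape for that opt-in flag, trimmed to the relevant fields (the flag name is illustrative, not an existing pydantic-ai setting):

```python
# Hypothetical opt-in flag on UsageLimits; the flag name is illustrative and
# only the field relevant to this discussion is shown.
from dataclasses import dataclass


@dataclass
class UsageLimits:
    request_tokens_limit: int | None = None
    # When True, make a separate count-tokens request before the LLM call so
    # request_tokens_limit can be enforced up front, at the cost of an extra
    # round trip per request.
    count_tokens_before_request: bool = False
```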

That check would need to be implemented here, just before we call model.request, once we have the messages, model settings, and model request params ready:

```python
async def _make_request(
    self, ctx: GraphRunContext[GraphAgentState, GraphAgentDeps[DepsT, NodeRunEndT]]
) -> CallToolsNode[DepsT, NodeRunEndT]:
    if self._result is not None:
        return self._result  # pragma: no cover

    model_settings, model_request_parameters = await self._prepare_request(ctx)
    model_request_parameters = ctx.deps.model.customize_request_parameters(model_request_parameters)
    message_history = await _process_message_history(
        ctx.state.message_history, ctx.deps.history_processors, build_run_context(ctx)
    )
    model_response = await ctx.deps.model.request(message_history, model_settings, model_request_parameters)
    ctx.state.usage.incr(_usage.Usage())

    return self._finish_handling(ctx, model_response)
```

This would require a method that exists on every model, so it'd be implemented as an abstract method on the base Model class with a default implementation of raise NotImplementedError(...), and only models that have a count-tokens method would override it with a concrete implementation.
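A trimmed sketch of that base-class hook, with illustrative names and a simplified signature (not the actual pydantic-ai API):

```python
# Sketch of an abstract count-tokens hook on the base Model class; the method
# name and signature are illustrative.
from abc import ABC


class Model(ABC):
    async def count_tokens(self, messages: list[dict]) -> int:
        """Return the number of input tokens the messages would consume.

        Only models whose API exposes a count-tokens endpoint override this.
        """
        raise NotImplementedError(f'{self.__class__.__name__} does not support token counting')
```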

As for that concrete implementation, I recommend adding it to GoogleModel instead of GeminiModel, as you can directly use the google-genai library there, and reducing the duplication with the request-preparation logic in _generate_content as much as possible.
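Continuing the base-class sketch above, a concrete override on GoogleModel might look roughly like this, assuming the model holds a google-genai client and model name (attribute names and the message-to-contents mapping are simplified, not the real pydantic-ai internals):

```python
# Simplified GoogleModel override that delegates to google-genai's count-tokens
# call. Attribute names and the message mapping are illustrative.
from google import genai


class GoogleModel(Model):
    def __init__(self, model_name: str):
        self.model_name = model_name
        self.client = genai.Client()

    async def count_tokens(self, messages: list[dict]) -> int:
        # Real code would reuse the request-preparation logic shared with
        # _generate_content; here messages are reduced to plain text for brevity.
        contents = [m['content'] for m in messages]
        response = await self.client.aio.models.count_tokens(
            model=self.model_name,
            contents=contents,
        )
        return response.total_tokens or 0
```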

@kauabh
Contributor, Author

@DouweM Makes sense, let me rework this. Thanks for the detailed input, appreciate your time.

