Usage

The Agents SDK automatically tracks token usage for every run. You can access it from the run context and use it to monitor costs, enforce limits, or record analytics.

What is tracked

  • requests: number of LLM API calls made
  • input_tokens: total input tokens sent
  • output_tokens: total output tokens received
  • total_tokens: input + output
  • request_usage_entries: list of per-request usage breakdowns
  • details:
      • input_tokens_details.cached_tokens
      • output_tokens_details.reasoning_tokens
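
As a rough illustration of how these fields could feed a cost estimate, here is a minimal sketch. The `UsageSnapshot` dataclasses are hypothetical stand-ins that only mirror the field names listed above (they are not the SDK's classes), the prices are placeholder values, and the code assumes that cached tokens are counted as part of `input_tokens`.

```python
from dataclasses import dataclass, field


@dataclass
class InputTokensDetails:
    cached_tokens: int = 0


@dataclass
class OutputTokensDetails:
    reasoning_tokens: int = 0


@dataclass
class UsageSnapshot:
    """Hypothetical stand-in with the same field names as the list above."""
    requests: int = 0
    input_tokens: int = 0
    output_tokens: int = 0
    total_tokens: int = 0
    input_tokens_details: InputTokensDetails = field(default_factory=InputTokensDetails)
    output_tokens_details: OutputTokensDetails = field(default_factory=OutputTokensDetails)


def estimate_cost_usd(usage, input_price, cached_price, output_price):
    """Estimate cost in USD; prices are per 1M tokens (placeholders, not real rates).

    Assumes cached tokens are a subset of input_tokens and are billed at cached_price.
    """
    cached = usage.input_tokens_details.cached_tokens
    uncached = usage.input_tokens - cached
    total = uncached * input_price + cached * cached_price + usage.output_tokens * output_price
    return total / 1_000_000


snapshot = UsageSnapshot(
    requests=2,
    input_tokens=1200,
    output_tokens=300,
    total_tokens=1500,
    input_tokens_details=InputTokensDetails(cached_tokens=200),
)
cost = estimate_cost_usd(snapshot, input_price=2.0, cached_price=0.5, output_price=8.0)
print(f"Estimated cost: ${cost:.4f}")
```

Because the function only reads attributes, the same call should work on any object that exposes these field names.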

Accessing usage from a run

After Runner.run(...), access usage via result.context_wrapper.usage.

    result = await Runner.run(agent, "What's the weather in Tokyo?")
    usage = result.context_wrapper.usage

    print("Requests:", usage.requests)
    print("Input tokens:", usage.input_tokens)
    print("Output tokens:", usage.output_tokens)
    print("Total tokens:", usage.total_tokens)

Usage is aggregated across all model calls during the run (including tool calls and handoffs).
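
Since the totals are aggregated per run, one way to enforce limits is to check them once a run has finished. The sketch below is an assumption-laden illustration: `UsageLimitExceeded` and `enforce_limits` are hypothetical application-level helpers, not SDK APIs, and a `SimpleNamespace` stands in for `result.context_wrapper.usage`.

```python
from types import SimpleNamespace


class UsageLimitExceeded(RuntimeError):
    """Hypothetical application-level error; not part of the SDK."""


def enforce_limits(usage, max_total_tokens, max_requests):
    """Raise if the aggregated usage of a finished run exceeds either limit.

    Returns the number of tokens still left in the budget otherwise.
    """
    if usage.requests > max_requests:
        raise UsageLimitExceeded(f"{usage.requests} requests > limit {max_requests}")
    if usage.total_tokens > max_total_tokens:
        raise UsageLimitExceeded(f"{usage.total_tokens} tokens > limit {max_total_tokens}")
    return max_total_tokens - usage.total_tokens


# Stand-in for result.context_wrapper.usage (only the fields used here).
usage = SimpleNamespace(requests=3, input_tokens=700, output_tokens=200, total_tokens=900)
remaining = enforce_limits(usage, max_total_tokens=1000, max_requests=5)
print("Tokens left in budget:", remaining)
```

This checks limits after the fact; stopping a run midway would need a different mechanism and is not covered by this sketch.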

Enabling usage with LiteLLM models

LiteLLM providers do not report usage metrics by default. When you are using LitellmModel, pass ModelSettings(include_usage=True) to your agent so that LiteLLM responses populate result.context_wrapper.usage.

    from agents import Agent, ModelSettings, Runner
    from agents.extensions.models.litellm_model import LitellmModel

    agent = Agent(
        name="Assistant",
        model=LitellmModel(model="your/model", api_key="..."),
        model_settings=ModelSettings(include_usage=True),
    )

    result = await Runner.run(agent, "What's the weather in Tokyo?")
    print(result.context_wrapper.usage.total_tokens)

Per-request usage tracking

The SDK automatically tracks usage for each API request in request_usage_entries, which is useful for detailed cost calculation and for monitoring context window consumption.

    result = await Runner.run(agent, "What's the weather in Tokyo?")

    # enumerate() yields (index, entry) pairs, so unpack both.
    for i, request in enumerate(result.context_wrapper.usage.request_usage_entries):
        print(f"Request {i + 1}: {request.input_tokens} in, {request.output_tokens} out")
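
To illustrate the context-window use case, the following sketch finds the largest single-request input size and expresses it as a fraction of a context window. The helper name `peak_context_fraction` and the `context_window` value are hypothetical, and the entries are stand-ins that expose only `input_tokens` and `output_tokens`.

```python
from types import SimpleNamespace


def peak_context_fraction(entries, context_window):
    """Return the largest per-request input size as a fraction of the window."""
    if not entries:
        return 0.0
    peak = max(entry.input_tokens for entry in entries)
    return peak / context_window


# Stand-ins for the items of request_usage_entries.
entries = [
    SimpleNamespace(input_tokens=100, output_tokens=20),
    SimpleNamespace(input_tokens=400, output_tokens=35),
    SimpleNamespace(input_tokens=250, output_tokens=10),
]
fraction = peak_context_fraction(entries, context_window=1000)
print(f"Peak request used {fraction:.0%} of the context window")
```

The aggregated `input_tokens` total would overstate this figure, because it sums inputs across all requests in the run rather than reporting the largest one.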

Accessing usage with sessions

When you use a Session (e.g., SQLiteSession), each call to Runner.run(...) returns usage for that specific run. Sessions maintain conversation history for context, but each run's usage is independent.

    session = SQLiteSession("my_conversation")

    first = await Runner.run(agent, "Hi!", session=session)
    print(first.context_wrapper.usage.total_tokens)  # Usage for first run

    second = await Runner.run(agent, "Can you elaborate?", session=session)
    print(second.context_wrapper.usage.total_tokens)  # Usage for second run

Note that while sessions preserve conversation context between runs, the usage metrics returned by each Runner.run() call cover only that particular execution. In a session, previous messages may be fed back in as input on each run, which increases the input token count in subsequent turns.
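
If a conversation-level total is needed, the per-run figures can be summed by the application. The `SessionUsageTotals` class below is a hypothetical helper, not an SDK feature, and the `SimpleNamespace` objects stand in for each run's `context_wrapper.usage`.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class SessionUsageTotals:
    """Hypothetical accumulator for usage across several runs of one session."""
    runs: int = 0
    requests: int = 0
    input_tokens: int = 0
    output_tokens: int = 0
    total_tokens: int = 0

    def add(self, usage):
        """Fold one run's usage into the running totals."""
        self.runs += 1
        self.requests += usage.requests
        self.input_tokens += usage.input_tokens
        self.output_tokens += usage.output_tokens
        self.total_tokens += usage.total_tokens


totals = SessionUsageTotals()
# Stand-ins for first.context_wrapper.usage and second.context_wrapper.usage.
totals.add(SimpleNamespace(requests=1, input_tokens=20, output_tokens=15, total_tokens=35))
totals.add(SimpleNamespace(requests=1, input_tokens=60, output_tokens=40, total_tokens=100))
print(f"{totals.runs} runs, {totals.total_tokens} total tokens")
```

In this made-up data the second run has more input tokens than the first, mirroring how replayed history tends to raise input counts over a conversation.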

Using usage in hooks

If you're using RunHooks, the context object passed to each hook contains usage. This lets you log usage at key lifecycle moments.

    from typing import Any

    from agents import Agent, RunContextWrapper, RunHooks

    class MyHooks(RunHooks):
        async def on_agent_end(self, context: RunContextWrapper, agent: Agent, output: Any) -> None:
            # Usage accumulated so far in this run.
            u = context.usage
            print(f"{agent.name}: {u.requests} requests, {u.total_tokens} total tokens")

API Reference

For detailed API documentation, see:

