Usage

The Agents SDK automatically tracks token usage for every run. You can read it from the run context to monitor costs, enforce limits, or record analytics.

What is tracked

requests: number of LLM API calls made
input_tokens: total input tokens sent
output_tokens: total output tokens received
total_tokens: input + output
request_usage_entries: list of per-request usage breakdowns
details:
  input_tokens_details.cached_tokens
  output_tokens_details.reasoning_tokens
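
As a rough illustration of how the fields listed above could feed a cost estimate, the sketch below uses a stand-in dataclass with matching field names instead of the SDK's own usage object. The `UsageSnapshot` class, the `estimate_cost` helper, and the per-million-token prices are all hypothetical, and the code assumes that cached tokens are counted inside `input_tokens`.

```python
from dataclasses import dataclass


@dataclass
class UsageSnapshot:
    """Stand-in for the SDK's usage object; field names follow the list above."""

    requests: int = 0
    input_tokens: int = 0
    output_tokens: int = 0
    cached_tokens: int = 0  # mirrors input_tokens_details.cached_tokens

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens


def estimate_cost(usage, input_price, cached_price, output_price):
    """Estimate cost in dollars from hypothetical prices per one million tokens."""
    # Assumption: cached tokens are a subset of input_tokens, billed at their own rate.
    uncached = usage.input_tokens - usage.cached_tokens
    return (
        uncached * input_price
        + usage.cached_tokens * cached_price
        + usage.output_tokens * output_price
    ) / 1_000_000


snapshot = UsageSnapshot(requests=2, input_tokens=1_000, output_tokens=500, cached_tokens=400)
cost = estimate_cost(snapshot, input_price=2.0, cached_price=0.5, output_price=8.0)
print(f"{snapshot.total_tokens} tokens, estimated ${cost:.6f}")
```

With a real run, the same arithmetic would read `usage.input_tokens_details.cached_tokens` rather than the flattened `cached_tokens` field used here.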

Accessing usage from a run

After Runner.run(...), access usage through result.context_wrapper.usage.

    result = await Runner.run(agent, "What's the weather in Tokyo?")
    usage = result.context_wrapper.usage

    print("Requests:", usage.requests)
    print("Input tokens:", usage.input_tokens)
    print("Output tokens:", usage.output_tokens)
    print("Total tokens:", usage.total_tokens)

Usage is aggregated across all model calls made during a run, including tool calls and handoffs.
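
Because the totals cover the whole run, one simple way to apply a limit is to compare them against a budget once the run has finished. The following is a minimal sketch of that idea: `TokenBudgetExceeded` and `enforce_budget` are hypothetical names, and a `SimpleNamespace` stands in for `result.context_wrapper.usage`.

```python
from types import SimpleNamespace


class TokenBudgetExceeded(Exception):
    """Raised when a run used more tokens than the allowed budget (hypothetical)."""


def enforce_budget(usage, max_total_tokens):
    """Return the tokens left in the budget, or raise if the run went over it."""
    remaining = max_total_tokens - usage.total_tokens
    if remaining < 0:
        raise TokenBudgetExceeded(
            f"run used {usage.total_tokens} tokens, budget was {max_total_tokens}"
        )
    return remaining


# Stand-in for result.context_wrapper.usage after a run.
usage = SimpleNamespace(requests=3, input_tokens=1_200, output_tokens=300, total_tokens=1_500)

remaining = enforce_budget(usage, max_total_tokens=2_000)
print("Tokens left in budget:", remaining)

try:
    enforce_budget(usage, max_total_tokens=1_000)
    over_budget = False
except TokenBudgetExceeded:
    over_budget = True
print("Over the smaller budget:", over_budget)
```

This check runs only after the run completes, so it reports an overrun but does not interrupt a run in progress.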

Enabling usage with LiteLLM models

LiteLLM providers do not report usage metrics by default. When you use LitellmModel, pass ModelSettings(include_usage=True) to your agent so that LiteLLM responses populate result.context_wrapper.usage.

    from agents import Agent, ModelSettings, Runner
    from agents.extensions.models.litellm_model import LitellmModel

    agent = Agent(
        name="Assistant",
        model=LitellmModel(model="your/model", api_key="..."),
        model_settings=ModelSettings(include_usage=True),
    )

    result = await Runner.run(agent, "What's the weather in Tokyo?")
    print(result.context_wrapper.usage.total_tokens)

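Per-request usage tracking

The SDK automatically records usage for each API request in request_usage_entries, which is useful for fine-grained cost calculation and for monitoring context window consumption.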

    result = await Runner.run(agent, "What's the weather in Tokyo?")

    # enumerate yields (index, entry) pairs, so unpack both.
    for i, request in enumerate(result.context_wrapper.usage.request_usage_entries):
        print(f"Request {i + 1}: {request.input_tokens} in, {request.output_tokens} out")
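
To make the context-window use case concrete, the sketch below finds the request with the largest input and reports it as a fraction of a window size. The `context_report` helper and the 8,000-token window are hypothetical, `SimpleNamespace` objects stand in for the entries of `request_usage_entries`, and the helper assumes the list contains at least one entry.

```python
from types import SimpleNamespace

CONTEXT_WINDOW = 8_000  # hypothetical model limit, in tokens


def context_report(entries, context_window):
    """Return (request number, input tokens, fraction of window) for the largest request."""
    # max() raises ValueError on an empty list, so callers should check for entries first.
    index, peak = max(enumerate(entries), key=lambda pair: pair[1].input_tokens)
    return index + 1, peak.input_tokens, peak.input_tokens / context_window


# Stand-ins for result.context_wrapper.usage.request_usage_entries.
entries = [
    SimpleNamespace(input_tokens=1_000, output_tokens=50),
    SimpleNamespace(input_tokens=4_000, output_tokens=120),
    SimpleNamespace(input_tokens=2_500, output_tokens=80),
]

number, tokens, fraction = context_report(entries, CONTEXT_WINDOW)
print(f"Request {number} used {tokens} input tokens ({fraction:.0%} of the window)")
```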

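Accessing usage with sessions

When you use a Session (for example, SQLiteSession), each call to Runner.run(...) returns the usage for that run alone. The session maintains conversation history for context, but usage is tracked independently for each run.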

    session = SQLiteSession("my_conversation")

    first = await Runner.run(agent, "Hi!", session=session)
    print(first.context_wrapper.usage.total_tokens)  # Usage for first run

    second = await Runner.run(agent, "Can you elaborate?", session=session)
    print(second.context_wrapper.usage.total_tokens)  # Usage for second run

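Note that although a session preserves conversation context between runs, the usage metrics returned by each Runner.run() call cover only that execution. Within a session, earlier messages may be supplied again as input on every run, which affects the input token count in later turns.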
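
Since each run reports only its own usage, a conversation-wide total has to be summed by the caller. The accumulator below is one possible sketch: `SessionTotals` is a hypothetical helper, not an SDK class, and `SimpleNamespace` objects stand in for the per-run usage objects.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class SessionTotals:
    """Hypothetical accumulator for usage across several runs in one session."""

    runs: int = 0
    requests: int = 0
    input_tokens: int = 0
    output_tokens: int = 0

    def add(self, usage) -> None:
        """Fold one run's usage into the running totals."""
        self.runs += 1
        self.requests += usage.requests
        self.input_tokens += usage.input_tokens
        self.output_tokens += usage.output_tokens

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens


totals = SessionTotals()
# Stand-ins for first.context_wrapper.usage and second.context_wrapper.usage.
totals.add(SimpleNamespace(requests=1, input_tokens=20, output_tokens=30))
totals.add(SimpleNamespace(requests=1, input_tokens=70, output_tokens=40))
print(totals.runs, "runs,", totals.total_tokens, "tokens")
```

In this made-up data the second run has more input tokens than the first, which is the pattern to expect when earlier messages are replayed as input.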

Using usage in hooks

If you use RunHooks, the context object passed to each hook contains usage. This lets you log usage at key lifecycle moments.

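    from typing import Any

    from agents import Agent, RunContextWrapper, RunHooks

    class MyHooks(RunHooks):
        async def on_agent_end(self, context: RunContextWrapper, agent: Agent, output: Any) -> None:
            # context.usage holds the usage accumulated so far in this run.
            u = context.usage
            print(f"{agent.name} → {u.requests} requests, {u.total_tokens} total tokens")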
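
If the goal is analytics rather than console output, a hook could emit one structured log line per event instead of a formatted string. The sketch below shows only the serialization step: `usage_log_line` is a hypothetical helper that a hook might call with `agent.name` and `context.usage`, and a `SimpleNamespace` stands in for the usage object.

```python
import json
from types import SimpleNamespace


def usage_log_line(agent_name, usage):
    """Serialize a usage snapshot as one JSON line for structured logs."""
    return json.dumps(
        {
            "agent": agent_name,
            "requests": usage.requests,
            "input_tokens": usage.input_tokens,
            "output_tokens": usage.output_tokens,
            "total_tokens": usage.total_tokens,
        },
        sort_keys=True,  # stable key order keeps log lines easy to diff
    )


# Stand-in for context.usage as seen inside a hook.
usage = SimpleNamespace(requests=2, input_tokens=150, output_tokens=50, total_tokens=200)
line = usage_log_line("Assistant", usage)
print(line)
```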

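API reference

For detailed API documentation, see:

Usage - usage tracking data structure
RequestUsage - per-request usage details
RunContextWrapper - access usage from the run context
RunHooks - hook into the usage tracking lifecycle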
