API reference for the `/inference` endpoint.

`POST /inference`

Use `POST /openai/v1/chat/completions` instead for an inference endpoint compatible with the OpenAI API.
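For orientation, here is a minimal request to the endpoint. This is a sketch, assuming a gateway running locally and a hypothetical `my_function` function defined in your configuration; `httpx` is used for brevity, but any HTTP client (or the TensorZero Python client shown in the examples below) works. The request and response fields are documented in detail below.

```python
import httpx

# Minimal POST /inference request (sketch).
# `my_function` is a hypothetical function name; see `function_name` and `input` below.
response = httpx.post(
    "http://localhost:3000/inference",
    json={
        "function_name": "my_function",
        "input": {
            "messages": [{"role": "user", "content": "Hello, world!"}],
        },
    },
)
response.raise_for_status()
print(response.json())  # includes inference_id, episode_id, content, usage, ...
```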
## Request

### additional_tools

- **Type:** array of objects (optional, defaults to `[]`)

A list of tools defined dynamically at inference time that the model is allowed to call, in addition to any tools defined in the configuration.

Each standard tool object has the following fields:

- `name` (string, required): The name of the tool
- `description` (string, required): A description of what the tool does
- `parameters` (object, required): A JSON Schema defining the tool's parameters
- `strict` (boolean, optional): Whether to enforce strict schema validation (defaults to `false`)

Alternatively, an OpenAI custom tool object has the following fields:

- `type` (string, required): Must be `"openai_custom"`
- `name` (string, required): The name of the tool
- `description` (string, optional): A description of what the tool does
- `format` (object, optional): The output format for the tool (see below)

The `format` field can be one of:

- `{"type": "text"}`: Freeform text output
- `{"type": "grammar", "grammar": {"syntax": "lark", "definition": "..."}}`: Output constrained by a Lark grammar
- `{"type": "grammar", "grammar": {"syntax": "regex", "definition": "..."}}`: Output constrained by a regular expression
**Example: OpenAI Custom Tool with Text Format**

```json
{
  "model_name": "openai::gpt-5-mini",
  "input": {
    "messages": [
      {
        "role": "user",
        "content": "Generate Python code to print 'Hello, World!'"
      }
    ]
  },
  "additional_tools": [
    {
      "type": "openai_custom",
      "name": "code_generator",
      "description": "Generates Python code snippets",
      "format": { "type": "text" }
    }
  ]
}
```
**Example: OpenAI Custom Tool with Regex Grammar**

```json
{
  "model_name": "openai::gpt-5-mini",
  "input": {
    "messages": [
      { "role": "user", "content": "Format the phone number 4155550123" }
    ]
  },
  "additional_tools": [
    {
      "type": "openai_custom",
      "name": "phone_formatter",
      "description": "Formats phone numbers in XXX-XXX-XXXX format",
      "format": {
        "type": "grammar",
        "grammar": {
          "syntax": "regex",
          "definition": "^\\d{3}-\\d{3}-\\d{4}$"
        }
      }
    }
  ]
}
```

### allowed_tools

- **Type:** array of strings (optional)

A list of tool names that the model is allowed to call. The tools must be defined in the configuration file; any tools provided in `additional_tools` are always allowed, irrespective of this field.

Some providers (notably OpenAI) natively support restricting allowed tools.
For these providers, we send all tools (both configured and dynamic) to the provider, and separately specify which ones are allowed to be called.
For providers that do not natively support this feature, we filter the tool list ourselves and only send the allowed tools to the provider.

### cache_options

- **Type:** object (optional, defaults to `{"enabled": "write_only"}`)

The `cache_options.enabled` field (defaults to `"write_only"`) can be one of:

- `"write_only"` (default): Only write to the cache but don't serve cached responses
- `"read_only"`: Only read from the cache but don't write new entries
- `"on"`: Both read from and write to the cache
- `"off"`: Disable caching completely

With `dryrun=true`, the gateway never writes to the cache.

The `cache_options.max_age_s` field (defaults to `null`) limits the maximum age of acceptable cache entries.
For example, with `max_age_s=3600`, the gateway will only use cache entries that were created in the last hour.
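For example, the following sketch enables read-write caching but only accepts cache entries created in the last hour (the function name is hypothetical):

```python
import httpx

# Sketch: enable read + write caching, but ignore cache entries older than 1 hour.
response = httpx.post(
    "http://localhost:3000/inference",
    json={
        "function_name": "my_function",  # hypothetical
        "input": {
            "messages": [{"role": "user", "content": "What is the capital of Japan?"}],
        },
        "cache_options": {"enabled": "on", "max_age_s": 3600},
    },
)
```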
### credentials

- **Type:** object (optional)

Model providers can be configured to accept credentials at inference time by using a `dynamic` location (e.g. `dynamic::my_dynamic_api_key_name`).
See the configuration reference for more details.

The gateway expects the credentials to be provided in the `credentials` field of the request body as specified below.
The gateway will return a 400 error if the credentials are not provided and the model provider has been configured with dynamic credentials.

**Example**

```toml
[models.my_model_name.providers.my_provider_name]
# ...
# Note: the name of the credential field (e.g. `api_key_location`) depends on the provider type
api_key_location = "dynamic::my_dynamic_api_key_name"
# ...
```

```json
{
  // ...
  "credentials": {
    // ...
    "my_dynamic_api_key_name": "sk-..."
    // ...
  }
  // ...
}
```

### dryrun

- **Type:** boolean (optional)

If `true`, the inference request will be executed but won't be stored to the database.
The gateway will still call the downstream model providers.

This field is primarily for debugging and testing, and you should generally not use it in production.
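The sketch below combines the two fields above: it supplies a dynamic credential and runs the inference as a dry run, so nothing is written to the database. The credential name matches the `dynamic::my_dynamic_api_key_name` location from the configuration example above; the function name and environment variable are hypothetical.

```python
import os

import httpx

# Sketch: dynamic credentials + dryrun. The provider must be configured with
# api_key_location = "dynamic::my_dynamic_api_key_name" (see the example above).
response = httpx.post(
    "http://localhost:3000/inference",
    json={
        "function_name": "my_function",  # hypothetical
        "input": {"messages": [{"role": "user", "content": "Ping"}]},
        "credentials": {
            # hypothetical environment variable holding the provider API key
            "my_dynamic_api_key_name": os.environ["MY_PROVIDER_API_KEY"],
        },
        "dryrun": True,  # execute the inference but don't store it to the database
    },
)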
### episode_id

- **Type:** UUID (optional)

The ID of an existing episode to associate the inference with.
If omitted, the gateway generates a new episode ID and returns it in the response.
Only use episode IDs that were returned by the TensorZero gateway.

### extra_body

- **Type:** array of objects (optional)

The `extra_body` field allows you to modify the request body that TensorZero sends to a model provider.
This advanced feature is an "escape hatch" that lets you use provider-specific functionality that TensorZero hasn't implemented yet.

Each object in the array must have two or three fields:

- `pointer`: A JSON Pointer string specifying where to modify the request body
- `value`: The value to insert at that location; it can be of any type, including nested types
- `delete = true`: Deletes the field at the specified location, if present

An entry can optionally be scoped to a particular variant with `variant_name`, or to a particular model provider with `model_name` and `provider_name`.

You can also set `extra_body` in the configuration file.
The values provided at inference time take priority over the values in the configuration file.

**Example: `extra_body`**

If TensorZero would normally send this request body to the provider...

```json
{
  "project": "tensorzero",
  "safety_checks": {
    "no_internet": false,
    "no_agi": true
  }
}
```

...and you include this `extra_body` in the inference request...

```json
{
  // ...
  "extra_body": [
    {
      "variant_name": "my_variant", // or "model_name": "my_model", "provider_name": "my_provider"
      "pointer": "/agi",
      "value": true
    },
    {
      // No `variant_name` or `model_name`/`provider_name` specified, so it applies to all variants and providers
      "pointer": "/safety_checks/no_agi",
      "value": {
        "bypass": "on"
      }
    }
  ]
}
```

...then TensorZero sends this request body to the provider:

```json
{
  "agi": true,
  "project": "tensorzero",
  "safety_checks": {
    "no_internet": false,
    "no_agi": {
      "bypass": "on"
    }
  }
}
```

### extra_headers

- **Type:** array of objects (optional)

The `extra_headers` field allows you to modify the request headers that TensorZero sends to a model provider.
This advanced feature is an "escape hatch" that lets you use provider-specific functionality that TensorZero hasn't implemented yet.

Each object in the array has the following fields:

- `name`: The name of the header to modify
- `value`: The value to set the header to

An entry can optionally be scoped to a particular variant with `variant_name`, or to a particular model provider with `model_name` and `provider_name`.

You can also set `extra_headers` in the configuration file.
The values provided at inference time take priority over the values in the configuration file.

**Example: `extra_headers`**

If TensorZero would normally send this request header to the provider...

```
Safety-Checks: on
```

...and you include this `extra_headers` in the inference request...

```json
{
  "extra_headers": [
    {
      "variant_name": "my_variant", // or "model_name": "my_model", "provider_name": "my_provider"
      "name": "Safety-Checks",
      "value": "off"
    },
    {
      // No `variant_name` or `model_name`/`provider_name` specified, so it applies to all variants and providers
      "name": "Intelligence-Level",
      "value": "AGI"
    }
  ]
}
```

...then `Safety-Checks` is set to `off` only for `my_variant`, while `Intelligence-Level: AGI` is applied globally to all variants and providers:

```
Safety-Checks: off
Intelligence-Level: AGI
```
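As a sketch, the two escape hatches can also be combined in a single request. The function name, variant name, pointer, and header below are all hypothetical:

```python
import httpx

# Sketch: patch the provider request body and headers at inference time.
response = httpx.post(
    "http://localhost:3000/inference",
    json={
        "function_name": "my_function",  # hypothetical
        "input": {"messages": [{"role": "user", "content": "Hello"}]},
        "extra_body": [
            # Applies only to the variant `my_variant` (hypothetical pointer/value)
            {"variant_name": "my_variant", "pointer": "/service_tier", "value": "flex"},
        ],
        "extra_headers": [
            # No scope, so this header applies to all variants and providers
            {"name": "X-Request-Source", "value": "docs-example"},
        ],
    },
)
```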
### function_name

- **Type:** string

Either `function_name` or `model_name` must be provided.

The name of the function to call. The function must be defined in the configuration file.
Alternatively, you can use the `model_name` field to call a model directly, without the need to define a function.
See below for more details.

### include_original_response

- **Type:** boolean (optional)

If `true`, the original response from the model will be included in the response in the `original_response` field as a string.
See `original_response` in the response section for more details.

### input

- **Type:** object (required)

The input to the function.

#### input.messages

- **Type:** array of messages (optional, defaults to `[]`)

A list of messages to provide to the model. Each message has the following fields:

- `role`: The role of the message (`assistant` or `user`).
- `content`: The content of the message (see below).

The `content` field can be a string or a list of content blocks.
A content block is an object with a `type` field and additional fields depending on the type.

If the content block has type `text`, it must have either of the following additional fields:

- `text`: The text for the content block.
- `arguments`: A JSON object containing the function arguments for TensorZero functions with templates and schemas (see Create a prompt template for details).

If the content block has type `tool_call`, it must have the following additional fields:

- `arguments`: The arguments for the tool call.
- `id`: The ID for the content block.
- `name`: The name of the tool for the content block.

If the content block has type `tool_result`, it must have the following additional fields:

- `id`: The ID for the content block.
- `name`: The name of the tool for the content block.
- `result`: The result of the tool call.

If the content block has type `file`, it must have exactly one of the following sets of additional fields.

For a file at a URL:

- `file_type`: must be `url`
- `url`: the URL of the file
- `mime_type` (optional): override the MIME type of the file
- `detail` (optional): controls the fidelity of image processing. Only applies to image files; ignored for other file types. Can be `low`, `high`, or `auto`. Affects token consumption and image quality. Only supported by some model providers; ignored otherwise.
- `filename` (optional): a filename to associate with the file

For an embedded file:

- `file_type`: must be `base64`
- `data`: base64-encoded data for an embedded file
- `mime_type`: the MIME type (e.g. `image/png`, `image/jpeg`, `application/pdf`)
- `detail` (optional): controls the fidelity of image processing. Only applies to image files; ignored for other file types. Can be `low`, `high`, or `auto`. Affects token consumption and image quality. Only supported by some model providers; ignored otherwise.
- `filename` (optional): a filename to associate with the file

If the content block has type `raw_text`, it must have the following additional fields:

- `value`: The text for the content block. This content block will ignore any relevant templates and schemas for this function.

If the content block has type `thought`, it must have the following additional fields:

- `text`: The text for the content block.

If the content block has type `unknown`, it must have the following additional fields:

- `data`: The original content block from the provider, without any validation or transformation by TensorZero.
- `model_provider_name` (optional): A string specifying when this content block should be included in the model provider input. If set, the content block will only be provided to this specific model provider. If not set, the content block is passed to all model providers.

For example, the following content block forwards a `daydreaming` content block to inference requests targeting the `your_model_provider_name` model provider:

```json
{
  "type": "unknown",
  "data": {
    "type": "daydreaming",
    "dream": "..."
  },
  "model_provider_name": "tensorzero::model_name::your_model_name::provider_name::your_model_provider_name"
}
```
**Example**
```json
{
  // ...
  "input": {
    "messages": [
      // If you don't have a user (or assistant) schema...
      {
        "role": "user", // (or "assistant")
        "content": "What is the weather in Tokyo?"
      },
      // If you have a user (or assistant) schema...
      {
        "role": "user", // (or "assistant")
        "content": [
          {
            "type": "text",
            "arguments": {
              "location": "Tokyo"
            }
          }
        ]
      },
      // If the model previously called a tool...
      {
        "role": "assistant",
        "content": [
          {
            "type": "tool_call",
            "id": "0",
            "name": "get_temperature",
            "arguments": "{\"location\":\"Tokyo\"}"
          }
        ]
      },
      // ...and you're providing the result of that tool call...
      {
        "role": "user",
        "content": [
          {
            "type": "tool_result",
            "id": "0",
            "name": "get_temperature",
            "result": "70"
          }
        ]
      },
      // You can also specify a text message using a content block...
      {
        "role": "user",
        "content": [
          {
            "type": "text",
            "text": "What about NYC?" // (or object if there is a schema)
          }
        ]
      },
      // You can also provide multiple content blocks in a single message...
      {
        "role": "assistant",
        "content": [
          {
            "type": "text",
            "text": "Sure, I can help you with that." // (or object if there is a schema)
          },
          {
            "type": "tool_call",
            "id": "0",
            "name": "get_temperature",
            "arguments": "{\"location\":\"New York\"}"
          }
        ]
      }
      // ...
    ]
    // ...
  }
  // ...
}
```

#### input.system

- **Type:** string or object (optional)

The input for the system message.
If the function has a system schema, this should be an object matching the schema; otherwise, it should be a string.

### model_name

- **Type:** string

Either `model_name` or `function_name` must be provided.

The name of the model to call directly.
Under the hood, this uses the built-in passthrough function `tensorzero::default`.

| To call… | Use this format… |
| --- | --- |
| A function defined as `[functions.my_function]` in your `tensorzero.toml` configuration file | `function_name="my_function"` (not `model_name`) |
| A model defined as `[models.my_model]` in your `tensorzero.toml` configuration file | `model_name="my_model"` |
| A model offered by a model provider, without defining it in your `tensorzero.toml` configuration file (if supported, see below) | `model_name="{provider_type}::{model_name}"` |
The `{provider_type}::{model_name}` shorthand is supported for the following provider types: `anthropic`, `deepseek`, `fireworks`, `gcp_vertex_anthropic`, `gcp_vertex_gemini`, `google_ai_studio_gemini`, `groq`, `hyperbolic`, `mistral`, `openai`, `openrouter`, `together`, and `xai`.

For example, suppose you have the following configuration:

```toml
[models.gpt-4o]
routing = ["openai", "azure"]

[models.gpt-4o.providers.openai]
# ...

[models.gpt-4o.providers.azure]
# ...

[functions.extract-data]
# ...
```

Then:

- `function_name="extract-data"` calls the `extract-data` function defined above.
- `model_name="gpt-4o"` calls the `gpt-4o` model in your configuration, which supports fallback from `openai` to `azure`. See Retries & Fallbacks for details.
- `model_name="openai::gpt-4o"` calls the OpenAI API directly for the `gpt-4o` model, ignoring the `gpt-4o` model defined above.

In other words, `model_name="gpt-4o"` will use the `[models.gpt-4o]` model defined in the `tensorzero.toml` file, whereas `model_name="openai::gpt-4o"` will call the OpenAI API directly for the `gpt-4o` model.
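The sketch below shows both calling styles side by side, given the configuration above: the first request routes through the configured `gpt-4o` model (with `openai` → `azure` fallback), while the second bypasses the configuration and calls the OpenAI API directly.

```python
import httpx

# Sketch: configured model vs. provider shorthand.
configured = httpx.post(
    "http://localhost:3000/inference",
    json={
        "model_name": "gpt-4o",  # uses [models.gpt-4o] with openai -> azure fallback
        "input": {"messages": [{"role": "user", "content": "Hello"}]},
    },
)

direct = httpx.post(
    "http://localhost:3000/inference",
    json={
        "model_name": "openai::gpt-4o",  # calls the OpenAI API directly
        "input": {"messages": [{"role": "user", "content": "Hello"}]},
    },
)
```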
### output_schema

- **Type:** object (optional)

A JSON Schema that overrides the `output_schema` defined in the function configuration for a JSON function.
This dynamic output schema is used for validating the output of the function, and is sent to providers which support structured outputs.

### otlp_traces_extra_headers

- **Type:** object (optional, defaults to `{}`)

Additional headers to attach to the OpenTelemetry traces exported for this inference.
Header names are stripped of the `tensorzero-otlp-traces-extra-header-` prefix before being sent to the OTLP endpoint.
These headers are merged with any static headers configured in `export.otlp.traces.extra_headers`.
When the same header key is present in both static and dynamic headers, the dynamic header value takes precedence.
See Export OpenTelemetry traces for more details and examples.

### parallel_tool_calls

- **Type:** boolean (optional)

If `true`, the function will be allowed to request multiple tool calls in a single conversation turn.
If not set, we default to the configuration value for the function being called.

Most model providers do not support parallel tool calls. In those cases, the gateway ignores this field.
At the moment, only Fireworks AI and OpenAI support parallel tool calls.

### params

- **Type:** object (optional, defaults to `{}`)

Overrides inference-time parameters for a particular variant type.
The parameters are of the form `{ variant_type: { param: value, ... }, ... }`.

You should prefer to set these parameters in the configuration file if possible.
Only use this field if you need to set these parameters dynamically at runtime.
Note that the parameters will apply to every variant of the specified type.

Currently, we support the following parameters for the `chat_completion` variant type:

- `frequency_penalty`
- `json_mode`
- `max_tokens`
- `presence_penalty`
- `reasoning_effort`
- `seed`
- `service_tier`
- `stop_sequences`
- `temperature`
- `thinking_budget_tokens`
- `top_p`
- `verbosity`

**Example**

For example, if you wanted to dynamically override the `temperature` parameter for `chat_completion` variants, you'd include the following in the request body:

```json
{
  // ...
  "params": {
    "chat_completion": {
      "temperature": 0.7
    }
  }
  // ...
}
```
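Here is a sketch combining a dynamic inference parameter with a dynamic output schema, for a hypothetical JSON function `extract_contact`:

```python
import httpx

# Sketch: override temperature for chat_completion variants and supply a
# dynamic output schema (for a hypothetical JSON function `extract_contact`).
response = httpx.post(
    "http://localhost:3000/inference",
    json={
        "function_name": "extract_contact",  # hypothetical
        "input": {
            "messages": [{"role": "user", "content": "Reach me at 555-0123."}],
        },
        "params": {"chat_completion": {"temperature": 0.2}},
        "output_schema": {
            "type": "object",
            "properties": {"phone": {"type": "string"}},
            "required": ["phone"],
            "additionalProperties": False,
        },
    },
)
```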
### provider_tools

- **Type:** array of objects (optional, defaults to `[]`)

A list of provider-specific tools (e.g. OpenAI web search) to make available for this inference.
Each object has the following fields:

- `scope` (object, optional): Limits which model/provider combination can use this tool. If omitted, the tool is available to all compatible providers.
  - `model_name` (string): The model name as defined in your configuration
  - `model_provider_name` (string): The provider name for that model
- `tool` (object, required): The provider-specific tool configuration as defined by the provider's API

**Example: OpenAI Web Search (Unscoped)**

```json
{
  "function_name": "my_function",
  "input": {
    "messages": [
      {
        "role": "user",
        "content": "What were the latest developments in AI this week?"
      }
    ]
  },
  "provider_tools": [
    {
      "tool": {
        "type": "web_search"
      }
    }
  ]
}
```

**Example: OpenAI Web Search (Scoped)**
```json
{
  "function_name": "my_function",
  "input": {
    "messages": [
      {
        "role": "user",
        "content": "What were the latest developments in AI this week?"
      }
    ]
  },
  "provider_tools": [
    {
      "scope": {
        "model_name": "gpt-5-mini",
        "model_provider_name": "openai"
      },
      "tool": {
        "type": "web_search"
      }
    }
  ]
}
```

In this example, the web search tool only applies to the `gpt-5-mini` model.

### stream

- **Type:** boolean (optional)

If `true`, the gateway will stream the response from the model provider.

### tags

- **Type:** object (optional)

User-provided tags to associate with the inference, e.g. `{"user_id": "123"}` or `{"author": "Alice"}`.

### tool_choice

- **Type:** string or object (optional)

Overrides the tool choice strategy for the request. The supported tool choice strategies are:

- `none`: The function should not use any tools.
- `auto`: The model decides whether or not to use a tool. If it decides to use a tool, it also decides which tools to use.
- `required`: The model should use a tool. If multiple tools are available, the model decides which tool to use.
- `{ specific = "tool_name" }`: The model should use a specific tool. The tool must be defined in the `tools` section of the configuration file or provided in `additional_tools`.

### variant_name

- **Type:** string (optional)

If set, pins the inference request to a particular variant of the function.
You should generally not use this field in production; it is intended for testing and debugging.
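A sketch tying several of these fields together: pin a variant for debugging, force a specific tool, and tag the inference for later analysis. The function, variant, and tool names are hypothetical, and the JSON form of the specific tool choice (`{"specific": ...}`) mirrors the TOML syntax shown above.

```python
import httpx

# Sketch: pin a variant, force a specific tool, and attach tags.
response = httpx.post(
    "http://localhost:3000/inference",
    json={
        "function_name": "weather_bot",  # hypothetical
        "variant_name": "prompt_v1",  # pin a variant (testing/debugging only)
        "tool_choice": {"specific": "get_temperature"},  # configured or dynamic tool
        "tags": {"user_id": "123"},
        "input": {
            "messages": [{"role": "user", "content": "Weather in Tokyo?"}],
        },
    },
)
```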
## Response

The response format depends on the function type (`chat` or `json`) and on whether the response is streamed.

### Chat functions

When the function type is `chat`, the response is structured as follows.

- `content`: A list of content blocks, each with a `type` equal to `text` or `tool_call`. Reasoning models (e.g. DeepSeek R1) might also include `thought` content blocks.
  - If `type` is `text`, the content block has the following fields:
    - `text`: The text for the content block.
  - If `type` is `tool_call`, the content block has the following fields:
    - `arguments` (object): The validated arguments for the tool call (`null` if invalid).
    - `id` (string): The ID of the content block.
    - `name` (string): The validated name of the tool (`null` if invalid).
    - `raw_arguments` (string): The arguments for the tool call generated by the model (which might be invalid).
    - `raw_name` (string): The name of the tool generated by the model (which might be invalid).
  - If `type` is `thought`, the content block has the following fields:
    - `text` (string): The text of the thought.
  - If the provider returns a content block that TensorZero doesn't recognize, it is included with type `unknown` and the following additional fields:
    - `data`: The original content block from the provider, without any validation or transformation by TensorZero.
    - `model_provider_name`: The fully-qualified name of the model provider that returned the content block.
- `episode_id`: The ID of the episode associated with the inference.
- `inference_id`: The ID of the inference.
- `original_response`: The original response from the model provider, as a string (only included when `include_original_response` is `true`). The returned data depends on the variant type:
  - `chat_completion`: raw response from the inference to the model
  - `experimental_best_of_n_sampling`: raw response from the inference to the evaluator
  - `experimental_mixture_of_n_sampling`: raw response from the inference to the fuser
  - `experimental_dynamic_in_context_learning`: raw response from the inference to the model
  - `experimental_chain_of_thought`: raw response from the inference to the model
- `variant_name`: The name of the variant used for the inference.
- `usage`: The token usage for the inference:
  - `input_tokens`: The number of input tokens used for the inference.
  - `output_tokens`: The number of output tokens used for the inference.

For example, if `your_model_provider_name` returns a content block of type `daydreaming`, it will be included in the response like this:

```json
{
  "type": "unknown",
  "data": {
    "type": "daydreaming",
    "dream": "..."
  },
  "model_provider_name": "tensorzero::model_name::your_model_name::provider_name::your_model_provider_name"
}
```

When streaming, the gateway returns a series of chunks followed by a final `[DONE]` message.
Each JSON message has the following fields:

- `content`: A list of content block chunks with `type` equal to `text` or `tool_call`. Reasoning models (e.g. DeepSeek R1) might also include `thought` content block chunks.
  - If `type` is `text`, the chunk has the following fields:
    - `id`: The ID of the content block.
    - `text`: The text delta for the content block.
  - If `type` is `tool_call`, the chunk has the following fields (all strings):
    - `id`: The ID of the content block.
    - `raw_name`: The string delta of the name of the tool.
    - `raw_arguments`: The string delta of the arguments for the tool call.
  - If `type` is `thought`, the chunk has the following fields:
    - `id`: The ID of the content block.
    - `text`: The text delta for the thought.
- `episode_id`: The ID of the episode associated with the inference.
- `inference_id`: The ID of the inference.
- `variant_name`: The name of the variant used for the inference.
- `usage`: The token usage for the inference (`input_tokens` and `output_tokens`).

### JSON functions

When the function type is `json`, the response is structured as follows.

- `inference_id`: The ID of the inference.
- `episode_id`: The ID of the episode associated with the inference.
- `original_response`: The original response from the model provider, as a string (only included when `include_original_response` is `true`). The returned data depends on the variant type:
  - `chat_completion`: raw response from the inference to the model
  - `experimental_best_of_n_sampling`: raw response from the inference to the evaluator
  - `experimental_mixture_of_n_sampling`: raw response from the inference to the fuser
  - `experimental_dynamic_in_context_learning`: raw response from the inference to the model
  - `experimental_chain_of_thought`: raw response from the inference to the model
- `output`: An object with the following fields:
  - `raw`: The raw response from the model provider (which might be invalid JSON).
  - `parsed`: The parsed response from the model provider (`null` if invalid JSON).
- `variant_name`: The name of the variant used for the inference.
- `usage`: The token usage for the inference (`input_tokens` and `output_tokens`).

When streaming, the gateway returns a series of chunks followed by a final `[DONE]` message.
Each JSON message has the following fields:

- `episode_id`: The ID of the episode associated with the inference.
- `inference_id`: The ID of the inference.
- `raw`: The delta for the raw response. There is no `parsed` field for streaming JSON inferences. If your application depends on a well-formed JSON response, we recommend using regular (non-streaming) inference.
- `variant_name`: The name of the variant used for the inference.
- `usage`: The token usage for the inference (`input_tokens` and `output_tokens`).
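Putting the streaming format together, here is a minimal sketch of consuming the stream over HTTP. It assumes the gateway delivers chunks as server-sent events with `data:` lines terminated by a `[DONE]` message (as documented above), a locally running gateway, and a hypothetical `draft_email` chat function:

```python
import json

import httpx

# Sketch: stream a chat inference and print the text deltas as they arrive.
request = {
    "function_name": "draft_email",  # hypothetical
    "input": {
        "system": "You are an AI assistant...",
        "messages": [{"role": "user", "content": "Draft an email to Gabriel..."}],
    },
    "stream": True,
}

with httpx.stream(
    "POST", "http://localhost:3000/inference", json=request, timeout=None
) as response:
    for line in response.iter_lines():
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break  # the gateway signals the end of the stream
        chunk = json.loads(payload)
        for block in chunk["content"]:
            if block["type"] == "text":
                print(block["text"], end="", flush=True)  # a text delta
```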
## Examples

### Chat Function

`tensorzero.toml`:

```toml
# ...

[functions.draft_email]
type = "chat"
# ...
```

Python:

```python
from tensorzero import AsyncTensorZeroGateway

async with await AsyncTensorZeroGateway.build_http(gateway_url="http://localhost:3000") as client:
    result = await client.inference(
        function_name="draft_email",
        input={
            "system": "You are an AI assistant...",
            "messages": [
                {
                    "role": "user",
                    "content": "I need to write an email to Gabriel explaining...",
                }
            ],
        },
        # optional: stream=True,
    )
```

HTTP:

```bash
# optional: add "stream": true to the request body
curl -X POST http://localhost:3000/inference \
  -H "Content-Type: application/json" \
  -d '{
    "function_name": "draft_email",
    "input": {
      "system": "You are an AI assistant...",
      "messages": [
        {
          "role": "user",
          "content": "I need to write an email to Gabriel explaining..."
        }
      ]
    }
  }'
```

The response looks like:

```json
{
  "inference_id": "00000000-0000-0000-0000-000000000000",
  "episode_id": "11111111-1111-1111-1111-111111111111",
  "variant_name": "prompt_v1",
  "content": [
    {
      "type": "text",
      "text": "Hi Gabriel,\n\nI noticed..."
    }
  ],
  "usage": {
    "input_tokens": 100,
    "output_tokens": 100
  }
}
```

When streaming, the gateway returns a series of chunks followed by a final `[DONE]` message. For example:

```json
{
  "inference_id": "00000000-0000-0000-0000-000000000000",
  "episode_id": "11111111-1111-1111-1111-111111111111",
  "variant_name": "prompt_v1",
  "content": [
    {
      "type": "text",
      "id": "0",
      "text": "Hi Gabriel," // a text content delta
    }
  ],
  "usage": {
    "input_tokens": 100,
    "output_tokens": 100
  }
}
```
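As a follow-up to the example above, here is a sketch of extracting the generated text from the non-streaming response over HTTP. The shape of `content` matches the chat response documented earlier; `httpx` is used for illustration.

```python
import httpx

# Sketch: call the draft_email function and collect the text content blocks.
response = httpx.post(
    "http://localhost:3000/inference",
    json={
        "function_name": "draft_email",
        "input": {
            "system": "You are an AI assistant...",
            "messages": [
                {"role": "user", "content": "I need to write an email to Gabriel explaining..."}
            ],
        },
    },
)
body = response.json()
# Chat responses contain a list of content blocks; join the text ones.
draft = "".join(block["text"] for block in body["content"] if block["type"] == "text")
print(draft)
```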
### Chat Function with Schemas

`tensorzero.toml`:

```toml
# ...

[functions.draft_email]
type = "chat"
system_schema = "system_schema.json"
user_schema = "user_schema.json"
# ...
```

`system_schema.json`:

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "tone": { "type": "string" }
  },
  "required": ["tone"],
  "additionalProperties": false
}
```

`user_schema.json`:

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "recipient": { "type": "string" },
    "email_purpose": { "type": "string" }
  },
  "required": ["recipient", "email_purpose"],
  "additionalProperties": false
}
```

Python:

```python
from tensorzero import AsyncTensorZeroGateway

async with await AsyncTensorZeroGateway.build_http(gateway_url="http://localhost:3000") as client:
    result = await client.inference(
        function_name="draft_email",
        input={
            "system": {"tone": "casual"},
            "messages": [
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "text",
                            "arguments": {
                                "recipient": "Gabriel",
                                "email_purpose": "Request a meeting to...",
                            },
                        }
                    ],
                }
            ],
        },
        # optional: stream=True,
    )
```

HTTP:

```bash
# optional: add "stream": true to the request body
curl -X POST http://localhost:3000/inference \
  -H "Content-Type: application/json" \
  -d '{
    "function_name": "draft_email",
    "input": {
      "system": {"tone": "casual"},
      "messages": [
        {
          "role": "user",
          "content": [
            {
              "type": "text",
              "arguments": {
                "recipient": "Gabriel",
                "email_purpose": "Request a meeting to..."
              }
            }
          ]
        }
      ]
    }
  }'
```

The response looks like:

```json
{
  "inference_id": "00000000-0000-0000-0000-000000000000",
  "episode_id": "11111111-1111-1111-1111-111111111111",
  "variant_name": "prompt_v1",
  "content": [
    {
      "type": "text",
      "text": "Hi Gabriel,\n\nI noticed..."
    }
  ],
  "usage": {
    "input_tokens": 100,
    "output_tokens": 100
  }
}
```

When streaming, the gateway returns a series of chunks followed by a final `[DONE]` message. For example:

```json
{
  "inference_id": "00000000-0000-0000-0000-000000000000",
  "episode_id": "11111111-1111-1111-1111-111111111111",
  "variant_name": "prompt_v1",
  "content": [
    {
      "type": "text",
      "id": "0",
      "text": "Hi Gabriel," // a text content delta
    }
  ],
  "usage": {
    "input_tokens": 100,
    "output_tokens": 100
  }
}
```
### Chat Function with Tool Use

`tensorzero.toml`:

```toml
# ...

[functions.weather_bot]
type = "chat"
tools = ["get_temperature"]
# ...

[tools.get_temperature]
description = "Get the current temperature in a given location"
parameters = "get_temperature.json"
# ...
```

`get_temperature.json`:

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "location": {
      "type": "string",
      "description": "The location to get the temperature for (e.g. \"New York\")"
    },
    "units": {
      "type": "string",
      "description": "The units to get the temperature in (must be \"fahrenheit\" or \"celsius\")",
      "enum": ["fahrenheit", "celsius"]
    }
  },
  "required": ["location"],
  "additionalProperties": false
}
```

Python:

```python
from tensorzero import AsyncTensorZeroGateway

async with await AsyncTensorZeroGateway.build_http(gateway_url="http://localhost:3000") as client:
    result = await client.inference(
        function_name="weather_bot",
        input={
            "messages": [
                {
                    "role": "user",
                    "content": "What is the weather like in Tokyo?",
                }
            ],
        },
        # optional: stream=True,
    )
```

HTTP:

```bash
# optional: add "stream": true to the request body
curl -X POST http://localhost:3000/inference \
  -H "Content-Type: application/json" \
  -d '{
    "function_name": "weather_bot",
    "input": {
      "messages": [
        {
          "role": "user",
          "content": "What is the weather like in Tokyo?"
        }
      ]
    }
  }'
```

The response looks like:

```json
{
  "inference_id": "00000000-0000-0000-0000-000000000000",
  "episode_id": "11111111-1111-1111-1111-111111111111",
  "variant_name": "prompt_v1",
  "content": [
    {
      "type": "tool_call",
      "arguments": {
        "location": "Tokyo",
        "units": "celsius"
      },
      "id": "123456789",
      "name": "get_temperature",
      "raw_arguments": "{\"location\":\"Tokyo\",\"units\":\"celsius\"}",
      "raw_name": "get_temperature"
    }
  ],
  "usage": {
    "input_tokens": 100,
    "output_tokens": 100
  }
}
```

When streaming, the gateway returns a series of chunks followed by a final `[DONE]` message. For example:

```json
{
  "inference_id": "00000000-0000-0000-0000-000000000000",
  "episode_id": "11111111-1111-1111-1111-111111111111",
  "variant_name": "prompt_v1",
  "content": [
    {
      "type": "tool_call",
      "id": "123456789",
      "raw_name": "get_temperature",
      "raw_arguments": "{\"location\":" // a tool arguments delta
    }
  ],
  "usage": {
    "input_tokens": 100,
    "output_tokens": 100
  }
}
```
### Chat Function with Multi-Turn Tool Use

`tensorzero.toml`:

```toml
# ...

[functions.weather_bot]
type = "chat"
tools = ["get_temperature"]
# ...

[tools.get_temperature]
description = "Get the current temperature in a given location"
parameters = "get_temperature.json"
# ...
```

`get_temperature.json`:

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "location": {
      "type": "string",
      "description": "The location to get the temperature for (e.g. \"New York\")"
    },
    "units": {
      "type": "string",
      "description": "The units to get the temperature in (must be \"fahrenheit\" or \"celsius\")",
      "enum": ["fahrenheit", "celsius"]
    }
  },
  "required": ["location"],
  "additionalProperties": false
}
```

Python:

```python
from tensorzero import AsyncTensorZeroGateway

async with await AsyncTensorZeroGateway.build_http(gateway_url="http://localhost:3000") as client:
    result = await client.inference(
        function_name="weather_bot",
        input={
            "messages": [
                {
                    "role": "user",
                    "content": "What is the weather like in Tokyo?",
                },
                {
                    "role": "assistant",
                    "content": [
                        {
                            "type": "tool_call",
                            "arguments": {
                                "location": "Tokyo",
                                "units": "celsius",
                            },
                            "id": "123456789",
                            "name": "get_temperature",
                        }
                    ],
                },
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "tool_result",
                            "id": "123456789",
                            "name": "get_temperature",
                            "result": "25",  # the tool result must be a string
                        }
                    ],
                },
            ],
        },
        # optional: stream=True,
    )
```

HTTP:

```bash
# optional: add "stream": true to the request body
# note: the tool result must be a string
curl -X POST http://localhost:3000/inference \
  -H "Content-Type: application/json" \
  -d '{
    "function_name": "weather_bot",
    "input": {
      "messages": [
        {
          "role": "user",
          "content": "What is the weather like in Tokyo?"
        },
        {
          "role": "assistant",
          "content": [
            {
              "type": "tool_call",
              "arguments": {
                "location": "Tokyo",
                "units": "celsius"
              },
              "id": "123456789",
              "name": "get_temperature"
            }
          ]
        },
        {
          "role": "user",
          "content": [
            {
              "type": "tool_result",
              "id": "123456789",
              "name": "get_temperature",
              "result": "25"
            }
          ]
        }
      ]
    }
  }'
```

The response looks like:

```json
{
  "inference_id": "00000000-0000-0000-0000-000000000000",
  "episode_id": "11111111-1111-1111-1111-111111111111",
  "variant_name": "prompt_v1",
  "content": [
    {
      "type": "text",
      "text": "The weather in Tokyo is 25 degrees Celsius."
    }
  ],
  "usage": {
    "input_tokens": 100,
    "output_tokens": 100
  }
}
```

When streaming, the gateway returns a series of chunks followed by a final `[DONE]` message. For example:

```json
{
  "inference_id": "00000000-0000-0000-0000-000000000000",
  "episode_id": "11111111-1111-1111-1111-111111111111",
  "variant_name": "prompt_v1",
  "content": [
    {
      "type": "text",
      "id": "0",
      "text": "The weather in" // a text content delta
    }
  ],
  "usage": {
    "input_tokens": 100,
    "output_tokens": 100
  }
}
```
### Chat Function with Dynamic Tool Use

`tensorzero.toml`:

```toml
# ...

[functions.weather_bot]
type = "chat"
# Note: no `tools = ["get_temperature"]` field in configuration
# ...
```

Python:

```python
from tensorzero import AsyncTensorZeroGateway

async with await AsyncTensorZeroGateway.build_http(gateway_url="http://localhost:3000") as client:
    result = await client.inference(
        function_name="weather_bot",
        input={
            "messages": [
                {
                    "role": "user",
                    "content": "What is the weather like in Tokyo?",
                }
            ],
        },
        additional_tools=[
            {
                "name": "get_temperature",
                "description": "Get the current temperature in a given location",
                "parameters": {
                    "$schema": "http://json-schema.org/draft-07/schema#",
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": 'The location to get the temperature for (e.g. "New York")',
                        },
                        "units": {
                            "type": "string",
                            "description": 'The units to get the temperature in (must be "fahrenheit" or "celsius")',
                            "enum": ["fahrenheit", "celsius"],
                        },
                    },
                    "required": ["location"],
                    "additionalProperties": False,
                },
            }
        ],
        # optional: stream=True,
    )
```

HTTP:

```bash
# optional: add "stream": true to the request body
curl -X POST http://localhost:3000/inference \
  -H "Content-Type: application/json" \
  -d '{
    "function_name": "weather_bot",
    "input": {
      "messages": [
        {
          "role": "user",
          "content": "What is the weather like in Tokyo?"
        }
      ]
    },
    "additional_tools": [
      {
        "name": "get_temperature",
        "description": "Get the current temperature in a given location",
        "parameters": {
          "$schema": "http://json-schema.org/draft-07/schema#",
          "type": "object",
          "properties": {
            "location": {
              "type": "string",
              "description": "The location to get the temperature for (e.g. \"New York\")"
            },
            "units": {
              "type": "string",
              "description": "The units to get the temperature in (must be \"fahrenheit\" or \"celsius\")",
              "enum": ["fahrenheit", "celsius"]
            }
          },
          "required": ["location"],
          "additionalProperties": false
        }
      }
    ]
  }'
```

The response looks like:

```json
{
  "inference_id": "00000000-0000-0000-0000-000000000000",
  "episode_id": "11111111-1111-1111-1111-111111111111",
  "variant_name": "prompt_v1",
  "content": [
    {
      "type": "tool_call",
      "arguments": {
        "location": "Tokyo",
        "units": "celsius"
      },
      "id": "123456789",
      "name": "get_temperature",
      "raw_arguments": "{\"location\":\"Tokyo\",\"units\":\"celsius\"}",
      "raw_name": "get_temperature"
    }
  ],
  "usage": {
    "input_tokens": 100,
    "output_tokens": 100
  }
}
```

When streaming, the gateway returns a series of chunks followed by a final `[DONE]` message. For example:

```json
{
  "inference_id": "00000000-0000-0000-0000-000000000000",
  "episode_id": "11111111-1111-1111-1111-111111111111",
  "variant_name": "prompt_v1",
  "content": [
    {
      "type": "tool_call",
      "id": "123456789",
      "raw_name": "get_temperature",
      "raw_arguments": "{\"location\":" // a tool arguments delta
    }
  ],
  "usage": {
    "input_tokens": 100,
    "output_tokens": 100
  }
}
```
### Chat Function with Dynamic Inference Parameters

`tensorzero.toml`:

```toml
# ...

[functions.draft_email]
type = "chat"
# ...

[functions.draft_email.variants.prompt_v1]
type = "chat_completion"
temperature = 0.5 # the API request will override this value
# ...
```

Python:

```python
from tensorzero import AsyncTensorZeroGateway

async with await AsyncTensorZeroGateway.build_http(gateway_url="http://localhost:3000") as client:
    result = await client.inference(
        function_name="draft_email",
        input={
            "system": "You are an AI assistant...",
            "messages": [
                {
                    "role": "user",
                    "content": "I need to write an email to Gabriel explaining...",
                }
            ],
        },
        # Override parameters for every variant with type "chat_completion"
        params={
            "chat_completion": {
                "temperature": 0.7,
            }
        },
        # optional: stream=True,
    )
```

HTTP:

```bash
# optional: add "stream": true to the request body
# "params" overrides parameters for every variant with type "chat_completion"
curl -X POST http://localhost:3000/inference \
  -H "Content-Type: application/json" \
  -d '{
    "function_name": "draft_email",
    "input": {
      "system": "You are an AI assistant...",
      "messages": [
        {
          "role": "user",
          "content": "I need to write an email to Gabriel explaining..."
        }
      ]
    },
    "params": {
      "chat_completion": {
        "temperature": 0.7
      }
    }
  }'
```

The response looks like:

```json
{
  "inference_id": "00000000-0000-0000-0000-000000000000",
  "episode_id": "11111111-1111-1111-1111-111111111111",
  "variant_name": "prompt_v1",
  "content": [
    {
      "type": "text",
      "text": "Hi Gabriel,\n\nI noticed..."
    }
  ],
  "usage": {
    "input_tokens": 100,
    "output_tokens": 100
  }
}
```

When streaming, the gateway returns a series of chunks followed by a final `[DONE]` message. For example:

```json
{
  "inference_id": "00000000-0000-0000-0000-000000000000",
  "episode_id": "11111111-1111-1111-1111-111111111111",
  "variant_name": "prompt_v1",
  "content": [
    {
      "type": "text",
      "id": "0",
      "text": "Hi Gabriel," // a text content delta
    }
  ],
  "usage": {
    "input_tokens": 100,
    "output_tokens": 100
  }
}
```
### JSON Function

`tensorzero.toml`:

```toml
# ...

[functions.extract_email]
type = "json"
output_schema = "output_schema.json"
# ...
```

`output_schema.json`:

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "email": { "type": "string" }
  },
  "required": ["email"]
}
```

Python:

```python
from tensorzero import AsyncTensorZeroGateway

async with await AsyncTensorZeroGateway.build_http(gateway_url="http://localhost:3000") as client:
    result = await client.inference(
        function_name="extract_email",
        input={
            "system": "You are an AI assistant...",
            "messages": [
                {
                    "role": "user",
                    "content": "...blah blah blah [email protected] blah blah blah...",
                }
            ],
        },
        # optional: stream=True,
    )
```

HTTP:

```bash
# optional: add "stream": true to the request body
curl -X POST http://localhost:3000/inference \
  -H "Content-Type: application/json" \
  -d '{
    "function_name": "extract_email",
    "input": {
      "system": "You are an AI assistant...",
      "messages": [
        {
          "role": "user",
          "content": "...blah blah blah [email protected] blah blah blah..."
        }
      ]
    }
  }'
```

The response looks like:

```json
{
  "inference_id": "00000000-0000-0000-0000-000000000000",
  "episode_id": "11111111-1111-1111-1111-111111111111",
  "variant_name": "prompt_v1",
  "output": {
    "raw": "{\"email\":\"[email protected]\"}",
    "parsed": {
      "email": "[email protected]"
    }
  },
  "usage": {
    "input_tokens": 100,
    "output_tokens": 100
  }
}
```

When streaming, the gateway returns a series of chunks followed by a final `[DONE]` message. For example:

```json
{
  "inference_id": "00000000-0000-0000-0000-000000000000",
  "episode_id": "11111111-1111-1111-1111-111111111111",
  "variant_name": "prompt_v1",
  "raw": "{\"email\":", // a JSON content delta
  "usage": {
    "input_tokens": 100,
    "output_tokens": 100
  }
}
```
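If you do stream a JSON function, here is a sketch of accumulating the `raw` deltas and parsing once at the end, subject to the caveat above that streaming output is not validated and may still be malformed. It assumes the same SSE wire format (`data:` lines and a final `[DONE]` message) as the chat streaming sketch earlier:

```python
import json

import httpx

# Sketch: stream a JSON function, accumulate the `raw` deltas, then parse once.
request = {
    "function_name": "extract_email",
    "input": {
        "system": "You are an AI assistant...",
        "messages": [
            {"role": "user", "content": "...blah blah blah [email protected] blah blah blah..."}
        ],
    },
    "stream": True,
}

raw = ""
with httpx.stream(
    "POST", "http://localhost:3000/inference", json=request, timeout=None
) as response:
    for line in response.iter_lines():
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):]
        if payload == "[DONE]":
            break
        raw += json.loads(payload).get("raw", "")

try:
    parsed = json.loads(raw)  # may fail: streaming output is not validated
except json.JSONDecodeError:
    parsed = None
```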