API reference for the `/openai/v1/chat/completions` endpoint.
# `POST /openai/v1/chat/completions`

The `/openai/v1/chat/completions` endpoint allows TensorZero users to make TensorZero inferences with the OpenAI client.
The gateway translates the OpenAI request parameters into the arguments expected by the inference endpoint and calls the same underlying implementation.

This endpoint supports most of the features supported by the inference endpoint, but there are some limitations.
Most notably, this endpoint doesn't support dynamic credentials, so they must be specified with a different method.

See `POST /inference` for more details on inference with the native TensorZero API.

## Request

TensorZero-specific parameters are prefixed with `tensorzero::` (e.g. `tensorzero::episode_id`).
These fields should be provided as extra body parameters in the request body.

Credentials for model providers are configured in your `tensorzero.toml` file.
In most cases, these credentials will be environment variables available to the TensorZero gateway, not your OpenAI client.
API keys sent from the OpenAI client will be ignored.

### `tensorzero::cache_options`

An object controlling inference caching, with the following fields:

- `enabled` (string): The cache mode. Can be one of:
  - `"write_only"` (default): Only write to cache but don't serve cached responses
  - `"read_only"`: Only read from cache but don't write new entries
  - `"on"`: Both read from and write to cache
  - `"off"`: Disable caching completely
- `max_age_s` (integer or null): Maximum age in seconds for cache entries to be considered valid when reading from cache. Does not set a TTL for cache expiration. Default is `null` (no age limit).

This field should be provided in `extra_body`.
See the Inference Caching guide for more details.
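For illustration, a minimal sketch of setting cache options through the OpenAI Python client (this assumes a gateway running locally on port 3000; the model name is just an example):

```python
from openai import OpenAI

# The gateway ignores OpenAI API keys sent by the client.
client = OpenAI(base_url="http://localhost:3000/openai/v1", api_key="not-used")

response = client.chat.completions.create(
    model="tensorzero::model_name::openai::gpt-5-mini",
    messages=[{"role": "user", "content": "Tell me a fun fact."}],
    extra_body={
        "tensorzero::cache_options": {
            "enabled": "on",  # read from and write to the cache
            "max_age_s": 3600,  # only serve cache entries younger than one hour
        }
    },
)
```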
### `tensorzero::credentials`

A model provider can be configured to accept credentials at inference time by using a dynamic location (e.g. `dynamic::my_dynamic_api_key_name`).
See the configuration reference for more details.

The gateway expects the credentials to be provided in the `credentials` field of the request body as specified below.
The gateway will return a 400 error if the credentials are not provided and the model provider has been configured with dynamic credentials.

**Example**

```toml
[models.my_model_name.providers.my_provider_name]
# ...
# Note: the name of the credential field (e.g. `api_key_location`) depends on the provider type
api_key_location = "dynamic::my_dynamic_api_key_name"
# ...
```

```json
{
  // ...
  "tensorzero::credentials": {
    // ...
    "my_dynamic_api_key_name": "sk-..."
    // ...
  }
  // ...
}
```

### `tensorzero::deny_unknown_fields`

(boolean, default: `false`)

If `true`, the gateway will return an error if the request contains any unknown or unrecognized fields.
By default, unknown fields are ignored and a warning is logged.

This field does not affect the `tensorzero::extra_body` field, only unknown fields at the root of the request body.
This field should be provided as an extra body parameter in the request body.

```python
response = oai.chat.completions.create(
    model="tensorzero::model_name::openai::gpt-5-mini",
    messages=[
        {
            "role": "user",
            "content": "Tell me a fun fact.",
        }
    ],
    extra_body={
        "tensorzero::deny_unknown_fields": True,
        "ultrathink": True,  # made-up parameter → `deny_unknown_fields` rejects this request
    },
)
```

### `tensorzero::dryrun`

(boolean, optional)

If `true`, the inference request will be executed but won't be stored to the database.
The gateway will still call the downstream model providers.

This field is primarily for debugging and testing, and you should generally not use it in production.
This field should be provided as an extra body parameter in the request body.

### `tensorzero::episode_id`

(UUID, optional)

The ID of an existing episode to associate the inference with.
Only use episode IDs that were returned by the TensorZero gateway.
This field should be provided as an extra body parameter in the request body.
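For illustration, a hedged sketch that combines both fields: it reuses the episode ID returned by an earlier inference and marks the follow-up as a dry run. This assumes a local gateway and a `draft_email` function like the one in the examples below, and that the OpenAI Python client surfaces the TensorZero-specific `episode_id` response field as an extra attribute.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3000/openai/v1", api_key="not-used")

# First inference: the gateway starts a new episode and returns its ID.
first = client.chat.completions.create(
    model="tensorzero::function_name::draft_email",
    messages=[{"role": "user", "content": "Draft a follow-up email to Gabriel."}],
)
episode_id = first.episode_id  # TensorZero-specific field on the response body

# Second inference in the same episode; `dryrun` keeps it out of the database.
followup = client.chat.completions.create(
    model="tensorzero::function_name::draft_email",
    messages=[{"role": "user", "content": "Make the email more formal."}],
    extra_body={
        "tensorzero::episode_id": str(episode_id),
        "tensorzero::dryrun": True,  # debugging/testing only
    },
)
```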
{ "project":"tensorzero", "safety_checks": { "no_internet":false, "no_agi":true }}extra_body in the inference request…{ // ... "tensorzero::extra_body": [ { "variant_name":"my_variant",// or "model_name": "my_model", "provider_name": "my_provider" "pointer":"/agi", "value":true }, { // No `variant_name` or `model_name`/`provider_name` specified, so it applies to all variants and providers "pointer":"/safety_checks/no_agi", "value": { "bypass":"on" } } ]}{ "agi":true, "project":"tensorzero", "safety_checks": { "no_internet":false, "no_agi": { "bypass":"on" } }}tensorzero::extra_headerstensorzero::extra_headers field allows you to modify the request headers that TensorZero sends to a model provider.This advanced feature is an “escape hatch” that lets you use provider-specific functionality that TensorZero hasn’t implemented yet.extra_headers field, it will override the request from the client to the gateway.If you usetensorzero::extra_headers, it will override the request from the gateway to the model provider.name: The name of the header to modifyvalue: The value to set the header tovariant_namemodel_namemodel_name andprovider_nameextra_headers in the configuration file.The values provided at inference-time take priority over the values in the configuration file.Example
### `tensorzero::extra_headers`

The `tensorzero::extra_headers` field allows you to modify the request headers that TensorZero sends to a model provider.
This advanced feature is an "escape hatch" that lets you use provider-specific functionality that TensorZero hasn't implemented yet.

If you use the OpenAI client's `extra_headers` field, it will override the request from the client to the gateway.
If you use `tensorzero::extra_headers`, it will override the request from the gateway to the model provider.

The field accepts an array of objects, each with the following fields:

- `name`: The name of the header to modify
- `value`: The value to set the header to
- Optionally, one of the following scopes:
  - `variant_name`: Only apply the modification to the given variant
  - `model_name`: Only apply the modification to the given model
  - `model_name` and `provider_name`: Only apply the modification to the given model provider

You can also set `extra_headers` in the configuration file.
The values provided at inference time take priority over the values in the configuration file.

**Example**

If the model provider would normally receive this header...

```
Safety-Checks: on
```

...and you provide this `extra_headers`...

```json
{
  "extra_headers": [
    {
      "variant_name": "my_variant", // or "model_name": "my_model", "provider_name": "my_provider"
      "name": "Safety-Checks",
      "value": "off"
    },
    {
      // No `variant_name` or `model_name`/`provider_name` specified, so it applies to all variants and providers
      "name": "Intelligence-Level",
      "value": "AGI"
    }
  ]
}
```

...then `Safety-Checks` is set to `off` only for `my_variant`, while `Intelligence-Level: AGI` is applied globally to all variants and providers:

```
Safety-Checks: off
Intelligence-Level: AGI
```
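As with `tensorzero::extra_body`, the OpenAI client passes this field through its `extra_body` argument. A minimal sketch (the header name and variant are illustrative):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3000/openai/v1", api_key="not-used")

response = client.chat.completions.create(
    model="tensorzero::function_name::my_function",
    messages=[{"role": "user", "content": "Hello!"}],
    extra_body={
        "tensorzero::extra_headers": [
            {
                "variant_name": "my_variant",  # optional scope
                "name": "Safety-Checks",
                "value": "off",
            }
        ]
    },
)
```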
### `tensorzero::params`

Overrides inference-time parameters for chat completion variants.
An object with a `chat_completion` field containing any of the following parameters:

- `frequency_penalty` (float): Penalizes tokens based on their frequency
- `json_mode` (object): Controls JSON output formatting
- `max_tokens` (integer): Maximum number of tokens to generate
- `presence_penalty` (float): Penalizes tokens based on their presence
- `reasoning_effort` (string): Effort level for reasoning models
- `seed` (integer): Random seed for deterministic outputs
- `service_tier` (string): Service tier for the request
- `stop_sequences` (list of strings): Sequences that stop generation
- `temperature` (float): Controls randomness in the output
- `thinking_budget_tokens` (integer): Token budget for thinking/reasoning
- `top_p` (float): Nucleus sampling parameter
- `verbosity` (string): Output verbosity level

Parameters in `tensorzero::params` take precedence over parameters provided directly in the request body (e.g. top-level `temperature`, `max_tokens`) or inferred from other fields (e.g. `json_mode` inferred from `response_format`).

**Example**

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:3000/openai/v1",
    api_key="your_api_key",
)

response = client.chat.completions.create(
    model="tensorzero::function_name::my_function",
    messages=[
        {"role": "user", "content": "Explain quantum computing"}
    ],
    extra_body={
        "tensorzero::params": {
            "chat_completion": {
                "temperature": 0.7,
                "max_tokens": 500,
                "reasoning_effort": "high"
            }
        }
    }
)
```

### `tensorzero::provider_tools`

(default: `[]`)

A list of provider-specific tools to make available to the inference.
Each entry is an object with the following fields:

- `scope` (object, optional): Limits which model/provider combination can use this tool. If omitted, the tool is available to all compatible providers.
  - `model_name` (string): The model name as defined in your configuration
  - `model_provider_name` (string): The provider name for that model
- `tool` (object, required): The provider-specific tool configuration as defined by the provider's API

This field should be provided in `extra_body`.
This field allows for dynamic provider tool configuration at runtime.
You should prefer to define provider tools in the configuration file if possible (see the Configuration Reference).
Only use this field if dynamic provider tool configuration is necessary for your use case.

**Example: OpenAI Web Search (Unscoped)**
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:3000/openai/v1",
    api_key="your_api_key",
)

response = client.chat.completions.create(
    model="tensorzero::function_name::my_function",
    messages=[
        {"role": "user", "content": "What were the latest developments in AI this week?"}
    ],
    extra_body={
        "tensorzero::provider_tools": [
            {
                "tool": {
                    "type": "web_search"
                }
            }
        ]
    }
)
```

**Example: OpenAI Web Search (Scoped)**
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:3000/openai/v1",
    api_key="your_api_key",
)

response = client.chat.completions.create(
    model="tensorzero::function_name::my_function",
    messages=[
        {"role": "user", "content": "What were the latest developments in AI this week?"}
    ],
    extra_body={
        "tensorzero::provider_tools": [
            {
                "scope": {
                    "model_name": "gpt-5-mini",
                    "model_provider_name": "openai"
                },
                "tool": {
                    "type": "web_search"
                }
            }
        ]
    }
)
```

In this example, the web search tool is only available to the `openai` provider of the `gpt-5-mini` model.

### `tensorzero::tags`

User-provided tags to associate with the inference, e.g. `{"user_id": "123"}` or `{"author": "Alice"}`.
This field should be provided as an extra body parameter in the request body.

### `frequency_penalty`

(float, default: `null`)

Overrides the `frequency_penalty` setting for any chat completion variants being used.

### `max_completion_tokens`

(integer, default: `null`)

Overrides the `max_tokens` setting for any chat completion variants being used.
If both this and `max_tokens` are set, the smaller value is used.

### `max_tokens`

(integer, default: `null`)

Overrides the `max_tokens` setting for any chat completion variants being used.
If both this and `max_completion_tokens` are set, the smaller value is used.

### `messages`

A list of messages. Each message is an object with the following fields:

- `role` (required): The role of the message sender in an OpenAI message (`assistant`, `system`, `tool`, or `user`).
- `content` (required for `user` and `system` messages and optional for `assistant` and `tool` messages): The content of the message. The content must be either a string or an array of content blocks (see below).
- `tool_calls` (optional for `assistant` messages, otherwise disallowed): A list of tool calls. Each tool call is an object with the following fields:
  - `id`: A unique identifier for the tool call
  - `type`: The type of tool being called (currently only `"function"` is supported)
  - `function`: An object containing:
    - `name`: The name of the function to call
    - `arguments`: A JSON string containing the function arguments
- `tool_call_id` (required for `tool` messages, otherwise disallowed): The ID of the tool call to associate with the message. This should be one that was originally returned by the gateway in a tool call `id` field.

Each content block has a `type` field, which can be `text`, `image_url`, `input_audio`, or a TensorZero-specific type.

If the content block has type `text`, it must have either of the following additional fields:

- `text`: The text for the content block.
- `tensorzero::arguments`: A JSON object containing the function arguments for TensorZero functions with templates and schemas (see Create a prompt template for details).

If the content block has type `image_url`, it must have the following additional field:

- `image_url`: A JSON object with the following fields:
  - `url`: The URL for a remote image (e.g. `"https://example.com/image.png"`) or base64-encoded data for an embedded image (e.g. `"data:image/png;base64,..."`).
  - `detail` (optional): Controls the fidelity of image processing. Only applies to image files; ignored for other file types. Can be `low`, `high`, or `auto`. Affects token consumption and image quality.

If the content block has type `input_audio`, it must have the following additional field:

- `input_audio`: An object containing:
  - `data`: Base64-encoded audio data (without a `data:` prefix or MIME type header).
  - `format`: The audio format as a string (e.g. `"mp3"`, `"wav"`). Note: The MIME type is detected from the actual audio bytes, and a warning is logged if the detected type differs from this field.

TensorZero also supports the following content block types:

- `tensorzero::raw_text`: Bypasses templates and schemas, sending text directly to the model. Useful for testing prompts or dynamic injection without configuration changes. Must have a `value` field containing the text.
- `tensorzero::template`: Explicitly specify a template to use. Must have `name` and `arguments` fields.

A hedged sketch of these content blocks in practice follows.
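The sketch below sends structured arguments for a templated user message; it assumes a TensorZero function whose user template accepts a `topic` argument (the function name and schema are made up for this example).

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3000/openai/v1", api_key="not-used")

response = client.chat.completions.create(
    model="tensorzero::function_name::my_function",
    messages=[
        {
            "role": "user",
            # An array of content blocks instead of a plain string:
            "content": [
                # Structured arguments for a function with a user template/schema
                {"type": "text", "tensorzero::arguments": {"topic": "quantum computing"}},
            ],
        }
    ],
)
```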
### `model`

(required)

The name of the TensorZero function or model to call, using one of the following formats:

| To call… | Use this format… |
| --- | --- |
| A function defined as `[functions.my_function]` in your `tensorzero.toml` configuration file | `tensorzero::function_name::my_function` |
| A model defined as `[models.my_model]` in your `tensorzero.toml` configuration file | `tensorzero::model_name::my_model` |
| A model offered by a model provider, without defining it in your `tensorzero.toml` configuration file (if supported, see below) | `tensorzero::model_name::{provider_type}::{model_name}` |
The following provider types support this shorthand: `anthropic`, `deepseek`, `fireworks`, `gcp_vertex_anthropic`, `gcp_vertex_gemini`, `google_ai_studio_gemini`, `groq`, `hyperbolic`, `mistral`, `openai`, `openrouter`, `together`, and `xai`.

For example, suppose you have the following configuration:

```toml
[models.gpt-4o]
routing = ["openai", "azure"]

[models.gpt-4o.providers.openai]
# ...

[models.gpt-4o.providers.azure]
# ...

[functions.extract-data]
# ...
```

Then:

- `tensorzero::function_name::extract-data` calls the `extract-data` function defined above.
- `tensorzero::model_name::gpt-4o` calls the `gpt-4o` model in your configuration, which supports fallback from `openai` to `azure`. See Retries & Fallbacks for details.
- `tensorzero::model_name::openai::gpt-4o` calls the OpenAI API directly for the `gpt-4o` model, ignoring the `gpt-4o` model defined above.

In other words, `tensorzero::model_name::gpt-4o` will use the `[models.gpt-4o]` model defined in the `tensorzero.toml` file, whereas `tensorzero::model_name::openai::gpt-4o` will call the OpenAI API directly for the `gpt-4o` model.

### `parallel_tool_calls`

(boolean, default: `null`)

Overrides the `parallel_tool_calls` setting for the function being called.

### `presence_penalty`

(float, default: `null`)

Overrides the `presence_penalty` setting for any chat completion variants being used.

### `response_format`

(default: `null`)

Can be `"text"`, `"json_object"`, or `{"type": "json_schema", "schema": ...}`, where the `schema` field contains a valid JSON schema.
This field is not actually respected except for the `"json_schema"` variant, in which the `schema` field can be used to dynamically set the output schema for a `json` function.

### `seed`

(integer, default: `null`)

Overrides the `seed` setting for any chat completion variants being used.

### `stop_sequences`

(list of strings, default: `null`)

Overrides the `stop_sequences` setting for any chat completion variants being used.

### `stream`

(boolean, default: `false`)

If `true`, the gateway streams the response as server-sent events.

### `stream_options`

(default: `null`)

An object with the following field:

- `include_usage` (boolean): If `"include_usage"` is `true`, the gateway will include usage information in the response.

**Example**
If `stream_options` is provided in the request...

```json
{
  // ...
  "stream_options": {
    "include_usage": true
  }
  // ...
}
```

...then the streaming response will include usage information:

```json
{
  // ...
  "usage": {
    "prompt_tokens": 123,
    "completion_tokens": 456,
    "total_tokens": 579
  }
  // ...
}
```
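For reference, a minimal streaming sketch with the OpenAI Python client (assuming a local gateway): it prints content deltas as they arrive and reads the usage reported when `include_usage` is enabled.

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3000/openai/v1", api_key="not-used")

stream = client.chat.completions.create(
    model="tensorzero::function_name::my_function",
    messages=[{"role": "user", "content": "Tell me a story."}],
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in stream:
    # Most chunks carry a content delta; the usage-bearing chunk may not.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
    if chunk.usage:  # populated when `include_usage` is true
        print("\ntotal tokens:", chunk.usage.total_tokens)
```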
### `temperature`

(float, default: `null`)

Overrides the `temperature` setting for any chat completion variants being used.

### `tools`

(list of `tool` objects (see below), default: `null`)

A list of tools the model may call. Each `tool` object is one of the following:

A function tool:

- `type`: Must be `"function"`
- `function`: An object containing:
  - `name`: The name of the function (string, required)
  - `description`: A description of what the function does (string, optional)
  - `parameters`: A JSON Schema object describing the function's parameters (required)
  - `strict`: Whether to enforce strict schema validation (boolean, defaults to `false`)

A custom tool:

- `type`: Must be `"custom"`
- `custom`: An object containing:
  - `name`: The name of the tool (string, required)
  - `description`: A description of what the tool does (string, optional)
  - `format`: The output format for the tool (object, optional):
    - `{"type": "text"}`: Freeform text output
    - `{"type": "grammar", "grammar": {"syntax": "lark", "definition": "..."}}`: Output constrained by a Lark grammar
    - `{"type": "grammar", "grammar": {"syntax": "regex", "definition": "..."}}`: Output constrained by a regular expression

**Example: OpenAI Custom Tool**

```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:3000/openai/v1",
    api_key="your_api_key",
)

response = client.chat.completions.create(
    model="tensorzero::model_name::openai::gpt-5-mini",
    messages=[
        {"role": "user", "content": "Generate Python code to print 'Hello, World!'"}
    ],
    tools=[
        {
            "type": "custom",
            "custom": {
                "name": "code_generator",
                "description": "Generates Python code snippets",
                "format": {"type": "text"}
            }
        }
    ],
)
```

### `tool_choice`

(default: `"none"` if no tools are present, `"auto"` if tools are present)

Controls which (if any) tool is called by the model:

- `"none"`: The model will not call any tool and instead generates a message
- `"auto"`: The model can pick between generating a message or calling one or more tools
- `"required"`: The model must call one or more tools
- `{"type": "function", "function": {"name": "my_function"}}`: Forces the model to call the specified tool
- `{"type": "allowed_tools", "allowed_tools": {"tools": [...], "mode": "auto"|"required"}}`: Restricts which tools can be called

A short sketch of forcing a specific tool follows.
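For illustration, a hedged sketch that forces a specific tool call (reusing a trimmed-down version of the `get_temperature` tool from the weather example below):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:3000/openai/v1", api_key="not-used")

response = client.chat.completions.create(
    model="tensorzero::function_name::weather_bot",
    messages=[{"role": "user", "content": "What is the weather like in Tokyo?"}],
    tools=[
        {
            "type": "function",
            "function": {
                "name": "get_temperature",
                "description": "Get the current temperature in a given location",
                "parameters": {
                    "type": "object",
                    "properties": {"location": {"type": "string"}},
                    "required": ["location"],
                },
            },
        }
    ],
    # Force the model to call `get_temperature` instead of replying with text
    tool_choice={"type": "function", "function": {"name": "get_temperature"}},
)
```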
### `top_p`

(float, default: `null`)

Overrides the `top_p` setting for any chat completion variants being used.

### `tensorzero::variant_name`

If set, pins the inference request to a particular variant of the function.
You should generally not use this field in production; it is intended for testing and debugging.
This field should be provided as an extra body parameter in the request body.

## Response

### `choices`

A list of `choice` objects, where each choice contains:

- `index`: A zero-based index indicating the choice's position in the list (integer)
- `finish_reason`: Always `"stop"`.
- `message`: An object containing:
  - `content`: The message content (string, optional)
  - `tool_calls`: List of tool calls made by the model (optional). The format is the same as in the request.
  - `role`: The role of the message sender (always `"assistant"`).

### `created`

The timestamp of when the inference was created (integer).

### `episode_id`

The ID of the episode associated with the inference (UUID).

### `id`

The inference ID (UUID).

### `model`

The name of the variant used for the inference (string).

### `object`

The type of the response object (always `"chat.completion"`) (string).

### `system_fingerprint`

Included for compatibility with the OpenAI API (string).

### `usage`

An object containing:

- `prompt_tokens`: Number of tokens in the prompt (integer)
- `completion_tokens`: Number of tokens in the completion (integer)
- `total_tokens`: Total number of tokens used (integer)

When streaming, the gateway returns a stream of JSON messages followed by a `[DONE]` message.
Each JSON message has the following fields:

### `choices`

A list of chunks, where each chunk contains:

- `index`: The index of the choice (integer)
- `finish_reason`: Always `""`
- `delta`: An object containing either:
  - `content`: The next piece of generated text (string), or
  - `tool_calls`: A list of tool calls, each containing the next piece of the tool call being generated

The remaining fields (`created`, `episode_id`, `id`, `model`, `object`, `system_fingerprint`, and `usage`) are the same as in the non-streaming response.

## Examples

### Chat Function with Structured System Prompt

Configuration:

```toml
# tensorzero.toml

[functions.draft_email]
type = "chat"
system_schema = "functions/draft_email/system_schema.json"
# ...
```

```json
// functions/draft_email/system_schema.json
{
  "type": "object",
  "properties": {
    "assistant_name": { "type": "string" }
  }
}
```

Request (Python):

```python
from openai import AsyncOpenAI

async with AsyncOpenAI(base_url="http://localhost:3000/openai/v1") as client:
    result = await client.chat.completions.create(
        # there already was an episode_id from an earlier inference
        extra_body={"tensorzero::episode_id": str(episode_id)},
        messages=[
            {
                "role": "system",
                # NOTE: the JSON is in an array here so that a structured system message can be sent
                "content": [{"assistant_name": "Alfred Pennyworth"}],
            },
            {
                "role": "user",
                "content": "I need to write an email to Gabriel explaining...",
            },
        ],
        model="tensorzero::function_name::draft_email",
        temperature=0.4,
        # Optional: stream=True,
    )
```

Request (cURL):

```bash
curl -X POST http://localhost:3000/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "episode_id: your_episode_id_here" \
  -d '{
    "messages": [
      {
        "role": "system",
        "content": [{"assistant_name": "Alfred Pennyworth"}]
      },
      {
        "role": "user",
        "content": "I need to write an email to Gabriel explaining..."
      }
    ],
    "model": "tensorzero::function_name::draft_email",
    "temperature": 0.4
  }'
```

Optionally, add `"stream": true` to the request body to stream the response.

Response:

```json
{
  "id": "00000000-0000-0000-0000-000000000000",
  "episode_id": "11111111-1111-1111-1111-111111111111",
  "model": "email_draft_variant",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "message": {
        "content": "Hi Gabriel,\n\nI noticed...",
        "role": "assistant"
      }
    }
  ],
  "usage": {
    "prompt_tokens": 100,
    "completion_tokens": 100,
    "total_tokens": 200
  }
}
```

In streaming mode, the response is a stream of JSON messages followed by a `[DONE]` message.
Each message looks like:

```json
{
  "id": "00000000-0000-0000-0000-000000000000",
  "episode_id": "11111111-1111-1111-1111-111111111111",
  "model": "email_draft_variant",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "delta": {
        "content": "Hi Gabriel,\n\nI noticed..."
      }
    }
  ],
  "usage": {
    "prompt_tokens": 100,
    "completion_tokens": 100,
    "total_tokens": 200
  }
}
```

### Chat Function with Dynamic Tool Use
Configuration:

```toml
# tensorzero.toml

[functions.weather_bot]
type = "chat"
# Note: no `tools = ["get_temperature"]` field in configuration
# ...
```

Request (Python):

```python
from openai import AsyncOpenAI

async with AsyncOpenAI(base_url="http://localhost:3000/openai/v1") as client:
    result = await client.chat.completions.create(
        model="tensorzero::function_name::weather_bot",
        messages=[
            {
                "role": "user",
                "content": "What is the weather like in Tokyo?",
            }
        ],
        tools=[
            {
                "type": "function",
                "function": {
                    "name": "get_temperature",
                    "description": "Get the current temperature in a given location",
                    "parameters": {
                        "$schema": "http://json-schema.org/draft-07/schema#",
                        "type": "object",
                        "properties": {
                            "location": {
                                "type": "string",
                                "description": "The location to get the temperature for (e.g. \"New York\")",
                            },
                            "units": {
                                "type": "string",
                                "description": "The units to get the temperature in (must be \"fahrenheit\" or \"celsius\")",
                                "enum": ["fahrenheit", "celsius"],
                            },
                        },
                        "required": ["location"],
                        "additionalProperties": False,
                    },
                },
            }
        ],
        # optional: stream=True,
    )
```

Request (cURL):

```bash
curl -X POST http://localhost:3000/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tensorzero::function_name::weather_bot",
    "messages": [
      {
        "role": "user",
        "content": "What is the weather like in Tokyo?"
      }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_temperature",
          "description": "Get the current temperature in a given location",
          "parameters": {
            "$schema": "http://json-schema.org/draft-07/schema#",
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "The location to get the temperature for (e.g. \"New York\")"
              },
              "units": {
                "type": "string",
                "description": "The units to get the temperature in (must be \"fahrenheit\" or \"celsius\")",
                "enum": ["fahrenheit", "celsius"]
              }
            },
            "required": ["location"],
            "additionalProperties": false
          }
        }
      }
    ]
  }'
```

Optionally, add `"stream": true` to the request body to stream the response.

Response:

```json
{
  "id": "00000000-0000-0000-0000-000000000000",
  "episode_id": "11111111-1111-1111-1111-111111111111",
  "model": "weather_bot_variant",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "message": {
        "content": null,
        "tool_calls": [
          {
            "id": "123456789",
            "type": "function",
            "function": {
              "name": "get_temperature",
              "arguments": "{\"location\":\"Tokyo\",\"units\":\"celsius\"}"
            }
          }
        ],
        "role": "assistant"
      }
    }
  ],
  "usage": {
    "prompt_tokens": 100,
    "completion_tokens": 100,
    "total_tokens": 200
  }
}
```

In streaming mode, the response is a stream of JSON messages followed by a `[DONE]` message.
Each message looks like:

```json
{
  "id": "00000000-0000-0000-0000-000000000000",
  "episode_id": "11111111-1111-1111-1111-111111111111",
  "model": "weather_bot_variant",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "delta": {
        "content": null,
        "tool_calls": [
          {
            "id": "123456789",
            "type": "function",
            "function": {
              "name": "get_temperature",
              "arguments": "{\"location\":" // a tool arguments delta
            }
          }
        ]
      }
    }
  ],
  "usage": {
    "prompt_tokens": 100,
    "completion_tokens": 100,
    "total_tokens": 200
  }
}
```

### JSON Function with Dynamic Output Schema
Configuration:

```toml
# tensorzero.toml

[functions.extract_email]
type = "json"
output_schema = "output_schema.json"
# ...
```

```json
// output_schema.json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "email": { "type": "string" }
  },
  "required": ["email"]
}
```

Request (Python):

```python
from openai import AsyncOpenAI

dynamic_output_schema = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {
        "email": {"type": "string"},
        "domain": {"type": "string"},
    },
    "required": ["email", "domain"],
}

async with AsyncOpenAI(base_url="http://localhost:3000/openai/v1") as client:
    result = await client.chat.completions.create(
        model="tensorzero::function_name::extract_email",
        messages=[
            {
                "role": "system",
                "content": "You are an AI assistant...",
            },
            {
                "role": "user",
                "content": "...blah blah blah [email protected] blah blah blah...",
            },
        ],
        # Override the output schema using the `response_format` field
        response_format={"type": "json_schema", "schema": dynamic_output_schema},
        # optional: stream=True,
    )
```

Request (cURL):

```bash
curl -X POST http://localhost:3000/openai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tensorzero::function_name::extract_email",
    "messages": [
      {
        "role": "system",
        "content": "You are an AI assistant..."
      },
      {
        "role": "user",
        "content": "...blah blah blah [email protected] blah blah blah..."
      }
    ],
    "response_format": {
      "type": "json_schema",
      "schema": {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "type": "object",
        "properties": {
          "email": { "type": "string" },
          "domain": { "type": "string" }
        },
        "required": ["email", "domain"]
      }
    }
  }'
```

Optionally, add `"stream": true` to the request body to stream the response.

Response:

```json
{
  "id": "00000000-0000-0000-0000-000000000000",
  "episode_id": "11111111-1111-1111-1111-111111111111",
  "model": "extract_email_variant",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "message": {
        "content": "{\"email\":\"[email protected]\",\"domain\":\"tensorzero.com\"}"
      }
    }
  ],
  "usage": {
    "prompt_tokens": 100,
    "completion_tokens": 100,
    "total_tokens": 200
  }
}
```

In streaming mode, the response is a stream of JSON messages followed by a `[DONE]` message.
Each message looks like:

```json
{
  "id": "00000000-0000-0000-0000-000000000000",
  "episode_id": "11111111-1111-1111-1111-111111111111",
  "model": "extract_email_variant",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "delta": {
        "content": "{\"email\":" // a JSON content delta
      }
    }
  ],
  "usage": {
    "prompt_tokens": 100,
    "completion_tokens": 100,
    "total_tokens": 200
  }
}
```