API reference for the Batch Inference endpoints.
/batch_inference endpoints allow users to take advantage of batched inference offered by LLM providers.These inferences are often substantially cheaper than the synchronous APIs.The handling and eventual data model for inferences made through this endpoint are equivalent to those made through the main/inference endpoint with a few exceptions:chat_completion are supported.dryrun setting is not supported.POST /batch_inference endpoint to submit a batch of requests.Later, you can poll theGET /batch_inference/{batch_id} orGET /batch_inference/:batch_id/inference/:inference_id endpoint to check the status of the batch and retrieve results.Each poll will return either a pending or failed status or the results of the batch.Even after a batch has completed and been processed, you can continue to poll the endpoint as a way of retrieving the results.The first time a batch has completed and been processed, the results are stored in the ChatInference, JsonInference, and ModelInference tables as with the/inference endpoint.The gateway will rehydrate the results into the expected result when polled repeatedly after finishingPOST /batch_inferenceadditional_toolsdescription,name,parameters, andstrict.The fields are identical to those in the configuration file, except that theparameters field should contain the JSON schema itself rather than a path to it.SeeConfiguration Reference for more details.allowed_toolsadditional_tools.Each element in the outer list corresponds to a single inference in the batch.Each inner list contains the names of the tools that are allowed for the corresponding inference.Some providers (notably OpenAI) natively support restricting allowed tools.For these providers, we send all tools (both configured and dynamic) to the provider, and separately specify which ones are allowed to be called.For providers that do not natively support this feature, we filter the tool list ourselves and only send the allowed tools to the provider.credentialsdynamic location (e.g.dynamic::my_dynamic_api_key_name).See theconfiguration reference for more details.The gateway expects the credentials to be provided in thecredentials field of the request body as specified below.The gateway will return a 400 error if the credentials are not provided and the model provider has been configured with dynamic credentials.Example
[models.my_model_name.providers.my_provider_name]# ...# Note: the name of the credential field (e.g. `api_key_location`) depends on the provider typeapi_key_location ="dynamic::my_dynamic_api_key_name"# ...{ // ... "credentials": { // ... "my_dynamic_api_key_name":"sk-..." // ... } // ...}episode_idsnull for episode IDs for elements that should start a fresh episode.Only use episode IDs that were returned by the TensorZero gateway.function_nameinputsinput objects (see below)input[].messages[])role: The role of the message (assistant oruser).content: The content of the message (see below).content field can be have one of the following types:type and additional fields depending on the type.If the content block has typetext, it must have either of the following additional fields:text: The text for the content block.arguments: A JSON object containing the function arguments for TensorZero functions with templates and schemas (seeCreate a prompt template for details).tool_call, it must have the following additional fields:arguments: The arguments for the tool call.id: The ID for the content block.name: The name of the tool for the content block.tool_result, it must have the following additional fields:id: The ID for the content block.name: The name of the tool for the content block.result: The result of the tool call.file, it must have exactly one of the following additional fields:file_type: must beurlurlmime_type (optional): override the MIME type of the filefilename (optional): a filename to associate with the filefile_type: must bebase64data:base64-encoded data for an embedded filemime_type: the MIME type (e.g.image/png,image/jpeg,application/pdf)filename (optional): a filename to associate with the fileraw_text, it must have the following additional fields:value: The text for the content block.This content block will ignore any relevant templates and schemas for this function.thought, it must have the following additional fields:text: The text for the content block.unknown, it must have the following additional fields:data: The original content block from the provider, without any validation or transformation by TensorZero.model_provider_name (optional): A string specifying when this content block should be included in the model provider input.If set, the content block will only be provided to this specific model provider.If not set, the content block is passed to all model providers.daydreaming content block to inference requests targeting theyour_model_provider_name model provider.{ "type":"unknown", "data": { "type":"daydreaming", "dream":"..." }, "model_provider_name":"tensorzero::model_name::your_model_name::provider_name::your_model_provider_name"}Example
{ // ... "input": { "messages": [ // If you don't have a user (or assistant) schema... { "role":"user",// (or "assistant") "content":"What is the weather in Tokyo?" }, // If you have a user (or assistant) schema... { "role":"user",// (or "assistant") "content": [ { "type":"text", "arguments": { "location":"Tokyo" // ... } } ] }, // If the model previously called a tool... { "role":"assistant", "content": [ { "type":"tool_call", "id":"0", "name":"get_temperature", "arguments":"{\"location\":\"Tokyo\"}" } ] }, // ...and you're providing the result of that tool call... { "role":"user", "content": [ { "type":"tool_result", "id":"0", "name":"get_temperature", "result":"70" } ] }, // You can also specify a text message using a content block... { "role":"user", "content": [ { "type":"text", "text":"What about NYC?" // (or object if there is a schema) } ] }, // You can also provide multiple content blocks in a single message... { "role":"assistant", "content": [ { "type":"text", "text":"Sure, I can help you with that." // (or object if there is a schema) }, { "type":"tool_call", "id":"0", "name":"get_temperature", "arguments":"{\"location\":\"New York\"}" } ] } // ... ] // ... } // ...}input[].systemoutput_schemasoutput_schema defined in the function configuration.This schema is used for validating the output of the function, and sent to providers which support structured outputs.parallel_tool_callsnull for elements that should use the configuration value for the function being called.If you don’t provide this field entirely, we default to the configuration value for the function being called.Most model providers do not support parallel tool calls. In those cases, the gateway ignores this field.At the moment, only Fireworks AI and OpenAI support parallel tool calls.params{}){ variant_type: { param: [value1, ...], ... }, ... }.You should prefer to set these parameters in the configuration file if possible.Only use this field if you need to set these parameters dynamically at runtime.Each parameter if specified should be a list of values that may be null that is the same length as the batch size.Note that the parameters will apply to every variant of the specified type.Currently, we support the following:chat_completionfrequency_penaltyjson_modemax_tokenspresence_penaltyreasoning_effortseedservice_tierstop_sequencestemperaturethinking_budget_tokenstop_pverbosityExample
temperature parameter for achat_completion variant for the first inference in a batch of 3, you’d include the following in the request body:{ // ... "params": { "chat_completion": { "temperature": [0.7,null,null] } } // ...}tags[{"user_id": "123"}, null] or[{"author": "Alice"}, {"author": "Bob"}].tool_choicenone: The function should not use any tools.auto: The model decides whether or not to use a tool. If it decides to use a tool, it also decides which tools to use.required: The model should use a tool. If multiple tools are available, the model decides which tool to use.{ specific = "tool_name" }: The model should use a specific tool. The tool must be defined in thetools section of the configuration file or provided inadditional_tools.variant_name/batch_inference, the response is a JSON object containing metadata that allows you to refer to the batch and poll it later on.The response is an object with the following fields:batch_idinference_idsepisode_ids[functions.generate_haiku]type ="chat"[functions.generate_haiku.variants.gpt_4o_mini]type ="chat_completion"model ="openai::gpt-4o-mini-2024-07-18"inputs is equal to theinput field in a regular inference request.curl -X POST http://localhost:3000/batch_inference \ -H "Content-Type: application/json" \ -d '{ "function_name": "generate_haiku", "variant_name": "gpt_4o_mini", "inputs": [ { "messages": [ { "role": "user", "content": "Write a haiku about artificial intelligence." } ] }, { "messages": [ { "role": "user", "content": "Write a haiku about general aviation." } ] }, { "messages": [ { "role": "user", "content": "Write a haiku about anime." } ] } ] }'batch_id as well asinference_ids andepisode_ids for each inference in the batch.{ "batch_id":"019470f0-db4c-7811-9e14-6fe6593a2652", "inference_ids": [ "019470f0-d34a-77a3-9e59-bcc66db2b82f", "019470f0-d34a-77a3-9e59-bcdd2f8e06aa", "019470f0-d34a-77a3-9e59-bcecfb7172a0" ], "episode_ids": [ "019470f0-d34a-77a3-9e59-bc933973d087", "019470f0-d34a-77a3-9e59-bca6e9b748b2", "019470f0-d34a-77a3-9e59-bcb20177bf3a" ]}GET /batch_inference/:batch_id{"status": "pending"}{"status": "failed"}status"completed"batch_idinferencesbatch_id to poll the status of this job:curl -X GET http://localhost:3000/batch_inference/019470f0-db4c-7811-9e14-6fe6593a2652status field.{ "status":"pending"}status field and theinferences field.Each inference object is the same as the response from a regular inference request.{ "status":"completed", "batch_id":"019470f0-db4c-7811-9e14-6fe6593a2652", "inferences": [ { "inference_id":"019470f0-d34a-77a3-9e59-bcc66db2b82f", "episode_id":"019470f0-d34a-77a3-9e59-bc933973d087", "variant_name":"gpt_4o_mini", "content": [ { "type":"text", "text":"Whispers of circuits,\nLearning paths through endless code,\nDreams in binary." } ], "usage": { "input_tokens":15, "output_tokens":19 } }, { "inference_id":"019470f0-d34a-77a3-9e59-bcdd2f8e06aa", "episode_id":"019470f0-d34a-77a3-9e59-bca6e9b748b2", "variant_name":"gpt_4o_mini", "content": [ { "type":"text", "text":"Wings of freedom soar,\nClouds embrace the lonely flight,\nSky whispers adventure." } ], "usage": { "input_tokens":15, "output_tokens":20 } }, { "inference_id":"019470f0-d34a-77a3-9e59-bcecfb7172a0", "episode_id":"019470f0-d34a-77a3-9e59-bcb20177bf3a", "variant_name":"gpt_4o_mini", "content": [ { "type":"text", "text":"Vivid worlds unfold,\nHeroes rise with dreams in hand,\nInk and dreams collide." } ], "usage": { "input_tokens":14, "output_tokens":20 } } ]}GET /batch_inference/:batch_id/inference/:inference_id{"status": "pending"}{"status": "failed"}status"completed"batch_idinferencescurl -X GET http://localhost:3000/batch_inference/019470f0-db4c-7811-9e14-6fe6593a2652/inference/019470f0-d34a-77a3-9e59-bcc66db2b82fstatus field.{ "status":"pending"}status field and theinferences field.Unlike above, this request will return a list containing only the requested inference.{ "status":"completed", "batch_id":"019470f0-db4c-7811-9e14-6fe6593a2652", "inferences": [ { "inference_id":"019470f0-d34a-77a3-9e59-bcc66db2b82f", "episode_id":"019470f0-d34a-77a3-9e59-bc933973d087", "variant_name":"gpt_4o_mini", "content": [ { "type":"text", "text":"Whispers of circuits,\nLearning paths through endless code,\nDreams in binary." } ], "usage": { "input_tokens":15, "output_tokens":19 } } ]}