Overview

The TensorZero Gateway is a high-performance model gateway that provides a unified interface for all your LLM applications.

  • One API for All LLMs. The gateway provides a unified interface for all major LLM providers, allowing for seamless cross-platform integration and fallbacks. TensorZero natively supports Anthropic, AWS Bedrock, AWS SageMaker, Azure OpenAI Service, Fireworks, GCP Vertex AI Anthropic, GCP Vertex AI Gemini, Google AI Studio (Gemini API), Groq, Hyperbolic, Mistral, OpenAI, OpenRouter, Together, vLLM, and xAI. Need something else? Your provider is most likely supported because TensorZero integrates with any OpenAI-compatible API (e.g. Ollama). Still not supported? Open an issue on GitHub and we’ll integrate it! A minimal client sketch appears after this list.
    Learn more in our How to call any LLM guide.
  • Blazing Fast. The gateway (written in Rust 🦀) achieves <1ms P99 latency overhead under extreme load. In benchmarks, LiteLLM @ 100 QPS adds 25-100x+ more latency than our gateway @ 10,000 QPS.
  • Structured Inferences. The gateway enforces schemas for inputs and outputs, ensuring robustness for your application. Structured inference data is later used for powerful optimization recipes (e.g. swapping historical prompts before fine-tuning). Learn more about creating prompt templates.
  • Multi-Step LLM Workflows. The gateway provides first-class support for complex multi-step LLM workflows by associating multiple inferences with an episode. Feedback can be assigned at the inference or episode level, allowing for end-to-end optimization of compound LLM systems. Learn more about episodes. A sketch of a two-step episode appears after this list.
  • Built-in Observability. The gateway collects structured inference traces along with associated downstream metrics and natural-language feedback. Everything is stored in a ClickHouse database for real-time, scalable, and developer-friendly analytics. TensorZero Recipes leverage this dataset to optimize your LLMs.
  • Built-in Experimentation. The gateway automatically routes traffic between variants to enable A/B tests. It ensures consistent variants within an episode in multi-step workflows. Learn more about adaptive A/B tests.
  • Built-in Fallbacks. The gateway automatically falls back failed inferences to other inference providers, or even entirely different variants, so that misconfiguration, provider downtime, and other edge cases don’t affect your availability.
  • Access Controls. The gateway supports TensorZero API key authentication, allowing you to control access to your TensorZero deployment. Create and manage custom API keys for different clients or services. Learn more about setting up auth for TensorZero.
  • GitOps Orchestration. Orchestrate prompts, models, parameters, tools, experiments, and more with GitOps-friendly configuration. Manage a few LLMs manually with human-readable configuration files, or thousands of prompts and LLMs entirely programmatically.
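
Here is a minimal sketch of the unified interface using the TensorZero Python client. It assumes the gateway is running locally on port 3000 and that a chat function named `generate_haiku` is defined in your configuration; the port, the function name, and the exact client constructor are illustrative and may differ for your setup (see the Quickstart for the current client API):

```python
from tensorzero import TensorZeroGateway

# Connect to a locally running gateway (assumed to listen on http://localhost:3000).
with TensorZeroGateway.build_http(gateway_url="http://localhost:3000") as client:
    response = client.inference(
        function_name="generate_haiku",  # hypothetical function from your configuration
        input={
            "messages": [
                {"role": "user", "content": "Write a haiku about artificial intelligence."}
            ]
        },
    )
    # The call and the response shape stay the same whether the variant behind
    # `generate_haiku` points at OpenAI, Anthropic, AWS Bedrock, a self-hosted
    # vLLM model, or any other supported provider.
    print(response)
```

Swapping the underlying provider or model is then a configuration change, not a code change.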

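To illustrate episodes, the following hedged sketch shows a two-step workflow: the first inference starts a new episode, the second joins it by passing the same `episode_id`, and feedback is then assigned at the episode level. The function names (`draft_email`, `revise_email`) and the metric name (`email_accepted`) are hypothetical and would need to be defined in your configuration:

```python
from tensorzero import TensorZeroGateway

with TensorZeroGateway.build_http(gateway_url="http://localhost:3000") as client:
    # Step 1: the gateway assigns a fresh episode_id to this inference.
    draft = client.inference(
        function_name="draft_email",  # hypothetical function
        input={
            "messages": [
                {"role": "user", "content": "Draft a follow-up email to Acme Corp."}
            ]
        },
    )

    # Step 2: pass the same episode_id so both inferences belong to one episode.
    revised = client.inference(
        function_name="revise_email",  # hypothetical function
        input={
            "messages": [
                {"role": "user", "content": "Make the draft more concise and friendly."}
            ]
        },
        episode_id=draft.episode_id,
    )

    # Assign feedback to the episode as a whole (an end-to-end signal),
    # rather than to a single inference.
    client.feedback(
        metric_name="email_accepted",  # hypothetical boolean metric
        value=True,
        episode_id=draft.episode_id,
    )
```

Because both inferences share an episode, episode-level feedback can later drive optimization of the whole workflow, and variant assignment stays consistent across the two steps.
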
Next Steps
