APIM ❤️ AI - This repo contains experiments on Azure API Management's AI capabilities, integrating with Azure OpenAI, AI Foundry, and much more 🚀
Azure-Samples/AI-Gateway
🧪 AI Gateway Labs with Azure API Management
➕ Realtime API (Audio and Text) with Azure OpenAI 🔥 experiments with the AOAI Realtime API
➕ Realtime API (Audio and Text) with Azure OpenAI + MCP tools 🔥 experiments with the AOAI Realtime API + MCP
➕ Model Context Protocol (MCP) ⚙️ experiments with the client authorization flow
➕ the FinOps Framework lab to manage AI budgets effectively 💰
➕ Agentic ✨ experiments with the Model Context Protocol (MCP).
➕ Agentic ✨ experiments with the OpenAI Agents SDK.
➕ Agentic ✨ experiments with the AI Agent Service from Azure AI Foundry.
➕ the AI Foundry Deepseek lab with the Deepseek R1 model from Azure AI Foundry.
➕ the Zero-to-Production lab with an iterative policy exploration to fine-tune the optimal production configuration.
➕ the Terraform flavor of the backend pool load balancing lab.
➕ the AI Foundry SDK lab.
➕ the Content filtering and Prompt shielding labs.
➕ the Model routing lab with OpenAI model-based routing.
➕ the Prompt flow lab to try the Azure AI Studio Prompt Flow with Azure API Management.
➕ priority and weight parameters to the Backend pool load balancing lab.
➕ the Streaming tool to test OpenAI streaming with Azure API Management.
➕ the Tracing tool to debug and troubleshoot OpenAI APIs using the Azure API Management tracing capability.
➕ image processing to the GPT-4o inferencing lab.
➕ the Function calling lab with a sample API on Azure Functions.
- 🧠 GenAI Gateway
- 🧪 Labs with AI Agents
- 🧪 Labs with the Inference API
- 🧪 Labs based on Azure OpenAI
- 🚀 Getting started
- ⛵ Roll-out to production
- 🔨 Supporting tools
- 🏛️ Well-Architected Framework
- 🎒 Show and tell
- 🥇 Other Resources
The rapid pace of AI advances demands experimentation-driven approaches for organizations to remain at the forefront of the industry. With AI steadily becoming a game-changer for an array of sectors, maintaining a fast-paced innovation trajectory is crucial for businesses aiming to leverage its full potential.
AI services are predominantly accessed via APIs, underscoring the essential need for a robust and efficient API management strategy. This strategy is instrumental for maintaining control and governance over the consumption of AI services.
With the expanding horizons of AI services and their seamless integration with APIs, there is considerable demand for a comprehensive AI Gateway pattern that broadens the core principles of API management, aiming to accelerate the experimentation of advanced use cases and pave the road for further innovation in this rapidly evolving field. The well-architected principles of the AI Gateway provide a framework for the confident deployment of Intelligent Apps into production.
This repo explores the AI Gateway pattern through a series of experimental labs. The GenAI Gateway capabilities of Azure API Management play a crucial role within these labs, handling AI service APIs with security, reliability, performance, overall operational efficiency, and cost controls. The primary focus is on Azure OpenAI, which sets the standard reference for Large Language Models (LLMs). However, the same principles and design patterns could potentially be applied to any LLM.
Acknowledging the rising dominance of Python, particularly in the realm of AI, along with the powerful experimental capabilities of Jupyter notebooks, the following labs are structured around Jupyter notebooks, with step-by-step instructions, Python scripts, Bicep files, and Azure API Management policies:
Playground to experiment with the Model Context Protocol using the client authorization flow. In this flow, Azure API Management acts both as an OAuth client connecting to the Microsoft Entra ID authorization server and as an OAuth authorization server for the MCP client (MCP Inspector in this lab).
🦾 Bicep ➕⚙️ Policy ➕🧾 Notebook
Playground to experiment with the Model Context Protocol and Azure API Management to enable plug & play of tools to LLMs. Leverages the credential manager for managing OAuth 2.0 tokens to backend tools and client token validation to ensure end-to-end authentication and authorization.
🦾 Bicep ➕⚙️ Policy ➕🧾 Notebook
Playground to try the OpenAI Agents SDK with Azure OpenAI models and API-based tools controlled by Azure API Management.
🦾 Bicep ➕⚙️ Policy ➕🧾 Notebook
Use this playground to explore theAzure AI Agent Service, leveraging Azure API Management to control multiple services, including Azure OpenAI models, Logic Apps Workflows, and OpenAPI-based APIs.
🦾 Bicep ➕⚙️ Policy ➕🧾 Notebook
Playground to try the OpenAI function calling feature with an Azure Functions API that is also managed by Azure API Management.
🦾 Bicep ➕⚙️ Policy ➕🧾 Notebook
Playground to try the Deepseek R1 model via AI Model Inference from Azure AI Foundry. This lab uses the Azure AI Model Inference API and two APIM LLM policies: llm-token-limit and llm-emit-token-metric.
🦾 Bicep ➕⚙️ Policy ➕🧾 Notebook
🧪 SLM self-hosting (Phi-3)
Playground to try the self-hosted Phi-3 Small Language Model (SLM) through the Azure API Management self-hosted gateway with OpenAI API compatibility.
🦾 Bicep ➕⚙️ Policy ➕🧾 Notebook
This playground leverages the FinOps Framework and Azure API Management to control AI costs. It uses the token limit policy for each product and integrates Azure Monitor alerts with Logic Apps to automatically disable APIM subscriptions that exceed cost quotas.
🦾 Bicep ➕⚙️ Policy ➕🧾 Notebook
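The quota-enforcement idea behind the FinOps lab can be sketched in a few lines of plain Python (names and numbers here are hypothetical; the lab itself wires this up with Azure Monitor alerts, Logic Apps, and the APIM REST API rather than application code):

```python
# Hypothetical cost-control check: given token spend per APIM
# subscription and a per-subscription quota, report which
# subscriptions should be suspended.
def enforce_quotas(usage: dict, quotas: dict) -> list:
    """Return the subscription ids whose spend exceeds their quota."""
    return [sid for sid, spent in usage.items()
            if spent > quotas.get(sid, float("inf"))]

usage = {"team-a": 120_000, "team-b": 40_000}
quotas = {"team-a": 100_000, "team-b": 100_000}
print(enforce_quotas(usage, quotas))  # ['team-a']
```

In the lab, the equivalent of "suspending" a subscription is a Logic App flipping the APIM subscription state, which immediately blocks further calls through the gateway.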
🧪 Backend pool load balancing - Available with Bicep and Terraform
Playground to try the built-in load-balancing backend pool functionality of Azure API Management against either a list of Azure OpenAI endpoints or mock servers.
🦾 Bicep ➕⚙️ Policy ➕🧾 Notebook
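The priority/weight semantics of an APIM backend pool can be approximated in a short Python sketch (the backend URLs are made up, and the real selection happens inside the APIM gateway, not in your code): requests go to the lowest-numbered healthy priority group, split by weight within that group.

```python
import random

# Hypothetical backend list mirroring a pool definition: two weighted
# priority-1 endpoints plus a priority-2 fallback.
BACKENDS = [
    {"url": "https://aoai-east.example", "priority": 1, "weight": 3, "healthy": True},
    {"url": "https://aoai-west.example", "priority": 1, "weight": 1, "healthy": True},
    {"url": "https://aoai-fallback.example", "priority": 2, "weight": 1, "healthy": True},
]

def pick_backend(backends):
    """Pick from the lowest-priority healthy group, weighted at random."""
    healthy = [b for b in backends if b["healthy"]]
    if not healthy:
        raise RuntimeError("no healthy backends")
    top = min(b["priority"] for b in healthy)
    group = [b for b in healthy if b["priority"] == top]
    return random.choices(group, weights=[b["weight"] for b in group])[0]

print(pick_backend(BACKENDS)["priority"])  # 1: priority-1 group wins while healthy
```

Only when every priority-1 backend is unhealthy (for example, after repeated 429s trip the circuit breaker) does traffic spill over to the priority-2 fallback.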
Playground to try the token rate limiting policy applied to one or more Azure OpenAI endpoints. When the token usage is exceeded, the caller receives a 429 Too Many Requests response.
🦾 Bicep ➕⚙️ Policy ➕🧾 Notebook
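The behavior the policy enforces can be illustrated with a minimal sliding-window limiter in Python (a toy stand-in, not the policy's actual implementation; the tokens-per-minute figure is arbitrary):

```python
import time

class TokenRateLimiter:
    """Toy sliding-window limiter loosely mimicking a tokens-per-minute
    quota: requests that would exceed the window's budget get a 429."""

    def __init__(self, tokens_per_minute):
        self.tpm = tokens_per_minute
        self.window = []  # list of (timestamp, tokens consumed)

    def try_consume(self, tokens, now=None):
        """Return 200 if the request fits the quota, else 429."""
        now = time.monotonic() if now is None else now
        # Drop consumption records older than 60 seconds.
        self.window = [(t, n) for t, n in self.window if now - t < 60]
        used = sum(n for _, n in self.window)
        if used + tokens > self.tpm:
            return 429
        self.window.append((now, tokens))
        return 200

limiter = TokenRateLimiter(tokens_per_minute=500)
print(limiter.try_consume(400, now=0.0))   # 200
print(limiter.try_consume(200, now=1.0))   # 429: would exceed 500 TPM
print(limiter.try_consume(200, now=61.0))  # 200: the first request aged out
```

The real policy also estimates prompt tokens up front and can return remaining-quota headers, but the core accept/reject decision follows the same shape.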
Playground to try the emit token metric policy. The policy sends metrics to Application Insights about the consumption of large language model tokens through Azure OpenAI Service APIs.
🦾 Bicep ➕⚙️ Policy ➕🧾 Notebook
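Conceptually, the policy aggregates prompt, completion, and total token counts per custom dimension before shipping them to Application Insights. A rough Python sketch of that aggregation (the dimension name and usage payload shape are illustrative only):

```python
from collections import defaultdict

# In-memory stand-in for the metric sink; keys are
# (dimension value, token kind) pairs.
metrics = defaultdict(int)

def emit_token_metric(subscription_id, usage):
    """Accumulate token counts per subscription, per token kind."""
    metrics[(subscription_id, "prompt")] += usage["prompt_tokens"]
    metrics[(subscription_id, "completion")] += usage["completion_tokens"]
    metrics[(subscription_id, "total")] += usage["total_tokens"]

emit_token_metric("finance", {"prompt_tokens": 12, "completion_tokens": 30, "total_tokens": 42})
emit_token_metric("finance", {"prompt_tokens": 8, "completion_tokens": 10, "total_tokens": 18})
print(metrics[("finance", "total")])  # 60
```

Charting these per-dimension counters is what enables the cost dashboards and alerts used elsewhere in the repo (for example, the FinOps lab).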
Playground to try the semantic caching policy. Uses vector proximity of the prompt to previous requests and a specified similarity score threshold.
🦾 Bicep ➕⚙️ Policy ➕🧾 Notebook
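The core idea, reuse a cached completion when a new prompt's embedding is close enough to a cached one, fits in a few lines. The embeddings below are made-up three-dimensional vectors; in the lab they come from an embeddings model and the cache lives in Azure Redis, not a Python list:

```python
import math

CACHE = []        # list of (embedding, completion) pairs
THRESHOLD = 0.95  # illustrative similarity cutoff

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def lookup(embedding):
    """Return a cached completion if any entry is similar enough."""
    for cached_vec, completion in CACHE:
        if cosine(embedding, cached_vec) >= THRESHOLD:
            return completion  # cache hit: skip the LLM call entirely
    return None               # cache miss: forward to the model

CACHE.append(([0.9, 0.1, 0.4], "Paris is the capital of France."))
print(lookup([0.89, 0.12, 0.41]))  # near-identical prompt -> cached answer
print(lookup([0.1, 0.9, 0.2]))     # dissimilar prompt -> None
```

Tuning the threshold is the main knob: too low and unrelated prompts share answers, too high and the cache rarely hits.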
Playground to try the OAuth 2.0 authorization feature using an identity provider to enable more fine-grained access to OpenAI APIs for particular users or clients.
🦾 Bicep ➕⚙️ Policy ➕🧾 Notebook
Playground to create a combination of several policies in an iterative approach. We start with load balancing, then progressively add token emitting, rate limiting, and, eventually, semantic caching. Each of these sets of policies is derived from other labs in this repo.
🦾 Bicep ➕⚙️ Policy ➕🧾 Notebook
Playground to try the new GPT-4o model. GPT-4o ("o" for "omni") is designed to handle a combination of text, audio, and video inputs, and can generate outputs in text, audio, and image formats.
🦾 Bicep ➕⚙️ Policy ➕🧾 Notebook
Playground to try routing to a backend based on Azure OpenAI model and version.
🦾 Bicep ➕⚙️ Policy ➕🧾 Notebook
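Model-based routing amounts to a lookup keyed on the requested model (and optionally API version), with a default backend for anything unmatched. A minimal sketch with invented URLs (the lab expresses the same logic as an APIM policy expression, not Python):

```python
# Hypothetical routing table: (model, api-version year) -> backend URL.
ROUTES = {
    ("gpt-4o", "2024"): "https://aoai-gpt4o.example",
    ("gpt-35-turbo", "2024"): "https://aoai-gpt35.example",
}
DEFAULT = "https://aoai-default.example"

def route(model, api_version):
    """Pick a backend by model name and the year prefix of api-version."""
    return ROUTES.get((model, api_version[:4]), DEFAULT)

print(route("gpt-4o", "2024-06-01"))      # https://aoai-gpt4o.example
print(route("o1-preview", "2024-06-01"))  # https://aoai-default.example
```

This keeps model-to-deployment placement a gateway concern, so consumers call one endpoint regardless of where each model actually runs.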
Playground to try the Retrieval Augmented Generation (RAG) pattern with Azure AI Search, Azure OpenAI embeddings, and Azure OpenAI completions.
🦾 Bicep ➕⚙️ Policy ➕🧾 Notebook
Playground to try the built-in logging capabilities of Azure API Management. Logs requests into Application Insights to track details and token usage.
🦾 Bicep ➕⚙️ Policy ➕🧾 Notebook
Playground to test storing message details into Cosmos DB through the Log to event hub policy. With the policy we can control which data will be stored in the DB (prompt, completion, model, region, tokens, etc.).
🦾 Bicep ➕⚙️ Policy ➕🧾 Notebook
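The field selection that makes this pattern useful can be sketched as a simple allow-list filter applied before the event leaves the gateway (field names here are illustrative; the real selection is written inside the log-to-eventhub policy expression):

```python
import json

# Only these attributes of a completed request are forwarded, so the
# Cosmos DB sink never receives anything you choose to drop.
KEEP = {"prompt", "completion", "model", "region", "total_tokens"}

def to_log_event(record):
    """Serialize just the allow-listed fields of a request record."""
    return json.dumps({k: v for k, v in record.items() if k in KEEP})

event = to_log_event({
    "prompt": "hi", "completion": "hello!", "model": "gpt-4o",
    "region": "eastus", "total_tokens": 9,
    "api_key": "secret",  # deliberately excluded from the event
})
print(event)
```

Because filtering happens at the gateway, sensitive material such as keys or full prompts can be excluded from long-term storage by policy alone.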
Playground to try the Azure AI Studio Prompt Flow with Azure API Management.
🦾 Bicep ➕⚙️ Policy ➕🧾 Notebook
Playground to try integrating Azure API Management with Azure AI Content Safety to filter potentially offensive, risky, or undesirable content.
🦾 Bicep ➕⚙️ Policy ➕🧾 Notebook
Playground to try Prompt Shields from the Azure AI Content Safety service, which analyzes LLM inputs and detects User Prompt attacks and Document attacks, two common types of adversarial inputs.
🦾 Bicep ➕⚙️ Policy ➕🧾 Notebook
This is a list of potential future labs to be developed.
- Real Time API
- Semantic Kernel with Agents
- Logic Apps RAG
- PII handling
- Gemini
Tip
Please use the feedback discussion to share your experiences, suggestions, ideas, or lab requests so that we can continuously improve.
- Python 3.12 or later installed
- VS Code installed with the Jupyter notebook extension enabled
- Python environment with the requirements.txt installed (or run `pip install -r requirements.txt` in your terminal)
- An Azure Subscription with Contributor + RBAC Administrator or Owner roles
- Azure CLI installed and signed into your Azure subscription
- Clone this repo and configure your local machine with the prerequisites. Or just create a GitHub Codespace and run it in the browser or in VS Code.
- Navigate through the available labs and select one that best suits your needs. For starters, we recommend the token rate limiting lab.
- Open the notebook and run the provided steps.
- Tailor the experiment according to your requirements. If you wish to contribute to our collective work, we would appreciate your submission of a pull request.
Note
🪲 Please feel free to open a new issue if you find something that should be fixed or enhanced.
We recommend the guidelines and best practices from the AI Hub Gateway Landing Zone to implement a central AI API gateway to empower various line-of-business units in an organization to leverage Azure AI services.
- AI-Gateway Mock server is designed to mimic the behavior and responses of the OpenAI API, creating an efficient simulation environment for testing and developing the integration with Azure API Management and other use cases. The app.py can be customized to tailor the Mock server to specific use cases.
- Tracing - Invokes the OpenAI API with tracing enabled and returns the tracing information.
- Streaming - Invokes the OpenAI API with streaming enabled and returns the response in chunks.
The Azure Well-Architected Framework is a design framework that can improve the quality of a workload. The following table maps labs to the Well-Architected Framework pillars to set you up for success through architectural experimentation.
Lab | Security | Reliability | Performance | Operations | Costs
---|---|---|---|---|---
Request forwarding | ⭐ | | | |
Backend circuit breaking | ⭐ | ⭐ | | |
Backend pool load balancing | ⭐ | ⭐ | ⭐ | |
Advanced load balancing | ⭐ | ⭐ | ⭐ | |
Response streaming | ⭐ | ⭐ | | |
Vector searching | ⭐ | ⭐ | ⭐ | |
Built-in logging | ⭐ | ⭐ | ⭐ | ⭐ | ⭐
SLM self-hosting | ⭐ | ⭐ | | |
Tip
Check the Azure Well-Architected Framework perspective on Azure OpenAI Service for additional guidance.
Tip
Install the VS Code Reveal extension, open AI-GATEWAY.md and click on 'slides' at the bottom to present the AI Gateway without leaving VS Code. Or just open the AI-GATEWAY.pptx for a plain old PowerPoint experience.
Numerous reference architectures, best practices and starter kits are available on this topic. Please refer to the resources provided if you need comprehensive solutions or a landing zone to initiate your project. We suggest leveraging the AI-Gateway labs to discover additional capabilities that can be integrated into the reference architectures.
- GenAI Gateway Guide
- Azure OpenAI + APIM Sample
- AI+API better together: Benefits & Best Practices using APIs for AI workloads
- Designing and implementing a gateway solution with Azure OpenAI resources
- Azure OpenAI Using PTUs/TPMs With API Management - Using the Scaling Special Sauce
- Manage Azure OpenAI using APIM
- Setting up Azure OpenAI as a central capability with Azure API Management
- Introduction to Building AI Apps
We believe that there may be valuable content that we are currently unaware of. We would greatly appreciate any suggestions or recommendations to enhance this list.
Important
This software is provided for demonstration purposes only. It is not intended to be relied upon for any purpose. The creators of this software make no representations or warranties of any kind, express or implied, about the completeness, accuracy, reliability, suitability or availability with respect to the software or the information, products, services, or related graphics contained in the software for any purpose. Any reliance you place on such information is therefore strictly at your own risk.