
APIM ❤️ AI - This repo contains experiments on Azure API Management's AI capabilities, integrating with Azure OpenAI, AI Foundry, and much more 🚀


Azure-Samples/AI-Gateway


What's new ✨

Realtime API (Audio and Text) with Azure OpenAI 🔥 experiments with the AOAI Realtime
Realtime API (Audio and Text) with Azure OpenAI + MCP tools 🔥 experiments with the AOAI Realtime + MCP
Model Context Protocol (MCP) ⚙️ experiments with the client authorization flow
➕ the FinOps Framework lab to manage AI budgets effectively 💰
Agentic ✨ experiments with Model Context Protocol (MCP).
Agentic ✨ experiments with OpenAI Agents SDK.
Agentic ✨ experiments with AI Agent Service from Azure AI Foundry.
➕ the AI Foundry Deepseek lab with the Deepseek R1 model from Azure AI Foundry.
➕ the Zero-to-Production lab with an iterative policy exploration to fine-tune the optimal production configuration.
➕ the Terraform flavor of the backend pool load balancing lab.
➕ the AI Foundry SDK lab.
➕ the Content filtering and Prompt shielding labs.
➕ the Model routing lab with OpenAI model based routing.
➕ the Prompt flow lab to try the Azure AI Studio Prompt Flow with Azure API Management.
➕ priority and weight parameters to the Backend pool load balancing lab.
➕ the Streaming tool to test OpenAI streaming with Azure API Management.
➕ the Tracing tool to debug and troubleshoot OpenAI APIs using the Azure API Management tracing capability.
➕ image processing to the GPT-4o inferencing lab.
➕ the Function calling lab with a sample API on Azure Functions.

Contents

  1. 🧠 GenAI Gateway
  2. 🧪 Labs with AI Agents
  3. 🧪 Labs with the Inference API
  4. 🧪 Labs based on Azure OpenAI
  5. 🚀 Getting started
  6. ⛵ Roll-out to production
  7. 🔨 Supporting tools
  8. 🏛️ Well-Architected Framework
  9. 🎒 Show and tell
  10. 🥇 Other Resources

The rapid pace of AI advances demands experimentation-driven approaches for organizations to remain at the forefront of the industry. With AI steadily becoming a game-changer for an array of sectors, maintaining a fast-paced innovation trajectory is crucial for businesses aiming to leverage its full potential.

AI services are predominantly accessed via APIs, underscoring the essential need for a robust and efficient API management strategy. This strategy is instrumental for maintaining control and governance over the consumption of AI services.

With the expanding horizons of AI services and their seamless integration with APIs, there is considerable demand for a comprehensive AI Gateway pattern, which broadens the core principles of API management, aiming to accelerate the experimentation of advanced use cases and pave the road for further innovation in this rapidly evolving field. The well-architected principles of the AI Gateway provide a framework for the confident deployment of Intelligent Apps into production.

🧠 GenAI Gateway

AI-Gateway flow

This repo explores the AI Gateway pattern through a series of experimental labs. The GenAI Gateway capabilities of Azure API Management play a crucial role within these labs, handling AI services APIs with security, reliability, performance, overall operational efficiency, and cost controls. The primary focus is on Azure OpenAI, which sets the standard reference for Large Language Models (LLMs). However, the same principles and design patterns could potentially be applied to any LLM.

Acknowledging the rising dominance of Python, particularly in the realm of AI, along with the powerful experimental capabilities of Jupyter notebooks, the following labs are structured around Jupyter notebooks, with step-by-step instructions, Python scripts, Bicep files and Azure API Management policies:

🧪 Labs with AI Agents

Playground to experiment with the Model Context Protocol using the client authorization flow. In this flow, Azure API Management acts both as an OAuth client connecting to the Microsoft Entra ID authorization server and as an OAuth authorization server for the MCP client (the MCP Inspector in this lab).

flow

🦾 Bicep ⚙️ Policy 🧾 Notebook

Playground to experiment with the Model Context Protocol through Azure API Management to enable plug-and-play of tools for LLMs. Leverages the credential manager for managing OAuth 2.0 tokens to backend tools and client token validation to ensure end-to-end authentication and authorization.

flow

🦾 Bicep ⚙️ Policy 🧾 Notebook

Playground to try the OpenAI Agents SDK with Azure OpenAI models and API-based tools controlled by Azure API Management.

flow

🦾 Bicep ⚙️ Policy 🧾 Notebook

Use this playground to explore the Azure AI Agent Service, leveraging Azure API Management to control multiple services, including Azure OpenAI models, Logic Apps Workflows, and OpenAPI-based APIs.

flow

🦾 Bicep ⚙️ Policy 🧾 Notebook

Playground to try the OpenAI function calling feature with an Azure Functions API that is also managed by Azure API Management.

flow

🦾 Bicep ⚙️ Policy 🧾 Notebook
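The lab's real tools live behind Azure Functions; the client-side dispatch step, however, is the same everywhere. As a minimal local sketch (the `get_weather` tool, its schema, and all values are hypothetical, not taken from the lab's API), this is how a chat-completions style tool call can be routed to a Python function:

```python
import json

# Hypothetical tool the model may request; name and arguments are illustrative.
def get_weather(city: str) -> dict:
    return {"city": city, "forecast": "sunny", "temp_c": 22}

TOOLS = {"get_weather": get_weather}

def dispatch_tool_call(tool_call: dict) -> dict:
    """Execute the function named in a chat-completions style tool call."""
    fn = TOOLS[tool_call["function"]["name"]]
    args = json.loads(tool_call["function"]["arguments"])  # arguments arrive as a JSON string
    return fn(**args)

# A tool call shaped like one in a chat completions response (values made up).
call = {"id": "call_1", "type": "function",
        "function": {"name": "get_weather", "arguments": '{"city": "Lisbon"}'}}
result = dispatch_tool_call(call)
```

In the lab, the dispatch target would be the APIM-fronted Azure Functions API rather than a local function.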

🧪 Labs with the Inference API

Playground to try the Deepseek R1 model via the AI Model Inference from Azure AI Foundry. This lab uses the Azure AI Model Inference API and two APIM LLM policies: llm-token-limit and llm-emit-token-metric.

flow

🦾 Bicep ⚙️ Policy 🧾 Notebook

Playground to try the self-hosted Phi-3 Small Language Model (SLM) through the Azure API Management self-hosted gateway with OpenAI API compatibility.

flow

🦾 Bicep ⚙️ Policy 🧾 Notebook

🧪 Labs based on Azure OpenAI

This playground leverages the FinOps Framework and Azure API Management to control AI costs. It uses the token limit policy for each product and integrates Azure Monitor alerts with Logic Apps to automatically disable APIM subscriptions that exceed cost quotas.

flow

🦾 Bicep ⚙️ Policy 🧾 Notebook
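The decision the lab automates with Azure Monitor alerts and Logic Apps boils down to a simple quota check per subscription. A minimal sketch of that condition (the cost figures and quota are hypothetical; the real lab derives spend from token metrics):

```python
def should_suspend(spend_usd: list[float], monthly_quota_usd: float) -> bool:
    """True when accumulated spend reaches the monthly quota — the condition
    on which the lab's Logic App would disable the APIM subscription."""
    return sum(spend_usd) >= monthly_quota_usd

# Hypothetical daily cost figures for one APIM subscription.
daily_costs = [12.5, 30.0, 41.2, 22.3]
suspend = should_suspend(daily_costs, monthly_quota_usd=100.0)
```

In the lab itself this evaluation happens inside Azure Monitor, not in client code; the sketch only shows the shape of the rule.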

Playground to try the built-in load-balancing backend pool functionality of Azure API Management with either a list of Azure OpenAI endpoints or mock servers.

flow

🦾 Bicep ⚙️ Policy 🧾 Notebook
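APIM backend pools combine a priority tier (lower number preferred) with a weight that splits traffic inside a tier. A rough local sketch of that selection logic — the backend names, regions, and numbers here are made up, and APIM's real implementation is internal to the gateway:

```python
import random

# Illustrative pool: two priority-1 backends share traffic 80/20;
# the priority-2 backend only receives traffic when both are unavailable.
BACKENDS = [
    {"name": "aoai-eastus", "priority": 1, "weight": 80, "healthy": True},
    {"name": "aoai-westus", "priority": 1, "weight": 20, "healthy": True},
    {"name": "aoai-sweden", "priority": 2, "weight": 100, "healthy": True},
]

def pick_backend(backends, rng=random):
    healthy = [b for b in backends if b["healthy"]]
    top = min(b["priority"] for b in healthy)          # best available tier
    pool = [b for b in healthy if b["priority"] == top]
    # Weighted random choice within the preferred priority tier.
    return rng.choices(pool, weights=[b["weight"] for b in pool])[0]

chosen = pick_backend(BACKENDS)
```

With all backends healthy the choice always lands in the priority-1 tier; marking both priority-1 backends unhealthy fails traffic over to `aoai-sweden`.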

Playground to try the token rate limiting policy with one or more Azure OpenAI endpoints. When the token usage is exceeded, the caller receives a 429 (Too Many Requests) response.

flow

🦾 Bicep ⚙️ Policy 🧾 Notebook
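A client hitting this policy should treat 429 as a signal to back off and retry. A minimal retry-wrapper sketch — the `send` callable and its `(status, body)` return shape are assumptions for illustration, not part of the lab:

```python
import time

def call_with_retry(send, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Retry a callable returning (status, body) while it answers HTTP 429."""
    for attempt in range(max_retries + 1):
        status, body = send()
        if status != 429:
            break
        if attempt < max_retries:
            # Exponential backoff; a production client should prefer the
            # Retry-After header returned with the 429 over a fixed schedule.
            sleep(base_delay * 2 ** attempt)
    return status, body

# Stub backend: rejects the first two calls the way the policy would.
attempts = {"n": 0}
def fake_send():
    attempts["n"] += 1
    return (429, "token quota exceeded") if attempts["n"] < 3 else (200, "ok")

status, body = call_with_retry(fake_send, sleep=lambda _: None)
```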

Playground to try the emit token metric policy. The policy sends metrics to Application Insights about the consumption of large language model tokens through Azure OpenAI Service APIs.

flow

🦾 Bicep ⚙️ Policy 🧾 Notebook
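Conceptually, the policy aggregates the `usage` object of each completion response into per-subscription counters. A toy sketch of that aggregation (subscription names and token figures are invented; the real policy emits to Application Insights rather than a dict):

```python
from collections import defaultdict

def record_usage(metrics, subscription_id, usage):
    """Accumulate token counts per APIM subscription — roughly the
    dimensions the emit-token-metric policy reports."""
    m = metrics[subscription_id]
    m["prompt_tokens"] += usage["prompt_tokens"]
    m["completion_tokens"] += usage["completion_tokens"]
    m["total_tokens"] += usage["prompt_tokens"] + usage["completion_tokens"]

metrics = defaultdict(lambda: {"prompt_tokens": 0,
                               "completion_tokens": 0,
                               "total_tokens": 0})
# 'usage' mirrors the usage object of an Azure OpenAI chat completion response.
record_usage(metrics, "team-a", {"prompt_tokens": 120, "completion_tokens": 30})
record_usage(metrics, "team-a", {"prompt_tokens": 80, "completion_tokens": 20})
```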

Playground to try the semantic caching policy. It uses vector proximity of the prompt to previous requests and a specified similarity score threshold.

flow

🦾 Bicep ⚙️ Policy 🧾 Notebook
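The core idea — return a cached completion when a new prompt's embedding is close enough to a stored one — can be sketched in a few lines. The toy 2-dimensional embeddings and the 0.9 threshold below are illustrative; the actual policy uses real embedding vectors and an external cache:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

class SemanticCache:
    """Toy semantic cache keyed by embedding similarity."""
    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.entries = []  # (embedding, completion) pairs

    def lookup(self, embedding):
        scored = [(cosine(e, embedding), c) for e, c in self.entries]
        if scored:
            score, completion = max(scored)
            if score >= self.threshold:
                return completion  # cache hit: skip the LLM call
        return None

    def store(self, embedding, completion):
        self.entries.append((embedding, completion))

cache = SemanticCache(threshold=0.9)
cache.store([1.0, 0.0], "Paris is the capital of France.")
hit = cache.lookup([1.0, 0.1])    # near-duplicate prompt -> served from cache
miss = cache.lookup([0.0, 1.0])   # unrelated prompt -> goes to the model
```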

Playground to try the OAuth 2.0 authorization feature using an identity provider to enable more fine-grained access to OpenAI APIs by particular users or clients.

flow

🦾 Bicep ⚙️ Policy 🧾 Notebook

Playground to create a combination of several policies in an iterative approach. We start with load balancing, then progressively add token emitting, rate limiting and, finally, semantic caching. Each of these policy sets is derived from other labs in this repo.

flow

🦾 Bicep ⚙️ Policy 🧾 Notebook

Playground to try the new GPT-4o model. GPT-4o ("o" for "omni") is designed to handle a combination of text, audio, and video inputs, and can generate outputs in text, audio, and image formats.

flow

🦾 Bicep ⚙️ Policy 🧾 Notebook

Playground to try routing to a backend based on Azure OpenAI model and version.

flow

🦾 Bicep ⚙️ Policy 🧾 Notebook

Playground to try the Retrieval Augmented Generation (RAG) pattern with Azure AI Search, Azure OpenAI embeddings, and Azure OpenAI completions.

flow

🦾 Bicep ⚙️ Policy 🧾 Notebook
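The pattern has two mechanical steps before the completion call: rank documents by embedding similarity, then ground the prompt in the winners. A self-contained toy sketch — the 3-dimensional embeddings and document texts are invented stand-ins for vectors an Azure OpenAI embeddings deployment would produce and Azure AI Search would index:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Toy corpus of (embedding, text) pairs.
DOCS = [
    ([1.0, 0.0, 0.0], "APIM fronts the Azure OpenAI endpoints."),
    ([0.0, 1.0, 0.0], "The semantic caching policy reduces cost."),
    ([0.0, 0.0, 1.0], "Bicep templates provision the lab resources."),
]

def retrieve(query_embedding, docs, k=2):
    """Return the k document texts most similar to the query embedding."""
    ranked = sorted(docs, key=lambda d: cosine(query_embedding, d[0]), reverse=True)
    return [text for _, text in ranked[:k]]

def build_prompt(question, contexts):
    """Ground the completion request in the retrieved context."""
    context_block = "\n".join(f"- {c}" for c in contexts)
    return f"Answer using only this context:\n{context_block}\nQuestion: {question}"

contexts = retrieve([0.9, 0.1, 0.0], DOCS, k=1)
prompt = build_prompt("What fronts the endpoints?", contexts)
```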

Playground to try the built-in logging capabilities of Azure API Management. Logs requests into Application Insights to track details and token usage.

flow

🦾 Bicep ⚙️ Policy 🧾 Notebook

Playground to test storing message details in Cosmos DB through the Log to event hub policy. With the policy we can control which data is stored in the DB (prompt, completion, model, region, tokens, etc.).

flow

🦾 Bicep ⚙️ Policy 🧾 Notebook
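The field selection the policy performs before publishing to the event hub can be pictured as a simple whitelist filter. A hypothetical sketch (the field names and the sample event are illustrative, not the lab's actual schema):

```python
# Fields we choose to persist in Cosmos DB; everything else is dropped
# before the message leaves the gateway.
STORED_FIELDS = {"prompt", "completion", "model", "region",
                 "prompt_tokens", "completion_tokens"}

def shape_log(event: dict) -> dict:
    """Keep only the whitelisted fields of a request/response event."""
    return {k: v for k, v in event.items() if k in STORED_FIELDS}

# Sample event with sensitive extras that should never reach the DB.
event = {"prompt": "hi", "completion": "hello", "model": "gpt-4o",
         "region": "eastus", "prompt_tokens": 1, "completion_tokens": 1,
         "api_key": "secret", "client_ip": "10.0.0.1"}
stored = shape_log(event)
```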

Playground to try the Azure AI Studio Prompt Flow with Azure API Management.

flow

🦾 Bicep ⚙️ Policy 🧾 Notebook

Playground to try integrating Azure API Management with Azure AI Content Safety to filter potentially offensive, risky, or undesirable content.

flow

🦾 Bicep ⚙️ Policy 🧾 Notebook

Playground to try Prompt Shields from the Azure AI Content Safety service, which analyzes LLM inputs and detects User Prompt attacks and Document attacks, two common types of adversarial inputs.

flow

🦾 Bicep ⚙️ Policy 🧾 Notebook

Backlog of Labs

This is a list of potential future labs to be developed.

  • Real Time API
  • Semantic Kernel with Agents
  • Logic Apps RAG
  • PII handling
  • Gemini

Tip

Kindly use the feedback discussion so that we can continuously improve with your experiences, suggestions, ideas or lab requests.

🚀 Getting Started

Prerequisites

Quickstart

  1. Clone this repo and configure your local machine with the prerequisites. Or just create a GitHub Codespace and run it in the browser or in VS Code.
  2. Navigate through the available labs and select one that best suits your needs. For starters, we recommend the token rate limiting lab.
  3. Open the notebook and run the provided steps.
  4. Tailor the experiment according to your requirements. If you wish to contribute to our collective work, we would appreciate your submission of a pull request.

Note

🪲 Please feel free to open a new issue if you find something that should be fixed or enhanced.

⛵ Roll-out to production

We recommend the guidelines and best practices from the AI Hub Gateway Landing Zone to implement a central AI API gateway that empowers various line-of-business units in an organization to leverage Azure AI services.

🔨 Supporting Tools

  • AI-Gateway Mock server is designed to mimic the behavior and responses of the OpenAI API, creating an efficient simulation environment for testing and developing the integration with Azure API Management and other use cases. The app.py can be customized to tailor the mock server to specific use cases.
  • Tracing - Invokes the OpenAI API with tracing enabled and returns the tracing information.
  • Streaming - Invokes the OpenAI API with streaming enabled and returns the response in chunks.
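Streamed chat completions arrive as server-sent events whose `delta` fragments must be concatenated client-side. A minimal sketch of that re-assembly step — the sample chunk values are made up, but their shape follows the chat-completions streaming wire format:

```python
import json

def assemble_stream(sse_lines):
    """Concatenate the content deltas of chat-completions style SSE chunks."""
    parts = []
    for line in sse_lines:
        if not line.startswith("data: "):
            continue                      # ignore keep-alives / blank lines
        payload = line[len("data: "):]
        if payload == "[DONE]":           # sentinel that ends the stream
            break
        choices = json.loads(payload)["choices"]
        if choices:
            parts.append(choices[0].get("delta", {}).get("content") or "")
    return "".join(parts)

# Chunks shaped like the streaming wire format (values invented).
chunks = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    'data: {"choices": [{"delta": {"content": " world"}}]}',
    "data: [DONE]",
]
text = assemble_stream(chunks)
```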

🏛️ Well-Architected Framework

The Azure Well-Architected Framework is a design framework that can improve the quality of a workload. The following table maps labs to the Well-Architected Framework pillars to set you up for success through architectural experimentation.

| Lab | Security | Reliability | Performance | Operations | Costs |
| --- | --- | --- | --- | --- | --- |
| Request forwarding | | | | | |
| Backend circuit breaking | | | | | |
| Backend pool load balancing | | | | | |
| Advanced load balancing | | | | | |
| Response streaming | | | | | |
| Vector searching | | | | | |
| Built-in logging | | | | | |
| SLM self-hosting | | | | | |

🎒 Show and tell

Tip

Install the VS Code Reveal extension, open AI-GATEWAY.md and click on 'slides' at the bottom to present the AI Gateway without leaving VS Code. Or just open the AI-GATEWAY.pptx for a plain old PowerPoint experience.

🥇 Other resources

Numerous reference architectures, best practices and starter kits are available on this topic. Please refer to the resources provided if you need comprehensive solutions or a landing zone to initiate your project. We suggest leveraging the AI-Gateway labs to discover additional capabilities that can be integrated into the reference architectures.

We believe that there may be valuable content that we are currently unaware of. We would greatly appreciate any suggestions or recommendations to enhance this list.

🌐 WW GBB initiative

GBB

Disclaimer

Important

This software is provided for demonstration purposes only. It is not intended to be relied upon for any purpose. The creators of this software make no representations or warranties of any kind, express or implied, about the completeness, accuracy, reliability, suitability or availability with respect to the software or the information, products, services, or related graphics contained in the software for any purpose. Any reliance you place on such information is therefore strictly at your own risk.

