Note
This document refers to the Microsoft Foundry (classic) portal. View the Microsoft Foundry (new) documentation to learn about the new portal.
This article provides a summary of the latest releases and major documentation updates for Azure OpenAI.
The Realtime API now supports SIP, enabling telephony connections. For more information, see the Realtime SIP documentation.
The gpt-4o-transcribe-diarize speech to text model is released. This is an automatic speech recognition (ASR) model that converts spoken language into text in real time, with low latency and high accuracy across more than 100 languages. Real-time transcription matters for workflows where voice data drives decisions, such as customer support, virtual meetings, and live events.

Diarization is the process of identifying who spoke when in an audio stream. It turns a conversation into a speaker-attributed transcript, so you can extract structured, actionable data from meetings, customer calls, and live events.
Use this model via the /audio and /realtime APIs.
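A common post-processing step for a diarizing model is collapsing its speaker-labeled segments into a readable transcript. The segment shape below (`speaker` and `text` fields) is an assumption for illustration, not the documented response schema of gpt-4o-transcribe-diarize; adapt the field names to the actual API reference.

```python
# Sketch: turning speaker-labeled segments (as a diarizing speech-to-text
# call might return them) into a speaker-attributed transcript.
# The segment dict shape here is an assumption, not the documented schema.

def format_diarized_transcript(segments):
    """Collapse consecutive segments from the same speaker into one line."""
    lines = []
    for seg in segments:
        speaker, text = seg["speaker"], seg["text"].strip()
        if lines and lines[-1][0] == speaker:
            # Same speaker kept talking: extend the previous line.
            lines[-1] = (speaker, lines[-1][1] + " " + text)
        else:
            lines.append((speaker, text))
    return "\n".join(f"{spk}: {txt}" for spk, txt in lines)

example = [
    {"speaker": "A", "text": "Hi, thanks for calling."},
    {"speaker": "A", "text": "How can I help?"},
    {"speaker": "B", "text": "My order hasn't arrived."},
]
print(format_diarized_transcript(example))
```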
The gpt-image-1-mini model is now available for global deployments. It's a smaller version of the gpt-image-1 model that offers a good balance between performance and cost. All use cases are currently supported, except for image edits and input fidelity.
Request access: Limited access model application
Follow the image generation how-to guide to get started with this model.
Personally identifiable information (PII) detection is now available as a built-in content filter. This feature allows you to identify and block sensitive information in LLM outputs, enhancing data privacy. For more information, see the PII detection documentation.
To learn more about gpt-5-codex, see the getting started with reasoning models page.
gpt-5-codex is designed to be used with the Codex CLI and the Visual Studio Code Codex extension.
Registration is required for access to the gpt-5-codex model. If you have previously registered and obtained access to other limited access models like gpt-5, you don't need to reapply and will automatically be granted access.
The Sora model from OpenAI now supports video-to-video generation. You can provide a short video as input to generate a new, longer video that incorporates the input video. See the quickstart to get started.
The Sora model from OpenAI now supports image-to-video generation. You can provide an image as input to the model to generate a video that incorporates the content of the image. You can also specify the frame of the video in which the image should appear: it doesn't need to be the beginning. See the quickstart to get started.
The Sora model is now available in the Sweden Central and East US 2 regions.
OpenAI's GPT RealTime and Audio models are now generally available on Microsoft Foundry Direct Models.
We highly recommend that all customers transition to the newly launched GA models to take full advantage of the latest features. Visit the Azure OpenAI documentation and Foundry Playground to explore capabilities and integrate them into your applications.
Spillover is now generally available. Spillover manages traffic fluctuations on provisioned deployments by routing overages to a designated standard deployment. To learn more about how to maximize utilization for your provisioned deployments with spillover, see Manage traffic with spillover for provisioned deployments.
gpt-5, gpt-5-mini, and gpt-5-nano are now available. To learn more, see the getting started with reasoning models page.
gpt-5-chat is now available. To learn more, see the models page.
gpt-5 is now available for Provisioned Throughput Units (PTU).
gpt-5-mini, gpt-5-nano, and gpt-5-chat don't require registration.
Input fidelity parameter: The input_fidelity parameter in the image edits API lets you control how closely the model preserves the style and features of the subjects in the original (input) image.
Partial image streaming: The image generation and image edits APIs support partial image streaming, where they return images with partially rendered content throughout the image generation process. Display these images to the user to provide earlier visual feedback and show the progress of the image generation operation.
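Streamed partial images typically arrive base64-encoded, one frame at a time. The field names suggested in the comments below (`b64_json`, `partial_image_index`) are assumptions modeled on streamed image events, not confirmed from this article; check the image API reference. A minimal sketch of the decode-and-save step:

```python
# Sketch: collecting partially rendered frames from a streamed image
# generation call. Each frame is assumed to arrive as a base64 payload
# (e.g. a `b64_json` field) with a frame index (e.g. `partial_image_index`).
import base64

def frame_filename(index, prefix="partial"):
    """Stable on-disk name for partially rendered frame `index`."""
    return f"{prefix}_{index:02d}.png"

def decode_frame(b64_payload):
    """Partial frames arrive base64-encoded; decode to raw image bytes."""
    return base64.b64decode(b64_payload)

# Usage with a fake payload standing in for one streamed event:
payload = base64.b64encode(b"fake-png-bytes").decode()
with open(frame_filename(0), "wb") as f:
    f.write(decode_frame(payload))
```

Writing each frame as it arrives is what lets the UI show progress before the final image is done.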
codex-mini and o3-pro are now available. To learn more, see the getting started with reasoning models page.

The Sora (2025-05-02) model is a video generation model from OpenAI that can create realistic and imaginative video scenes from text instructions.
Follow the Video generation quickstart to get started. For more information, see the Video generation concepts guide.
Spotlighting is a sub-feature of prompt shields that enhances protection against indirect (embedded document) attacks by tagging input documents with special formatting to indicate lower trust to the model. For more information, see the Prompt shields filter documentation.
Model router for Foundry is a deployable AI chat model that automatically selects the best underlying chat model to respond to a given prompt. For more information on how model router works and its advantages and limitations, see the Model router concepts guide. To use model router with the Completions API, follow the How-to guide.
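From the caller's side, a model router deployment is addressed like any other chat deployment: you send the request to the router's deployment name, and the router picks the underlying model. The deployment name "model-router" below is a placeholder, and the request-body sketch assumes the standard chat completions shape:

```python
# Sketch: the chat completions request body you'd POST to a model router
# deployment. "model-router" is a placeholder deployment name; the router
# chooses the underlying model, which is reported back in the response's
# "model" field.
import json

def chat_request(deployment, user_text):
    """Build the JSON body for a chat completions call to `deployment`."""
    return {
        "model": deployment,  # the router deployment name, not a model id
        "messages": [{"role": "user", "content": user_text}],
    }

body = json.dumps(chat_request("model-router", "Summarize TCP slow start."))
```

Because routing happens server-side, existing chat-completions client code only needs the deployment name changed.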
The Realtime API (preview) now supports WebRTC, enabling real-time audio streaming and low-latency interactions. This feature is ideal for applications requiring immediate feedback, such as live customer support or interactive voice assistants. For more information, see the Realtime API (preview) documentation.
GPT-image-1 (2025-04-15) is the latest image generation model from Azure OpenAI. It features major improvements over DALL-E.
Request access: Limited access model application
Follow the image generation how-to guide to get started with the new model.
o4-mini and o3 models are now available. These are the latest reasoning models from Azure OpenAI, offering enhanced reasoning, quality, and performance. For more information, see the getting started with reasoning models page.
GPT 4.1 and GPT 4.1-nano are now available. These are the latest models from Azure OpenAI. GPT 4.1 has a 1 million token context limit. For more information, see the models page.
New audio models powered by GPT-4o are now available.
The gpt-4o-transcribe and gpt-4o-mini-transcribe speech to text models are released. Use these models via the /audio and /realtime APIs.
The gpt-4o-mini-tts text to speech model is released. Use the gpt-4o-mini-tts model for text to speech generation via the /audio API.
For more information about available models, see the models and versions documentation.
The Responses API is a new stateful API from Azure OpenAI. It brings together the best capabilities of the chat completions and assistants APIs in one unified experience. The Responses API also adds support for the new computer-use-preview model, which powers the Computer use capability.
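The "stateful" part means a follow-up turn can reference the stored state of an earlier response instead of resending the whole conversation. The sketch below builds the request bodies for two such turns; the field names (`input`, `previous_response_id`) follow the OpenAI Responses API, and the model name and response id are placeholders:

```python
# Sketch: two turns against the Responses API. The second turn carries no
# history of its own; it links to the first turn via previous_response_id.
# Model name and the "resp_123" id are illustrative placeholders.

def responses_request(model, user_input, previous_response_id=None):
    """Build the JSON body for one Responses API turn."""
    body = {"model": model, "input": user_input}
    if previous_response_id:
        # Attach this turn to the stored state of the earlier response.
        body["previous_response_id"] = previous_response_id
    return body

first = responses_request("gpt-4o", "Name three sorting algorithms.")
follow_up = responses_request(
    "gpt-4o",
    "Which is fastest on average?",
    previous_response_id="resp_123",  # id returned by the first call
)
```

In real use, `previous_response_id` would be the `id` field of the response object returned by the first call.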
Access to computer-use-preview requires registration, and access is granted based on Microsoft's eligibility criteria. Customers who have access to other limited access models still need to request access for this model.
Request access: computer-use-preview limited access model application
For more information on model capabilities and region availability, see the models documentation.
Playwright integration demo code is also available.
Spillover manages traffic fluctuations on provisioned deployments by routing overages to a designated standard deployment. To learn more about how to maximize utilization for your provisioned deployments with spillover, see Manage traffic with spillover for provisioned deployments (preview).
In addition to the deployment-level content filtering configuration, we now also provide a request header that allows you to specify your custom configuration at request time for every API call. For more information, see Use content filters (preview).
The latest GPT model that excels at diverse text and image tasks is now available on Azure OpenAI.
For more information on model capabilities and region availability, see the models documentation.
Stored completions allow you to capture the conversation history from chat completions sessions to use as datasets for evaluations and fine-tuning.
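Opting into stored completions is a per-request choice on the chat completions call. The `store` flag and `metadata` tags below follow the OpenAI chat completions parameters; the deployment name and tag values are placeholders. A minimal request-body sketch:

```python
# Sketch: a chat completions request that opts into stored completions.
# `store` persists the session server-side; `metadata` adds free-form tags
# you can later filter stored sessions by when building eval or
# fine-tuning datasets. Deployment name and tags are placeholders.

def stored_chat_request(deployment, messages, tags):
    """Build a chat completions body that stores the session."""
    return {
        "model": deployment,
        "messages": messages,
        "store": True,     # capture this session for later reuse
        "metadata": tags,  # labels for filtering stored sessions
    }

req = stored_chat_request(
    "gpt-4o",
    [{"role": "user", "content": "Explain DNS caching briefly."}],
    {"project": "support-bot", "phase": "pilot"},
)
```

Consistent metadata tagging is what makes the stored history usable as a curated dataset later.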
o3-mini is now available for global standard, and data zone standard deployments for registered limited access customers.
For more information, see our reasoning model guide.
The gpt-4o-mini-audio-preview (2024-12-17) model is the latest audio completions model. For more information, see the audio generation quickstart.
The gpt-4o-mini-realtime-preview (2024-12-17) model is the latest real-time audio model. The real-time models use the same underlying GPT-4o audio model as the completions API, but are optimized for low-latency, real-time audio interactions. For more information, see the real-time audio quickstart.
For more information about available models, see the models and versions documentation.
o3-mini (2025-01-31) is the latest reasoning model, offering enhanced reasoning abilities. For more information, see our reasoning model guide.
The gpt-4o-audio-preview model is now available for global deployments in the East US 2 and Sweden Central regions. Use the gpt-4o-audio-preview model for audio generation.
The gpt-4o-audio-preview model introduces the audio modality into the existing /chat/completions API. The audio model expands the potential for AI applications in text and voice-based interactions and audio analysis. Modalities supported in the gpt-4o-audio-preview model include text, audio, and text + audio. For more information, see the audio generation quickstart.
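Requesting audio output through /chat/completions comes down to two extra request fields. The `modalities` and `audio` parameters below follow the OpenAI audio-generation parameters; the voice and format values are illustrative choices, not the only options:

```python
# Sketch: a /chat/completions request body asking gpt-4o-audio-preview for
# a spoken reply alongside the text. Voice and format values shown here
# are illustrative; consult the API reference for the supported set.

def audio_chat_request(deployment, user_text, voice="alloy", fmt="wav"):
    """Build a chat completions body that requests text + audio output."""
    return {
        "model": deployment,
        "modalities": ["text", "audio"],       # ask for transcript and speech
        "audio": {"voice": voice, "format": fmt},
        "messages": [{"role": "user", "content": user_text}],
    }

req = audio_chat_request("gpt-4o-audio-preview", "Read me today's summary.")
```

The response then carries the synthesized audio (base64-encoded) next to the usual text message content.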
Note
The Realtime API uses the same underlying GPT-4o audio model as the completions API, but is optimized for low-latency, real-time audio interactions.
The gpt-4o-realtime-preview model version 2024-12-17 is available for global deployments in the East US 2 and Sweden Central regions. Use gpt-4o-realtime-preview version 2024-12-17 instead of version 2024-10-01-preview for real-time audio interactions.
The gpt-4o-realtime-preview models now support the following voices: alloy, ash, ballad, coral, echo, sage, shimmer, verse.
The rate limits for each gpt-4o-realtime-preview model deployment are 100 K TPM and 1 K RPM. During the preview, the Foundry portal and APIs might inaccurately show different rate limits. Even if you try to set a different rate limit, the actual rate limit is 100 K TPM and 1 K RPM.
For more information, see the GPT real-time audio quickstart and the how-to guide.
The latest o1 model is now available for API access and model deployment. Registration is required, and access will be granted based on Microsoft's eligibility criteria. Customers who previously applied and received access to o1-preview don't need to reapply; they're automatically on the wait-list for the latest model.
Request access: limited access model application
To learn more about the advanced o1 series models, see getting started with o1 series reasoning models.
| Model | Region |
|---|---|
| o1 (Version: 2024-12-17) | East US 2 (Global Standard), Sweden Central (Global Standard) |
Direct preference optimization (DPO) is a new alignment technique for large language models, designed to adjust model weights based on human preferences. Unlike reinforcement learning from human feedback (RLHF), DPO doesn't require fitting a reward model and uses simpler data (binary preferences) for training. This method is computationally lighter and faster, making it equally effective at alignment while being more efficient. DPO is especially useful in scenarios where subjective elements like tone, style, or specific content preferences are important. We're excited to announce the public preview of DPO in Azure OpenAI, starting with the gpt-4o-2024-08-06 model.
For fine-tuning model region availability, see themodels page.
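Because DPO trains on binary preferences, each training example pairs one prompt with a preferred and a non-preferred completion. The record shape below (`input` messages plus `preferred_output`/`non_preferred_output`) follows the OpenAI preference fine-tuning data format; verify the field names against the fine-tuning reference before building a dataset. One JSONL record:

```python
# Sketch: a single DPO training record, serialized as one JSONL line.
# The prompt/completion text is invented; the field layout follows the
# OpenAI preference fine-tuning format (verify against the API reference).
import json

record = {
    "input": {
        "messages": [{"role": "user", "content": "Greet the customer."}]
    },
    # The completion human raters preferred:
    "preferred_output": [
        {"role": "assistant", "content": "Hello! How can I help you today?"}
    ],
    # The completion they rejected:
    "non_preferred_output": [
        {"role": "assistant", "content": "What do you want?"}
    ],
}
jsonl_line = json.dumps(record)
```

A training file is simply many such lines, one record per line; note that no reward score is attached, only the binary preference.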
Stored completions allow you to capture the conversation history from chat completions sessions to use as datasets for evaluations and fine-tuning.
gpt-4o-2024-11-20 is now available for global standard deployment.
Data zone provisioned deployments are available in the same Azure OpenAI resource as all other deployment types, but let you use Azure's global infrastructure to dynamically route traffic to the data center, within the Microsoft-defined data zone, that has the best availability for each request. They provide reserved model processing capacity for high and predictable throughput. Data zone provisioned deployments are supported on the gpt-4o-2024-08-06, gpt-4o-2024-05-13, and gpt-4o-mini-2024-07-18 models.
For more information, see the deployment types guide.
Learn more about the underlying models that power Azure OpenAI.