
Quickstart

Realtime agents enable voice conversations with your AI agents using OpenAI's Realtime API. This guide walks you through creating your first realtime voice agent.

Beta feature

Realtime agents are in beta. Expect some breaking changes as we improve the implementation.

Prerequisites

  • Python 3.9 or higher
  • OpenAI API key
  • Basic familiarity with the OpenAI Agents SDK

Installation

If you haven't already, install the OpenAI Agents SDK:

```bash
pip install openai-agents
```

Creating your first realtime agent

1. Import required components

```python
import asyncio

from agents.realtime import RealtimeAgent, RealtimeRunner
```

2. Create a realtime agent

```python
agent = RealtimeAgent(
    name="Assistant",
    instructions="You are a helpful voice assistant. Keep your responses conversational and friendly.",
)
```

3. Set up the runner

```python
runner = RealtimeRunner(
    starting_agent=agent,
    config={
        "model_settings": {
            "model_name": "gpt-realtime",
            "voice": "ash",
            "modalities": ["audio"],
            "input_audio_format": "pcm16",
            "output_audio_format": "pcm16",
            "input_audio_transcription": {"model": "gpt-4o-mini-transcribe"},
            "turn_detection": {"type": "semantic_vad", "interrupt_response": True},
        }
    },
)
```

4. Start a session

```python
# Start the session (this snippet uses await, so it must run inside an
# async function; see the complete example below)
session = await runner.run()

async with session:
    print("Session started! The agent will stream audio responses in real-time.")

    # Process events
    async for event in session:
        try:
            if event.type == "agent_start":
                print(f"Agent started: {event.agent.name}")
            elif event.type == "agent_end":
                print(f"Agent ended: {event.agent.name}")
            elif event.type == "handoff":
                print(f"Handoff from {event.from_agent.name} to {event.to_agent.name}")
            elif event.type == "tool_start":
                print(f"Tool started: {event.tool.name}")
            elif event.type == "tool_end":
                print(f"Tool ended: {event.tool.name}; output: {event.output}")
            elif event.type == "audio_end":
                print("Audio ended")
            elif event.type == "audio":
                # Enqueue audio for callback-based playback with metadata
                # Non-blocking put; queue is unbounded, so drops won't occur.
                pass
            elif event.type == "audio_interrupted":
                print("Audio interrupted")
                # Begin graceful fade + flush in the audio callback and rebuild jitter buffer.
            elif event.type == "error":
                print(f"Error: {event.error}")
            elif event.type == "history_updated":
                pass  # Skip these frequent events
            elif event.type == "history_added":
                pass  # Skip these frequent events
            elif event.type == "raw_model_event":
                print(f"Raw model event: {_truncate_str(str(event.data), 200)}")
            else:
                print(f"Unknown event type: {event.type}")
        except Exception as e:
            print(f"Error processing event: {_truncate_str(str(e), 200)}")


def _truncate_str(s: str, max_length: int) -> str:
    if len(s) > max_length:
        return s[:max_length] + "..."
    return s
```
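The `audio` branch above is only a placeholder. One way to wire it up is callback-based playback with the sounddevice library. The sketch below is an illustration, not part of the SDK; it assumes 24 kHz mono pcm16 output and that audio events expose raw PCM bytes as `event.audio.data` (check the SDK's event reference for the exact field).

```python
# Playback sketch (not part of the SDK). Assumes 24 kHz mono pcm16 output and
# that audio events carry raw PCM bytes in `event.audio.data`.
import queue

import numpy as np
import sounddevice as sd  # pip install sounddevice numpy

audio_queue: "queue.Queue[np.ndarray]" = queue.Queue()  # unbounded, so put() never blocks
_leftover = np.zeros(0, dtype=np.int16)  # partial chunk carried between callbacks


def _playback_callback(outdata, frames, time_info, status):
    """Fill the output buffer from queued chunks, padding with silence if dry."""
    global _leftover
    buf = _leftover
    while len(buf) < frames:
        try:
            buf = np.concatenate([buf, audio_queue.get_nowait()])
        except queue.Empty:
            break
    n = min(len(buf), frames)
    outdata[:n, 0] = buf[:n]
    outdata[n:, 0] = 0  # silence when the queue runs dry
    _leftover = buf[n:]


stream = sd.OutputStream(samplerate=24000, channels=1, dtype="int16", callback=_playback_callback)
stream.start()

# In the event loop, replace the `pass` under event.type == "audio" with:
#     audio_queue.put(np.frombuffer(event.audio.data, dtype=np.int16))
# and on "audio_interrupted", drain the queue so stale audio stops quickly:
#     while not audio_queue.empty(): audio_queue.get_nowait()
```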

Complete example

Here's a complete working example:

```python
import asyncio

from agents.realtime import RealtimeAgent, RealtimeRunner


async def main():
    # Create the agent
    agent = RealtimeAgent(
        name="Assistant",
        instructions="You are a helpful voice assistant. Keep responses brief and conversational.",
    )

    # Set up the runner with configuration
    runner = RealtimeRunner(
        starting_agent=agent,
        config={
            "model_settings": {
                "model_name": "gpt-realtime",
                "voice": "ash",
                "modalities": ["audio"],
                "input_audio_format": "pcm16",
                "output_audio_format": "pcm16",
                "input_audio_transcription": {"model": "gpt-4o-mini-transcribe"},
                "turn_detection": {"type": "semantic_vad", "interrupt_response": True},
            }
        },
    )

    # Start the session
    session = await runner.run()

    async with session:
        print("Session started! The agent will stream audio responses in real-time.")

        # Process events
        async for event in session:
            try:
                if event.type == "agent_start":
                    print(f"Agent started: {event.agent.name}")
                elif event.type == "agent_end":
                    print(f"Agent ended: {event.agent.name}")
                elif event.type == "handoff":
                    print(f"Handoff from {event.from_agent.name} to {event.to_agent.name}")
                elif event.type == "tool_start":
                    print(f"Tool started: {event.tool.name}")
                elif event.type == "tool_end":
                    print(f"Tool ended: {event.tool.name}; output: {event.output}")
                elif event.type == "audio_end":
                    print("Audio ended")
                elif event.type == "audio":
                    # Enqueue audio for callback-based playback with metadata
                    # Non-blocking put; queue is unbounded, so drops won't occur.
                    pass
                elif event.type == "audio_interrupted":
                    print("Audio interrupted")
                    # Begin graceful fade + flush in the audio callback and rebuild jitter buffer.
                elif event.type == "error":
                    print(f"Error: {event.error}")
                elif event.type == "history_updated":
                    pass  # Skip these frequent events
                elif event.type == "history_added":
                    pass  # Skip these frequent events
                elif event.type == "raw_model_event":
                    print(f"Raw model event: {_truncate_str(str(event.data), 200)}")
                else:
                    print(f"Unknown event type: {event.type}")
            except Exception as e:
                print(f"Error processing event: {_truncate_str(str(e), 200)}")


def _truncate_str(s: str, max_length: int) -> str:
    if len(s) > max_length:
        return s[:max_length] + "..."
    return s


if __name__ == "__main__":
    # Run the session
    asyncio.run(main())
```
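Save this as, for example, quickstart.py (any filename works) and run it with `python quickstart.py`. Note that as written the example only logs events: to hold a real conversation you still need to capture microphone audio, send it into the session, and play back the `audio` events; the playback sketch above is one starting point, and the SDK's example code shows a full pipeline.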

Configuration options

Model settings

  • `model_name`: Choose from available realtime models (e.g., `gpt-realtime`)
  • `voice`: Select a voice (e.g., `alloy`, `ash`, `echo`, `fable`, `onyx`, `nova`, `shimmer`)
  • `modalities`: Enable text or audio (`["text"]` or `["audio"]`)

Audio settings

  • `input_audio_format`: Format for input audio (`pcm16`, `g711_ulaw`, `g711_alaw`)
  • `output_audio_format`: Format for output audio
  • `input_audio_transcription`: Transcription configuration

Turn detection

  • `type`: Detection method (`server_vad`, `semantic_vad`)
  • `threshold`: Voice activity threshold (0.0-1.0)
  • `silence_duration_ms`: Silence duration to detect turn end
  • `prefix_padding_ms`: Audio padding before speech
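
As an illustration, here is a configuration that combines options from the three groups above into a telephony-style setup. The values are examples, not recommendations; tune them for your audio environment.

```python
# Illustrative configuration combining the options listed above.
config = {
    "model_settings": {
        "model_name": "gpt-realtime",
        "voice": "alloy",
        "modalities": ["audio"],
        "input_audio_format": "g711_ulaw",   # telephony-style audio
        "output_audio_format": "g711_ulaw",
        "input_audio_transcription": {"model": "gpt-4o-mini-transcribe"},
        "turn_detection": {
            "type": "server_vad",            # threshold-based instead of semantic
            "threshold": 0.5,                # voice activity threshold (0.0-1.0)
            "silence_duration_ms": 500,      # silence that ends a turn
            "prefix_padding_ms": 300,        # audio kept before detected speech
        },
    }
}

runner = RealtimeRunner(starting_agent=agent, config=config)
```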

Authentication

Make sure your OpenAI API key is set in your environment:

```bash
export OPENAI_API_KEY="your-api-key-here"
```

Or pass it directly when creating the session:

```python
session = await runner.run(model_config={"api_key": "your-api-key"})
```
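
If you do pass the key in code, prefer reading it from the environment rather than hard-coding it, for example:

```python
import os

# Raises KeyError if the variable is unset, which fails fast and loudly.
session = await runner.run(model_config={"api_key": os.environ["OPENAI_API_KEY"]})
```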
