# Quickstart
Realtime agents enable voice conversations with your AI agents using OpenAI's Realtime API. This guide walks you through creating your first realtime voice agent.
> **Beta feature:** Realtime agents are in beta. Expect some breaking changes as we improve the implementation.
## Prerequisites
- Python 3.9 or higher
- OpenAI API key
- Basic familiarity with the OpenAI Agents SDK
## Installation
If you haven't already, install the OpenAI Agents SDK:
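The install command below assumes the SDK's PyPI package name, `openai-agents`:

```shell
pip install openai-agents
```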
## Creating your first realtime agent
### 1. Import required components
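The quickstart only needs two imports; these match the ones used in the complete example later on this page:

```python
import asyncio

from agents.realtime import RealtimeAgent, RealtimeRunner
```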
### 2. Create a realtime agent
```python
agent = RealtimeAgent(
    name="Assistant",
    instructions="You are a helpful voice assistant. Keep your responses conversational and friendly.",
)
```

### 3. Set up the runner
```python
runner = RealtimeRunner(
    starting_agent=agent,
    config={
        "model_settings": {
            "model_name": "gpt-realtime",
            "voice": "ash",
            "modalities": ["audio"],
            "input_audio_format": "pcm16",
            "output_audio_format": "pcm16",
            "input_audio_transcription": {"model": "gpt-4o-mini-transcribe"},
            "turn_detection": {"type": "semantic_vad", "interrupt_response": True},
        }
    },
)
```

### 4. Start a session
```python
# Start the session
session = await runner.run()

async with session:
    print("Session started! The agent will stream audio responses in real-time.")

    # Process events
    async for event in session:
        try:
            if event.type == "agent_start":
                print(f"Agent started: {event.agent.name}")
            elif event.type == "agent_end":
                print(f"Agent ended: {event.agent.name}")
            elif event.type == "handoff":
                print(f"Handoff from {event.from_agent.name} to {event.to_agent.name}")
            elif event.type == "tool_start":
                print(f"Tool started: {event.tool.name}")
            elif event.type == "tool_end":
                print(f"Tool ended: {event.tool.name}; output: {event.output}")
            elif event.type == "audio_end":
                print("Audio ended")
            elif event.type == "audio":
                # Enqueue audio for callback-based playback with metadata.
                # Non-blocking put; queue is unbounded, so drops won't occur.
                pass
            elif event.type == "audio_interrupted":
                print("Audio interrupted")
                # Begin graceful fade + flush in the audio callback and rebuild jitter buffer.
            elif event.type == "error":
                print(f"Error: {event.error}")
            elif event.type == "history_updated":
                pass  # Skip these frequent events
            elif event.type == "history_added":
                pass  # Skip these frequent events
            elif event.type == "raw_model_event":
                print(f"Raw model event: {_truncate_str(str(event.data), 200)}")
            else:
                print(f"Unknown event type: {event.type}")
        except Exception as e:
            print(f"Error processing event: {_truncate_str(str(e), 200)}")


def _truncate_str(s: str, max_length: int) -> str:
    if len(s) > max_length:
        return s[:max_length] + "..."
    return s
```

## Complete example
Here's a complete working example:
```python
import asyncio

from agents.realtime import RealtimeAgent, RealtimeRunner


async def main():
    # Create the agent
    agent = RealtimeAgent(
        name="Assistant",
        instructions="You are a helpful voice assistant. Keep responses brief and conversational.",
    )

    # Set up the runner with configuration
    runner = RealtimeRunner(
        starting_agent=agent,
        config={
            "model_settings": {
                "model_name": "gpt-realtime",
                "voice": "ash",
                "modalities": ["audio"],
                "input_audio_format": "pcm16",
                "output_audio_format": "pcm16",
                "input_audio_transcription": {"model": "gpt-4o-mini-transcribe"},
                "turn_detection": {"type": "semantic_vad", "interrupt_response": True},
            }
        },
    )

    # Start the session
    session = await runner.run()

    async with session:
        print("Session started! The agent will stream audio responses in real-time.")

        # Process events
        async for event in session:
            try:
                if event.type == "agent_start":
                    print(f"Agent started: {event.agent.name}")
                elif event.type == "agent_end":
                    print(f"Agent ended: {event.agent.name}")
                elif event.type == "handoff":
                    print(f"Handoff from {event.from_agent.name} to {event.to_agent.name}")
                elif event.type == "tool_start":
                    print(f"Tool started: {event.tool.name}")
                elif event.type == "tool_end":
                    print(f"Tool ended: {event.tool.name}; output: {event.output}")
                elif event.type == "audio_end":
                    print("Audio ended")
                elif event.type == "audio":
                    # Enqueue audio for callback-based playback with metadata.
                    # Non-blocking put; queue is unbounded, so drops won't occur.
                    pass
                elif event.type == "audio_interrupted":
                    print("Audio interrupted")
                    # Begin graceful fade + flush in the audio callback and rebuild jitter buffer.
                elif event.type == "error":
                    print(f"Error: {event.error}")
                elif event.type == "history_updated":
                    pass  # Skip these frequent events
                elif event.type == "history_added":
                    pass  # Skip these frequent events
                elif event.type == "raw_model_event":
                    print(f"Raw model event: {_truncate_str(str(event.data), 200)}")
                else:
                    print(f"Unknown event type: {event.type}")
            except Exception as e:
                print(f"Error processing event: {_truncate_str(str(e), 200)}")


def _truncate_str(s: str, max_length: int) -> str:
    if len(s) > max_length:
        return s[:max_length] + "..."
    return s


if __name__ == "__main__":
    # Run the session
    asyncio.run(main())
```

## Configuration options
### Model settings
- `model_name`: Choose from available realtime models (e.g., `gpt-realtime`)
- `voice`: Select voice (`alloy`, `echo`, `fable`, `onyx`, `nova`, `shimmer`)
- `modalities`: Enable text or audio (`["text"]` or `["audio"]`)
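Put together, a `model_settings` mapping built only from the options above might look like this (a plain dict shown for illustration; the specific values are one valid combination, not defaults):

```python
# Illustrative model_settings fragment using the options listed above.
model_settings = {
    "model_name": "gpt-realtime",  # a realtime model
    "voice": "alloy",              # any of the listed voices
    "modalities": ["audio"],       # or ["text"] for text-only
}
```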
### Audio settings
- `input_audio_format`: Format for input audio (`pcm16`, `g711_ulaw`, `g711_alaw`)
- `output_audio_format`: Format for output audio
- `input_audio_transcription`: Transcription configuration
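As a sketch, a telephony-style setup might combine a G.711 input format with PCM output (the key names come from the list above; the pairing itself is just an example):

```python
# Illustrative audio settings fragment; g711_ulaw is the 8 kHz mu-law
# encoding commonly used in telephony.
audio_settings = {
    "input_audio_format": "g711_ulaw",
    "output_audio_format": "pcm16",
    "input_audio_transcription": {"model": "gpt-4o-mini-transcribe"},
}
```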
### Turn detection
- `type`: Detection method (`server_vad`, `semantic_vad`)
- `threshold`: Voice activity threshold (0.0-1.0)
- `silence_duration_ms`: Silence duration to detect turn end
- `prefix_padding_ms`: Audio padding before speech
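The quickstart above uses `semantic_vad`; a `server_vad` configuration using the keys above might look like the following sketch (the numeric values are illustrative, not confirmed defaults):

```python
# Illustrative server_vad turn-detection fragment.
turn_detection = {
    "type": "server_vad",        # threshold-based voice activity detection
    "threshold": 0.5,            # 0.0-1.0; higher means less sensitive
    "silence_duration_ms": 500,  # silence before the turn is considered over
    "prefix_padding_ms": 300,    # audio kept from just before speech started
}
```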
## Next steps
- Learn more about realtime agents
- Check out working examples in the `examples/realtime` folder
- Add tools to your agent
- Implement handoffs between agents
- Set up guardrails for safety
## Authentication
Make sure your OpenAI API key is set in your environment:
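On macOS/Linux shells that would be:

```shell
export OPENAI_API_KEY="your-api-key-here"
```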
Or pass it directly when creating the session:
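A sketch of that, assuming the runner's `run()` accepts a `model_config` mapping with an `api_key` entry:

```python
session = await runner.run(model_config={"api_key": "your-api-key"})
```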