
Quickstart

Realtime agents enable voice conversations with your AI agents using OpenAI's Realtime API. This guide walks you through creating your first realtime voice agent.

Beta feature

Realtime agents are in beta. Expect some breaking changes as we improve the implementation.

Prerequisites

  • Python 3.9 or higher
  • OpenAI API key
  • Basic familiarity with the OpenAI Agents SDK

Installation

If you haven't already, install the OpenAI Agents SDK:

```bash
pip install openai-agents
```

Creating your first realtime agent

1. Import required components

```python
import asyncio

from agents.realtime import RealtimeAgent, RealtimeRunner
```

2. Create a realtime agent

```python
agent = RealtimeAgent(
    name="Assistant",
    instructions="You are a helpful voice assistant. Keep your responses conversational and friendly.",
)
```

3. Set up the runner

```python
runner = RealtimeRunner(
    starting_agent=agent,
    config={
        "model_settings": {
            "model_name": "gpt-realtime",
            "voice": "ash",
            "modalities": ["audio"],
            "input_audio_format": "pcm16",
            "output_audio_format": "pcm16",
            "input_audio_transcription": {"model": "gpt-4o-mini-transcribe"},
            "turn_detection": {"type": "semantic_vad", "interrupt_response": True},
        }
    },
)
```

4. Start a session

```python
# Start the session (this snippet uses await, so it must run inside an
# async function; see the complete example below)
session = await runner.run()

async with session:
    print("Session started! The agent will stream audio responses in real-time.")

    # Process events
    async for event in session:
        try:
            if event.type == "agent_start":
                print(f"Agent started: {event.agent.name}")
            elif event.type == "agent_end":
                print(f"Agent ended: {event.agent.name}")
            elif event.type == "handoff":
                print(f"Handoff from {event.from_agent.name} to {event.to_agent.name}")
            elif event.type == "tool_start":
                print(f"Tool started: {event.tool.name}")
            elif event.type == "tool_end":
                print(f"Tool ended: {event.tool.name}; output: {event.output}")
            elif event.type == "audio_end":
                print("Audio ended")
            elif event.type == "audio":
                # Enqueue audio for callback-based playback with metadata
                # Non-blocking put; queue is unbounded, so drops won't occur.
                pass
            elif event.type == "audio_interrupted":
                print("Audio interrupted")
                # Begin graceful fade + flush in the audio callback and rebuild jitter buffer.
            elif event.type == "error":
                print(f"Error: {event.error}")
            elif event.type == "history_updated":
                pass  # Skip these frequent events
            elif event.type == "history_added":
                pass  # Skip these frequent events
            elif event.type == "raw_model_event":
                print(f"Raw model event: {_truncate_str(str(event.data), 200)}")
            else:
                print(f"Unknown event type: {event.type}")
        except Exception as e:
            print(f"Error processing event: {_truncate_str(str(e), 200)}")


def _truncate_str(s: str, max_length: int) -> str:
    if len(s) > max_length:
        return s[:max_length] + "..."
    return s
```
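The `audio` branch above is only a placeholder. One way to wire it up is callback-based playback with the sounddevice library. The sketch below is an illustration, not part of the SDK; it assumes 24 kHz mono pcm16 output and that audio events expose raw PCM bytes as `event.audio.data` (check the SDK's event reference for the exact field).

```python
# Playback sketch (not part of the SDK). Assumes 24 kHz mono pcm16 output and
# that audio events carry raw PCM bytes in `event.audio.data`.
import queue

import numpy as np
import sounddevice as sd  # pip install sounddevice numpy

audio_queue: "queue.Queue[np.ndarray]" = queue.Queue()  # unbounded, so put() never blocks
_leftover = np.zeros(0, dtype=np.int16)  # partial chunk carried between callbacks


def _playback_callback(outdata, frames, time_info, status):
    """Fill the output buffer from queued chunks, padding with silence if dry."""
    global _leftover
    buf = _leftover
    while len(buf) < frames:
        try:
            buf = np.concatenate([buf, audio_queue.get_nowait()])
        except queue.Empty:
            break
    n = min(len(buf), frames)
    outdata[:n, 0] = buf[:n]
    outdata[n:, 0] = 0  # silence when the queue runs dry
    _leftover = buf[n:]


stream = sd.OutputStream(samplerate=24000, channels=1, dtype="int16", callback=_playback_callback)
stream.start()

# In the event loop, replace the `pass` under event.type == "audio" with:
#     audio_queue.put(np.frombuffer(event.audio.data, dtype=np.int16))
# and on "audio_interrupted", drain the queue so stale audio stops quickly:
#     while not audio_queue.empty(): audio_queue.get_nowait()
```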

Complete example

Here's a complete working example:

```python
import asyncio

from agents.realtime import RealtimeAgent, RealtimeRunner


async def main():
    # Create the agent
    agent = RealtimeAgent(
        name="Assistant",
        instructions="You are a helpful voice assistant. Keep responses brief and conversational.",
    )

    # Set up the runner with configuration
    runner = RealtimeRunner(
        starting_agent=agent,
        config={
            "model_settings": {
                "model_name": "gpt-realtime",
                "voice": "ash",
                "modalities": ["audio"],
                "input_audio_format": "pcm16",
                "output_audio_format": "pcm16",
                "input_audio_transcription": {"model": "gpt-4o-mini-transcribe"},
                "turn_detection": {"type": "semantic_vad", "interrupt_response": True},
            }
        },
    )

    # Start the session
    session = await runner.run()

    async with session:
        print("Session started! The agent will stream audio responses in real-time.")

        # Process events
        async for event in session:
            try:
                if event.type == "agent_start":
                    print(f"Agent started: {event.agent.name}")
                elif event.type == "agent_end":
                    print(f"Agent ended: {event.agent.name}")
                elif event.type == "handoff":
                    print(f"Handoff from {event.from_agent.name} to {event.to_agent.name}")
                elif event.type == "tool_start":
                    print(f"Tool started: {event.tool.name}")
                elif event.type == "tool_end":
                    print(f"Tool ended: {event.tool.name}; output: {event.output}")
                elif event.type == "audio_end":
                    print("Audio ended")
                elif event.type == "audio":
                    # Enqueue audio for callback-based playback with metadata
                    # Non-blocking put; queue is unbounded, so drops won't occur.
                    pass
                elif event.type == "audio_interrupted":
                    print("Audio interrupted")
                    # Begin graceful fade + flush in the audio callback and rebuild jitter buffer.
                elif event.type == "error":
                    print(f"Error: {event.error}")
                elif event.type == "history_updated":
                    pass  # Skip these frequent events
                elif event.type == "history_added":
                    pass  # Skip these frequent events
                elif event.type == "raw_model_event":
                    print(f"Raw model event: {_truncate_str(str(event.data), 200)}")
                else:
                    print(f"Unknown event type: {event.type}")
            except Exception as e:
                print(f"Error processing event: {_truncate_str(str(e), 200)}")


def _truncate_str(s: str, max_length: int) -> str:
    if len(s) > max_length:
        return s[:max_length] + "..."
    return s


if __name__ == "__main__":
    # Run the session
    asyncio.run(main())
```
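Save this as, for example, quickstart.py (any filename works) and run it with `python quickstart.py`. Note that as written the example only logs events: to hold a real conversation you still need to capture microphone audio, send it into the session, and play back the `audio` events; the playback sketch above is one starting point, and the SDK's example code shows a full pipeline.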

Configuration options

Model settings

  • `model_name`: Choose from available realtime models (e.g., `gpt-realtime`)
  • `voice`: Select a voice (e.g., `alloy`, `ash`, `echo`, `fable`, `onyx`, `nova`, `shimmer`)
  • `modalities`: Enable text or audio (`["text"]` or `["audio"]`)

Audio settings

  • `input_audio_format`: Format for input audio (`pcm16`, `g711_ulaw`, `g711_alaw`)
  • `output_audio_format`: Format for output audio
  • `input_audio_transcription`: Transcription configuration

Turn detection

  • `type`: Detection method (`server_vad`, `semantic_vad`)
  • `threshold`: Voice activity threshold (0.0-1.0)
  • `silence_duration_ms`: Silence duration to detect turn end
  • `prefix_padding_ms`: Audio padding before speech
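
As an illustration, here is a configuration that combines options from the three groups above into a telephony-style setup. The values are examples, not recommendations; tune them for your audio environment.

```python
# Illustrative configuration combining the options listed above.
config = {
    "model_settings": {
        "model_name": "gpt-realtime",
        "voice": "alloy",
        "modalities": ["audio"],
        "input_audio_format": "g711_ulaw",   # telephony-style audio
        "output_audio_format": "g711_ulaw",
        "input_audio_transcription": {"model": "gpt-4o-mini-transcribe"},
        "turn_detection": {
            "type": "server_vad",            # threshold-based instead of semantic
            "threshold": 0.5,                # voice activity threshold (0.0-1.0)
            "silence_duration_ms": 500,      # silence that ends a turn
            "prefix_padding_ms": 300,        # audio kept before detected speech
        },
    }
}

runner = RealtimeRunner(starting_agent=agent, config=config)
```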

Authentication

Make sure your OpenAI API key is set in your environment:

```bash
export OPENAI_API_KEY="your-api-key-here"
```

Or pass it directly when creating the session:

```python
session = await runner.run(model_config={"api_key": "your-api-key"})
```
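
If you do pass the key in code, prefer reading it from the environment rather than hard-coding it, for example:

```python
import os

# Raises KeyError if the variable is unset, which fails fast and loudly.
session = await runner.run(model_config={"api_key": os.environ["OPENAI_API_KEY"]})
```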
