Quickstart

Realtime agents let you hold voice conversations with AI agents through OpenAI's Realtime API. This guide walks you through creating your first realtime voice agent.

Beta feature

Realtime agents are in beta. Breaking changes may occur as the implementation improves.

Prerequisites

Python 3.9 or later
OpenAI API key
Basic familiarity with the OpenAI Agents SDK
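
The first prerequisite can be checked from Python itself. The sketch below is a minimal, hypothetical helper (the function names are not part of the SDK); it assumes the SDK's import name is `agents`, as used by the snippets in this guide.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 9)  # minimum version taken from the prerequisites list


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True when the interpreter meets the minimum version."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def module_is_installed(module_name: str) -> bool:
    """Return True when a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("Python OK:", python_is_supported())
    # Assumption: "agents" is the import name of the OpenAI Agents SDK.
    print("SDK installed:", module_is_installed("agents"))
```

If the second line prints False, the SDK has not been installed in the active environment yet.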

Installation

If you haven't already, install the OpenAI Agents SDK:

pip install openai-agents

Creating your first Realtime agent

1. Import the required components

import asyncio
from agents.realtime import RealtimeAgent, RealtimeRunner

2. Create a Realtime agent

agent = RealtimeAgent(
    name="Assistant",
    instructions="You are a helpful voice assistant. Keep your responses conversational and friendly.",
)

3. Set up the runner

runner = RealtimeRunner(
    starting_agent=agent,
    config={
        "model_settings": {
            "model_name": "gpt-realtime",
            "voice": "ash",
            "modalities": ["audio"],
            "input_audio_format": "pcm16",
            "output_audio_format": "pcm16",
            "input_audio_transcription": {"model": "gpt-4o-mini-transcribe"},
            "turn_detection": {"type": "semantic_vad", "interrupt_response": True},
        }
    },
)

4. Start a session

# These statements use await, so they belong inside an async function
# (see main() in the complete example).

# Start the session
session = await runner.run()

async with session:
    print("Session started! The agent will stream audio responses in real-time.")
    # Process events
    async for event in session:
        try:
            if event.type == "agent_start":
                print(f"Agent started: {event.agent.name}")
            elif event.type == "agent_end":
                print(f"Agent ended: {event.agent.name}")
            elif event.type == "handoff":
                print(f"Handoff from {event.from_agent.name} to {event.to_agent.name}")
            elif event.type == "tool_start":
                print(f"Tool started: {event.tool.name}")
            elif event.type == "tool_end":
                print(f"Tool ended: {event.tool.name}; output: {event.output}")
            elif event.type == "audio_end":
                print("Audio ended")
            elif event.type == "audio":
                # Enqueue audio for callback-based playback with metadata
                # Non-blocking put; queue is unbounded, so drops won't occur.
                pass
            elif event.type == "audio_interrupted":
                print("Audio interrupted")
                # Begin graceful fade + flush in the audio callback and rebuild jitter buffer.
            elif event.type == "error":
                print(f"Error: {event.error}")
            elif event.type == "history_updated":
                pass  # Skip these frequent events
            elif event.type == "history_added":
                pass  # Skip these frequent events
            elif event.type == "raw_model_event":
                print(f"Raw model event: {_truncate_str(str(event.data), 200)}")
            else:
                print(f"Unknown event type: {event.type}")
        except Exception as e:
            print(f"Error processing event: {_truncate_str(str(e), 200)}")


def _truncate_str(s: str, max_length: int) -> str:
    if len(s) > max_length:
        return s[:max_length] + "..."
    return s
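
The "audio" and "audio_interrupted" branches above only describe the playback queue in comments. One way to fill them in is sketched below. `AudioBuffer` is a hypothetical helper, not part of the SDK, and the sketch assumes that the raw PCM bytes of an audio event are available as `event.audio.data`; check the SDK's event reference before relying on that attribute. The graceful fade mentioned in the comments is not implemented here, only the non-blocking enqueue and the flush.

```python
import queue


class AudioBuffer:
    """Unbounded FIFO of PCM byte chunks shared with a playback callback."""

    def __init__(self) -> None:
        # maxsize defaults to 0, which makes the queue unbounded.
        self._chunks: "queue.Queue[bytes]" = queue.Queue()
        self._leftover = b""

    def enqueue(self, chunk: bytes) -> None:
        # put_nowait never raises queue.Full on an unbounded queue.
        self._chunks.put_nowait(chunk)

    def read(self, n_bytes: int) -> bytes:
        """Return exactly n_bytes, padding with zero bytes (silence) if short."""
        out = self._leftover
        while len(out) < n_bytes:
            try:
                out += self._chunks.get_nowait()
            except queue.Empty:
                break
        self._leftover = out[n_bytes:]
        return out[:n_bytes].ljust(n_bytes, b"\x00")

    def flush(self) -> None:
        """Drop all pending audio, for example on an "audio_interrupted" event."""
        self._leftover = b""
        while True:
            try:
                self._chunks.get_nowait()
            except queue.Empty:
                return
```

In the event loop, the "audio" branch would then call something like `buffer.enqueue(event.audio.data)` and the "audio_interrupted" branch `buffer.flush()`, while the audio device's callback pulls fixed-size blocks with `buffer.read(n_bytes)`.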

Complete example

Here is a complete working example:

import asyncio
from agents.realtime import RealtimeAgent, RealtimeRunner


async def main():
    # Create the agent
    agent = RealtimeAgent(
        name="Assistant",
        instructions="You are a helpful voice assistant. Keep responses brief and conversational.",
    )

    # Set up the runner with configuration
    runner = RealtimeRunner(
        starting_agent=agent,
        config={
            "model_settings": {
                "model_name": "gpt-realtime",
                "voice": "ash",
                "modalities": ["audio"],
                "input_audio_format": "pcm16",
                "output_audio_format": "pcm16",
                "input_audio_transcription": {"model": "gpt-4o-mini-transcribe"},
                "turn_detection": {"type": "semantic_vad", "interrupt_response": True},
            }
        },
    )

    # Start the session
    session = await runner.run()

    async with session:
        print("Session started! The agent will stream audio responses in real-time.")
        # Process events
        async for event in session:
            try:
                if event.type == "agent_start":
                    print(f"Agent started: {event.agent.name}")
                elif event.type == "agent_end":
                    print(f"Agent ended: {event.agent.name}")
                elif event.type == "handoff":
                    print(f"Handoff from {event.from_agent.name} to {event.to_agent.name}")
                elif event.type == "tool_start":
                    print(f"Tool started: {event.tool.name}")
                elif event.type == "tool_end":
                    print(f"Tool ended: {event.tool.name}; output: {event.output}")
                elif event.type == "audio_end":
                    print("Audio ended")
                elif event.type == "audio":
                    # Enqueue audio for callback-based playback with metadata
                    # Non-blocking put; queue is unbounded, so drops won't occur.
                    pass
                elif event.type == "audio_interrupted":
                    print("Audio interrupted")
                    # Begin graceful fade + flush in the audio callback and rebuild jitter buffer.
                elif event.type == "error":
                    print(f"Error: {event.error}")
                elif event.type == "history_updated":
                    pass  # Skip these frequent events
                elif event.type == "history_added":
                    pass  # Skip these frequent events
                elif event.type == "raw_model_event":
                    print(f"Raw model event: {_truncate_str(str(event.data), 200)}")
                else:
                    print(f"Unknown event type: {event.type}")
            except Exception as e:
                print(f"Error processing event: {_truncate_str(str(e), 200)}")


def _truncate_str(s: str, max_length: int) -> str:
    if len(s) > max_length:
        return s[:max_length] + "..."
    return s


if __name__ == "__main__":
    # Run the session
    asyncio.run(main())
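
As the number of handled event types grows, the if/elif chain can become hard to scan. One possible alternative is a lookup table keyed by event type. This is a design sketch rather than an SDK feature: `describe_event`, `FORMATTERS`, and `QUIET_EVENTS` are hypothetical names, and the event attributes are the ones already used in the example above.

```python
from types import SimpleNamespace
from typing import Any, Callable, Dict, Optional

# Frequent events that the example above deliberately does not log.
QUIET_EVENTS = {"audio", "history_updated", "history_added"}


def _truncate(s: str, max_length: int = 200) -> str:
    return s[:max_length] + "..." if len(s) > max_length else s


# One formatter per event type; each receives the event and returns a log line.
FORMATTERS: Dict[str, Callable[[Any], str]] = {
    "agent_start": lambda e: f"Agent started: {e.agent.name}",
    "agent_end": lambda e: f"Agent ended: {e.agent.name}",
    "handoff": lambda e: f"Handoff from {e.from_agent.name} to {e.to_agent.name}",
    "tool_start": lambda e: f"Tool started: {e.tool.name}",
    "tool_end": lambda e: f"Tool ended: {e.tool.name}; output: {e.output}",
    "audio_end": lambda e: "Audio ended",
    "audio_interrupted": lambda e: "Audio interrupted",
    "error": lambda e: f"Error: {e.error}",
    "raw_model_event": lambda e: f"Raw model event: {_truncate(str(e.data))}",
}


def describe_event(event: Any) -> Optional[str]:
    """Return a log line for the event, or None for events that stay silent."""
    if event.type in QUIET_EVENTS:
        return None
    formatter = FORMATTERS.get(event.type)
    if formatter is None:
        return f"Unknown event type: {event.type}"
    return formatter(event)


if __name__ == "__main__":
    # Stand-in event object for demonstration; real events come from the session.
    fake = SimpleNamespace(type="agent_start", agent=SimpleNamespace(name="Assistant"))
    print(describe_event(fake))
```

Inside the event loop, the body of the try block would then reduce to calling `describe_event(event)` and printing the result when it is not None.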

Configuration options

Model settings

model_name: choose from the available realtime models (e.g. gpt-realtime)
voice: voice selection (alloy, echo, fable, onyx, nova, shimmer)
modalities: enable text or audio (["text"] or ["audio"])
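
A small pre-flight check can catch typos in these settings before a session is opened. The sketch below is a hypothetical helper, not an SDK function. It only compares against the values listed above, and the live API may accept others (the examples in this guide use the voice "ash", which is not in the list), so the results are best treated as warnings rather than errors.

```python
from typing import Any, Dict, List

# Values copied from the option list in this guide; treat them as a
# non-exhaustive reference, not as the API's authoritative set.
DOCUMENTED_VOICES = {"alloy", "echo", "fable", "onyx", "nova", "shimmer"}
DOCUMENTED_MODALITIES = {"text", "audio"}


def check_model_settings(settings: Dict[str, Any]) -> List[str]:
    """Return human-readable warnings for values outside the documented lists."""
    warnings: List[str] = []
    voice = settings.get("voice")
    if voice is not None and voice not in DOCUMENTED_VOICES:
        warnings.append(f"voice {voice!r} is not in the documented list")
    for modality in settings.get("modalities", []):
        if modality not in DOCUMENTED_MODALITIES:
            warnings.append(f"unknown modality {modality!r}")
    return warnings


if __name__ == "__main__":
    for message in check_model_settings({"voice": "ash", "modalities": ["audio"]}):
        print("warning:", message)
```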

Audio settings

input_audio_format: format of the input audio (pcm16, g711_ulaw, g711_alaw)
output_audio_format: format of the output audio
input_audio_transcription: transcription configuration
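
To make the pcm16 format concrete, the following sketch converts floating-point samples into signed 16-bit little-endian PCM bytes using only the standard library. The helper names are hypothetical, and the 24 kHz mono sample rate used in the demo is an assumption about what the Realtime API expects for pcm16; confirm it against the API reference before sending audio.

```python
import math
import struct
from typing import Iterable, List

ASSUMED_SAMPLE_RATE = 24_000  # assumption: 24 kHz mono for pcm16


def floats_to_pcm16(samples: Iterable[float]) -> bytes:
    """Encode samples in [-1.0, 1.0] as signed 16-bit little-endian PCM."""
    # Clamp first so out-of-range input cannot overflow the 16-bit range.
    ints = [int(round(max(-1.0, min(1.0, s)) * 32767)) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


def sine_wave(freq_hz: float, duration_s: float, sample_rate: int) -> List[float]:
    """Generate a test tone as a list of float samples."""
    n = int(duration_s * sample_rate)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


if __name__ == "__main__":
    tone = sine_wave(440.0, 0.5, ASSUMED_SAMPLE_RATE)
    pcm = floats_to_pcm16(tone)
    print(f"{len(tone)} samples -> {len(pcm)} bytes")
```

Each sample occupies two bytes, so one second of mono audio at the assumed rate is 48,000 bytes.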

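Turn detection

type: detection method (server_vad, semantic_vad)
threshold: voice activity threshold (0.0 to 1.0)
silence_duration_ms: duration of silence used to detect the end of a turn
prefix_padding_ms: audio padding included before the detected speech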
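
The turn detection options combine into a single dictionary under the "turn_detection" key of the model settings. The sketch below builds such a dictionary and validates the ranges listed above. `build_turn_detection` is a hypothetical helper, and the numeric values in the demo are illustrative, not recommended defaults.

```python
from typing import Any, Dict, Optional

DETECTION_TYPES = {"server_vad", "semantic_vad"}


def build_turn_detection(
    detection_type: str = "semantic_vad",
    threshold: Optional[float] = None,
    silence_duration_ms: Optional[int] = None,
    prefix_padding_ms: Optional[int] = None,
) -> Dict[str, Any]:
    """Build a turn_detection dict, omitting any option left as None."""
    if detection_type not in DETECTION_TYPES:
        raise ValueError(f"unknown detection type: {detection_type!r}")
    config: Dict[str, Any] = {"type": detection_type}
    if threshold is not None:
        if not 0.0 <= threshold <= 1.0:
            raise ValueError("threshold must be between 0.0 and 1.0")
        config["threshold"] = threshold
    durations = (
        ("silence_duration_ms", silence_duration_ms),
        ("prefix_padding_ms", prefix_padding_ms),
    )
    for key, value in durations:
        if value is not None:
            if value < 0:
                raise ValueError(f"{key} must not be negative")
            config[key] = value
    return config


if __name__ == "__main__":
    # Illustrative values only.
    print(build_turn_detection("server_vad", threshold=0.5, silence_duration_ms=500))
```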

Next steps

Learn more about Realtime agents
Browse the working code examples in the examples/realtime folder
Add tools to your agent
Implement handoffs between agents
Set up guardrails for safety

Authentication

Make sure your OpenAI API key is set in your environment:

export OPENAI_API_KEY="your-api-key-here"

Alternatively, pass it directly when creating the session:

session = await runner.run(model_config={"api_key": "your-api-key"})
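
The two options can be combined so that a missing key fails early with a clear message instead of at connection time. The sketch below is a hypothetical helper, not part of the SDK. It assumes that an empty model_config is acceptable when the key comes from the environment; if that assumption does not hold for your SDK version, call runner.run() without the argument in that case.

```python
import os
from typing import Dict, Mapping, Optional


def build_model_config(
    explicit_key: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> Dict[str, str]:
    """Prefer an explicit key, fall back to OPENAI_API_KEY, else fail early."""
    if explicit_key:
        return {"api_key": explicit_key}
    if env.get("OPENAI_API_KEY"):
        # The key is picked up from the environment; nothing to pass explicitly.
        return {}
    raise RuntimeError("No API key: set OPENAI_API_KEY or pass one explicitly")


if __name__ == "__main__":
    try:
        print(build_model_config())
    except RuntimeError as exc:
        print(exc)
```

Inside an async function, the result would be passed along as `session = await runner.run(model_config=build_model_config())`.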
