Quickstart

Realtime agents enable voice conversations with your AI agents using OpenAI's Realtime API. This guide walks you through creating your first realtime voice agent.

Beta feature

Realtime agents are in beta. Expect some breaking changes as we improve the implementation.

Prerequisites

Python 3.9 or higher
An OpenAI API key
Basic familiarity with the OpenAI Agents SDK
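
The first two prerequisites can be checked from a script before going further. The following is a minimal sketch using only the standard library; `check_prerequisites` is a hypothetical helper written for this guide, not part of the SDK.

```python
import os
import sys

MIN_PYTHON = (3, 9)  # minimum version stated in the prerequisites


def check_prerequisites(env=None, version=None):
    """Return a list of problems; an empty list means both checks passed.

    `env` and `version` default to the real environment and interpreter,
    and can be overridden to try the function out.
    """
    env = os.environ if env is None else env
    version = sys.version_info[:2] if version is None else tuple(version)[:2]

    problems = []
    if version < MIN_PYTHON:
        problems.append(
            f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ is required, "
            f"found {version[0]}.{version[1]}"
        )
    if not env.get("OPENAI_API_KEY"):
        problems.append("OPENAI_API_KEY is not set")
    return problems
```

Calling `check_prerequisites()` with no arguments inspects the running interpreter and the current environment. An empty list says nothing about whether the key is valid, only that it is present.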

Installation

If you haven't already, install the OpenAI Agents SDK:

    pip install openai-agents

Create your first realtime agent

1. Import the required components

    import asyncio

    from agents.realtime import RealtimeAgent, RealtimeRunner

2. Create a realtime agent

    agent = RealtimeAgent(
        name="Assistant",
        instructions="You are a helpful voice assistant. Keep your responses conversational and friendly.",
    )

3. Set up the runner

    runner = RealtimeRunner(
        starting_agent=agent,
        config={
            "model_settings": {
                "model_name": "gpt-realtime",
                "voice": "ash",
                "modalities": ["audio"],
                "input_audio_format": "pcm16",
                "output_audio_format": "pcm16",
                "input_audio_transcription": {"model": "gpt-4o-mini-transcribe"},
                "turn_detection": {"type": "semantic_vad", "interrupt_response": True},
            }
        },
    )

4. Start a session

    def _truncate_str(s: str, max_length: int) -> str:
        # Shorten long strings so log lines stay readable
        if len(s) > max_length:
            return s[:max_length] + "..."
        return s

    # The code below uses await, so it must run inside an async function
    # Start the session
    session = await runner.run()

    async with session:
        print("Session started! The agent will stream audio responses in real-time.")
        # Process events
        async for event in session:
            try:
                if event.type == "agent_start":
                    print(f"Agent started: {event.agent.name}")
                elif event.type == "agent_end":
                    print(f"Agent ended: {event.agent.name}")
                elif event.type == "handoff":
                    print(f"Handoff from {event.from_agent.name} to {event.to_agent.name}")
                elif event.type == "tool_start":
                    print(f"Tool started: {event.tool.name}")
                elif event.type == "tool_end":
                    print(f"Tool ended: {event.tool.name}; output: {event.output}")
                elif event.type == "audio_end":
                    print("Audio ended")
                elif event.type == "audio":
                    # Enqueue audio for callback-based playback with metadata
                    # Non-blocking put; queue is unbounded, so drops won't occur.
                    pass
                elif event.type == "audio_interrupted":
                    print("Audio interrupted")
                    # Begin graceful fade + flush in the audio callback and rebuild jitter buffer.
                elif event.type == "error":
                    print(f"Error: {event.error}")
                elif event.type == "history_updated":
                    pass  # Skip these frequent events
                elif event.type == "history_added":
                    pass  # Skip these frequent events
                elif event.type == "raw_model_event":
                    print(f"Raw model event: {_truncate_str(str(event.data), 200)}")
                else:
                    print(f"Unknown event type: {event.type}")
            except Exception as e:
                print(f"Error processing event: {_truncate_str(str(e), 200)}")
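
The "audio" and "audio_interrupted" branches in the event loop only describe playback in comments. One way that buffering could look is sketched below with the standard library alone. `PlaybackBuffer` is a hypothetical class, not an SDK type. It assumes each audio event yields raw PCM16 bytes and that the audio output library calls `read()` from its callback. It implements only the flush, not the graceful fade mentioned in the comment.

```python
import queue


class PlaybackBuffer:
    """Unbounded FIFO of PCM16 chunks, drained by an audio output callback."""

    def __init__(self):
        self._chunks = queue.Queue()  # maxsize=0: unbounded, so puts never block
        self._leftover = b""  # tail of a chunk not consumed by the last read

    def enqueue(self, chunk: bytes) -> None:
        """Called from the event loop for each "audio" event."""
        self._chunks.put_nowait(chunk)

    def flush(self) -> None:
        """Called on "audio_interrupted": drop everything still queued."""
        self._leftover = b""
        while True:
            try:
                self._chunks.get_nowait()
            except queue.Empty:
                break

    def read(self, num_bytes: int) -> bytes:
        """Return exactly num_bytes, padding with silence if data runs out."""
        out = bytearray(self._leftover)
        self._leftover = b""
        while len(out) < num_bytes:
            try:
                out += self._chunks.get_nowait()
            except queue.Empty:
                break
        self._leftover = bytes(out[num_bytes:])
        data = bytes(out[:num_bytes])
        return data + b"\x00" * (num_bytes - len(data))
```

In the event loop, the "audio" branch would call `enqueue()` with the bytes carried by the event, and the "audio_interrupted" branch would call `flush()`. Padding with zero bytes means the output callback always receives a full buffer, at the cost of audible gaps when the network falls behind.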

Complete example

Here is a complete working example:

    import asyncio

    from agents.realtime import RealtimeAgent, RealtimeRunner


    async def main():
        # Create the agent
        agent = RealtimeAgent(
            name="Assistant",
            instructions="You are a helpful voice assistant. Keep responses brief and conversational.",
        )

        # Set up the runner with configuration
        runner = RealtimeRunner(
            starting_agent=agent,
            config={
                "model_settings": {
                    "model_name": "gpt-realtime",
                    "voice": "ash",
                    "modalities": ["audio"],
                    "input_audio_format": "pcm16",
                    "output_audio_format": "pcm16",
                    "input_audio_transcription": {"model": "gpt-4o-mini-transcribe"},
                    "turn_detection": {"type": "semantic_vad", "interrupt_response": True},
                }
            },
        )

        # Start the session
        session = await runner.run()

        async with session:
            print("Session started! The agent will stream audio responses in real-time.")
            # Process events
            async for event in session:
                try:
                    if event.type == "agent_start":
                        print(f"Agent started: {event.agent.name}")
                    elif event.type == "agent_end":
                        print(f"Agent ended: {event.agent.name}")
                    elif event.type == "handoff":
                        print(f"Handoff from {event.from_agent.name} to {event.to_agent.name}")
                    elif event.type == "tool_start":
                        print(f"Tool started: {event.tool.name}")
                    elif event.type == "tool_end":
                        print(f"Tool ended: {event.tool.name}; output: {event.output}")
                    elif event.type == "audio_end":
                        print("Audio ended")
                    elif event.type == "audio":
                        # Enqueue audio for callback-based playback with metadata
                        # Non-blocking put; queue is unbounded, so drops won't occur.
                        pass
                    elif event.type == "audio_interrupted":
                        print("Audio interrupted")
                        # Begin graceful fade + flush in the audio callback and rebuild jitter buffer.
                    elif event.type == "error":
                        print(f"Error: {event.error}")
                    elif event.type == "history_updated":
                        pass  # Skip these frequent events
                    elif event.type == "history_added":
                        pass  # Skip these frequent events
                    elif event.type == "raw_model_event":
                        print(f"Raw model event: {_truncate_str(str(event.data), 200)}")
                    else:
                        print(f"Unknown event type: {event.type}")
                except Exception as e:
                    print(f"Error processing event: {_truncate_str(str(e), 200)}")


    def _truncate_str(s: str, max_length: int) -> str:
        if len(s) > max_length:
            return s[:max_length] + "..."
        return s


    if __name__ == "__main__":
        # Run the session
        asyncio.run(main())

Configuration options

Model settings

model_name: Choose from the available realtime models (e.g., gpt-realtime)
voice: Voice to use (for example alloy, echo, fable, onyx, nova, shimmer)
modalities: Enable text or audio (["text"] or ["audio"])
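
To show how these three settings fit into the `config` argument of `RealtimeRunner`, here is a small sketch. `build_runner_config` is a hypothetical helper, not an SDK function. It restricts `modalities` to the two forms this guide lists and leaves `voice` unvalidated because the set of available voices may change.

```python
def build_runner_config(model_name="gpt-realtime", voice=None, modalities=None):
    """Assemble a config dict with a "model_settings" entry.

    Settings left as None are omitted so that defaults apply.
    """
    allowed_modalities = (["text"], ["audio"])  # the forms listed in this guide

    model_settings = {"model_name": model_name}
    if voice is not None:
        model_settings["voice"] = voice
    if modalities is not None:
        modalities = list(modalities)
        if modalities not in allowed_modalities:
            raise ValueError(f"unsupported modalities: {modalities!r}")
        model_settings["modalities"] = modalities
    return {"model_settings": model_settings}
```

The result can be passed as `RealtimeRunner(starting_agent=agent, config=build_runner_config(voice="alloy", modalities=["text"]))`.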

Audio settings

input_audio_format: Format for input audio (pcm16, g711_ulaw, g711_alaw)
output_audio_format: Format for output audio
input_audio_transcription: Transcription configuration
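
For readers unfamiliar with pcm16, the sketch below converts floating-point samples to 16-bit little-endian PCM bytes and estimates how much audio a chunk holds. Both helpers are hypothetical. The 24 kHz mono sample rate is an assumption about the Realtime API that should be confirmed against its reference documentation.

```python
import struct

SAMPLE_RATE_HZ = 24_000  # assumption: mono PCM16 at 24 kHz
BYTES_PER_SAMPLE = 2  # 16 bits per sample


def floats_to_pcm16(samples):
    """Convert floats in [-1.0, 1.0] to little-endian 16-bit PCM bytes.

    Values outside the range are clamped rather than wrapped.
    """
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    return struct.pack(f"<{len(ints)}h", *ints)


def pcm16_duration_ms(data, sample_rate_hz=SAMPLE_RATE_HZ):
    """Duration in milliseconds of a mono PCM16 byte string."""
    return len(data) / BYTES_PER_SAMPLE / sample_rate_hz * 1000
```

Under the stated assumption, 48,000 bytes correspond to one second of audio, which is useful when sizing playback buffers.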

Turn detection

type: Detection method (server_vad, semantic_vad)
threshold: Voice activity threshold (0.0-1.0)
silence_duration_ms: Silence duration used to detect the end of a turn
prefix_padding_ms: Audio padding before speech
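
The turn detection options above go into the "turn_detection" entry of the model settings. The sketch below builds a server_vad entry and checks the one range the guide states (threshold between 0.0 and 1.0). `server_vad_settings` is a hypothetical helper, and its default values are illustrative placeholders rather than recommended or documented defaults.

```python
def server_vad_settings(threshold=0.5, silence_duration_ms=500, prefix_padding_ms=300):
    """Build a "turn_detection" dict for server-side voice activity detection."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    if silence_duration_ms < 0 or prefix_padding_ms < 0:
        raise ValueError("durations must be non-negative")
    return {
        "type": "server_vad",
        "threshold": threshold,
        "silence_duration_ms": silence_duration_ms,
        "prefix_padding_ms": prefix_padding_ms,
    }
```

The returned dict would replace the `{"type": "semantic_vad", ...}` value used in the runner configuration. A longer silence duration generally makes the agent wait longer before treating a pause as the end of the user's turn.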

Next steps

Learn more about realtime agents
Check out working examples in the examples/realtime folder
Add tools to your agent
Implement handoffs between agents
Set up guardrails for safety

Authentication

Make sure your OpenAI API key is set in your environment:

    export OPENAI_API_KEY="your-api-key-here"

Or pass it directly when creating the session:

    session = await runner.run(model_config={"api_key": "your-api-key"})
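
The two options can be combined so that an explicitly supplied key wins and the environment variable is the fallback. The sketch below only builds the `model_config` dict in the shape shown above. `resolve_model_config` is a hypothetical helper, not part of the SDK.

```python
import os


def resolve_model_config(explicit_key=None, env=None):
    """Return {"api_key": ...}, preferring explicit_key over OPENAI_API_KEY."""
    env = os.environ if env is None else env
    key = explicit_key or env.get("OPENAI_API_KEY")
    if not key:
        raise RuntimeError("No API key: set OPENAI_API_KEY or pass one explicitly")
    return {"api_key": key}
```

With this helper, the session would be started as `session = await runner.run(model_config=resolve_model_config())`. Failing early with a clear message is usually easier to debug than an authentication error raised after the connection is attempted.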
