Quickstart
Prerequisites
Make sure you've followed the base quickstart instructions for the Agents SDK and set up a virtual environment. Then, install the optional voice dependencies from the SDK:
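The install command itself was lost in extraction; assuming the `voice` extra name used by the Agents SDK's published docs, it would look like:

```shell
pip install 'openai-agents[voice]'
```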
Concepts
The main concept to know about is a `VoicePipeline`, which is a 3-step process:
- Run a speech-to-text model to turn audio into text.
- Run your code, which is usually an agentic workflow, to produce a result.
- Run a text-to-speech (TTS) model to turn the result text back into audio.
```mermaid
graph LR
    %% Input
    A["🎤 Audio Input"]

    %% Voice Pipeline
    subgraph Voice_Pipeline [Voice Pipeline]
        direction TB
        B["Transcribe (speech-to-text)"]
        C["Your Code"]:::highlight
        D["Text-to-speech"]
        B --> C --> D
    end

    %% Output
    E["🎧 Audio Output"]

    %% Flow
    A --> Voice_Pipeline
    Voice_Pipeline --> E

    %% Custom styling
    classDef highlight fill:#ffcc66,stroke:#333,stroke-width:1px,font-weight:700;
```

Agents
First, let's set up some agents. This should feel familiar if you've built agents with this SDK before. We'll have a couple of agents, a handoff, and a tool.
```python
import asyncio
import random

from agents import (
    Agent,
    function_tool,
)
from agents.extensions.handoff_prompt import prompt_with_handoff_instructions


@function_tool
def get_weather(city: str) -> str:
    """Get the weather for a given city."""
    print(f"[debug] get_weather called with city: {city}")
    choices = ["sunny", "cloudy", "rainy", "snowy"]
    return f"The weather in {city} is {random.choice(choices)}."


spanish_agent = Agent(
    name="Spanish",
    handoff_description="A spanish speaking agent.",
    instructions=prompt_with_handoff_instructions(
        "You're speaking to a human, so be polite and concise. Speak in Spanish.",
    ),
    model="gpt-5.2",
)

agent = Agent(
    name="Assistant",
    instructions=prompt_with_handoff_instructions(
        "You're speaking to a human, so be polite and concise. If the user speaks in Spanish, handoff to the spanish agent.",
    ),
    model="gpt-5.2",
    handoffs=[spanish_agent],
    tools=[get_weather],
)
```

Voice pipeline
We'll set up a simple voice pipeline, using `SingleAgentVoiceWorkflow` as the workflow.
```python
from agents.voice import SingleAgentVoiceWorkflow, VoicePipeline

pipeline = VoicePipeline(workflow=SingleAgentVoiceWorkflow(agent))
```

Run the pipeline
```python
import numpy as np
import sounddevice as sd
from agents.voice import AudioInput

# For simplicity, we'll just create 3 seconds of silence
# In reality, you'd get microphone data
buffer = np.zeros(24000 * 3, dtype=np.int16)
audio_input = AudioInput(buffer=buffer)

result = await pipeline.run(audio_input)

# Create an audio player using `sounddevice`
player = sd.OutputStream(samplerate=24000, channels=1, dtype=np.int16)
player.start()

# Play the audio stream as it comes in
async for event in result.stream():
    if event.type == "voice_stream_event_audio":
        player.write(event.data)
```

Put it all together
```python
import asyncio
import random

import numpy as np
import sounddevice as sd

from agents import (
    Agent,
    function_tool,
    set_tracing_disabled,
)
from agents.voice import (
    AudioInput,
    SingleAgentVoiceWorkflow,
    VoicePipeline,
)
from agents.extensions.handoff_prompt import prompt_with_handoff_instructions


@function_tool
def get_weather(city: str) -> str:
    """Get the weather for a given city."""
    print(f"[debug] get_weather called with city: {city}")
    choices = ["sunny", "cloudy", "rainy", "snowy"]
    return f"The weather in {city} is {random.choice(choices)}."


spanish_agent = Agent(
    name="Spanish",
    handoff_description="A spanish speaking agent.",
    instructions=prompt_with_handoff_instructions(
        "You're speaking to a human, so be polite and concise. Speak in Spanish.",
    ),
    model="gpt-5.2",
)

agent = Agent(
    name="Assistant",
    instructions=prompt_with_handoff_instructions(
        "You're speaking to a human, so be polite and concise. If the user speaks in Spanish, handoff to the spanish agent.",
    ),
    model="gpt-5.2",
    handoffs=[spanish_agent],
    tools=[get_weather],
)


async def main():
    pipeline = VoicePipeline(workflow=SingleAgentVoiceWorkflow(agent))
    buffer = np.zeros(24000 * 3, dtype=np.int16)
    audio_input = AudioInput(buffer=buffer)

    result = await pipeline.run(audio_input)

    # Create an audio player using `sounddevice`
    player = sd.OutputStream(samplerate=24000, channels=1, dtype=np.int16)
    player.start()

    # Play the audio stream as it comes in
    async for event in result.stream():
        if event.type == "voice_stream_event_audio":
            player.write(event.data)


if __name__ == "__main__":
    asyncio.run(main())
```

If you run this example, the agent will speak to you! Check out the demo in examples/voice/static, where you can talk to the agent yourself.