
Quickstart

Prerequisites

Make sure you've followed the base quickstart instructions for the Agents SDK and set up a virtual environment. Then, install the optional voice dependencies from the SDK:

```bash
pip install 'openai-agents[voice]'
```

Concepts

The main concept to know about is a `VoicePipeline`, which is a three-step process:

  1. Run a speech-to-text model to turn audio into text.
  2. Run your code, which is usually an agentic workflow, to produce a result.
  3. Run a text-to-speech model to turn the result text back into audio.
```mermaid
graph LR
    %% Input
    A["🎤 Audio Input"]

    %% Voice Pipeline
    subgraph Voice_Pipeline [Voice Pipeline]
        direction TB
        B["Transcribe (speech-to-text)"]
        C["Your Code"]:::highlight
        D["Text-to-speech"]
        B --> C --> D
    end

    %% Output
    E["🎧 Audio Output"]

    %% Flow
    A --> Voice_Pipeline
    Voice_Pipeline --> E

    %% Custom styling
    classDef highlight fill:#ffcc66,stroke:#333,stroke-width:1px,font-weight:700;
```
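At its core, the pipeline is just a composition of those three stages. As a toy sketch of the flow only — `transcribe`, `run_workflow`, and `synthesize` are hypothetical stand-ins, not SDK APIs:

```python
# Toy sketch of the three pipeline stages. These functions are
# hypothetical stand-ins for illustration, not part of the Agents SDK.

def transcribe(audio: bytes) -> str:
    # Stage 1: speech-to-text (stubbed for illustration)
    return "what's the weather in Tokyo?"

def run_workflow(text: str) -> str:
    # Stage 2: your code -- usually an agentic workflow
    return f"You asked: {text}"

def synthesize(text: str) -> str:
    # Stage 3: text-to-speech (stubbed; a real model returns audio)
    return f"<audio for: {text}>"

def voice_pipeline(audio: bytes) -> str:
    # Audio in -> text -> workflow result -> audio out
    return synthesize(run_workflow(transcribe(audio)))
```

The real `VoicePipeline` adds streaming, turn detection, and event handling on top of this basic shape.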

Agents

First, let's set up some Agents. This should feel familiar if you've built agents with this SDK before. We'll have two Agents, a handoff, and a tool.

```python
import asyncio
import random

from agents import Agent, function_tool
from agents.extensions.handoff_prompt import prompt_with_handoff_instructions


@function_tool
def get_weather(city: str) -> str:
    """Get the weather for a given city."""
    print(f"[debug] get_weather called with city: {city}")
    choices = ["sunny", "cloudy", "rainy", "snowy"]
    return f"The weather in {city} is {random.choice(choices)}."


spanish_agent = Agent(
    name="Spanish",
    handoff_description="A spanish speaking agent.",
    instructions=prompt_with_handoff_instructions(
        "You're speaking to a human, so be polite and concise. Speak in Spanish.",
    ),
    model="gpt-5.2",
)

agent = Agent(
    name="Assistant",
    instructions=prompt_with_handoff_instructions(
        "You're speaking to a human, so be polite and concise. If the user speaks in Spanish, handoff to the spanish agent.",
    ),
    model="gpt-5.2",
    handoffs=[spanish_agent],
    tools=[get_weather],
)
```

Voice pipeline

We'll set up a simple voice pipeline, using `SingleAgentVoiceWorkflow` as the workflow.

```python
from agents.voice import SingleAgentVoiceWorkflow, VoicePipeline

pipeline = VoicePipeline(workflow=SingleAgentVoiceWorkflow(agent))
```

Run the pipeline

```python
import numpy as np
import sounddevice as sd
from agents.voice import AudioInput

# For simplicity, we'll just create 3 seconds of silence
# In reality, you'd get microphone data
buffer = np.zeros(24000 * 3, dtype=np.int16)
audio_input = AudioInput(buffer=buffer)

result = await pipeline.run(audio_input)

# Create an audio player using `sounddevice`
player = sd.OutputStream(samplerate=24000, channels=1, dtype=np.int16)
player.start()

# Play the audio stream as it comes in
async for event in result.stream():
    if event.type == "voice_stream_event_audio":
        player.write(event.data)
```
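The buffer above is 16-bit PCM at 24 kHz. Capture libraries often hand you float samples in `[-1.0, 1.0]` instead (for example, `sounddevice` records as `float32` by default), so when you swap the silence for real microphone data you may need a conversion step. A minimal sketch with numpy — the 440 Hz tone here is just a stand-in for captured audio:

```python
import numpy as np

def float_to_pcm16(samples: np.ndarray) -> np.ndarray:
    """Convert float audio in [-1.0, 1.0] to 16-bit PCM."""
    clipped = np.clip(samples, -1.0, 1.0)   # guard against out-of-range samples
    return (clipped * 32767).astype(np.int16)

# One second of a 440 Hz sine tone at 24 kHz, standing in for mic input
t = np.arange(24000) / 24000
tone = np.sin(2 * np.pi * 440 * t).astype(np.float32)

buffer = float_to_pcm16(tone)
assert buffer.dtype == np.int16 and buffer.shape == (24000,)
```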

Put it all together

```python
import asyncio
import random

import numpy as np
import sounddevice as sd

from agents import Agent, function_tool, set_tracing_disabled
from agents.voice import AudioInput, SingleAgentVoiceWorkflow, VoicePipeline
from agents.extensions.handoff_prompt import prompt_with_handoff_instructions


@function_tool
def get_weather(city: str) -> str:
    """Get the weather for a given city."""
    print(f"[debug] get_weather called with city: {city}")
    choices = ["sunny", "cloudy", "rainy", "snowy"]
    return f"The weather in {city} is {random.choice(choices)}."


spanish_agent = Agent(
    name="Spanish",
    handoff_description="A spanish speaking agent.",
    instructions=prompt_with_handoff_instructions(
        "You're speaking to a human, so be polite and concise. Speak in Spanish.",
    ),
    model="gpt-5.2",
)

agent = Agent(
    name="Assistant",
    instructions=prompt_with_handoff_instructions(
        "You're speaking to a human, so be polite and concise. If the user speaks in Spanish, handoff to the spanish agent.",
    ),
    model="gpt-5.2",
    handoffs=[spanish_agent],
    tools=[get_weather],
)


async def main():
    pipeline = VoicePipeline(workflow=SingleAgentVoiceWorkflow(agent))
    buffer = np.zeros(24000 * 3, dtype=np.int16)
    audio_input = AudioInput(buffer=buffer)

    result = await pipeline.run(audio_input)

    # Create an audio player using `sounddevice`
    player = sd.OutputStream(samplerate=24000, channels=1, dtype=np.int16)
    player.start()

    # Play the audio stream as it comes in
    async for event in result.stream():
        if event.type == "voice_stream_event_audio":
            player.write(event.data)


if __name__ == "__main__":
    asyncio.run(main())
```

If you run this example, the agent will speak to you! Check out the example in `examples/voice/static` to see a demo where you can speak to the agent yourself.
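The stream in the loop above can carry more than audio chunks: alongside `"voice_stream_event_audio"`, the pipeline also emits lifecycle and error events. A sketch of dispatching on the event type, using a stand-in `Event` class instead of the real stream (the `"voice_stream_event_lifecycle"` and `"voice_stream_event_error"` type strings are our assumption about the SDK's event names; check the voice results documentation for the authoritative list):

```python
from dataclasses import dataclass

# Stand-in for the SDK's stream events, for illustration only.
@dataclass
class Event:
    type: str
    data: object = None

def handle(event: Event, audio_out: list) -> None:
    # Dispatch on the event type, mirroring the `async for` loop above
    if event.type == "voice_stream_event_audio":
        audio_out.append(event.data)        # play or buffer the audio chunk
    elif event.type == "voice_stream_event_lifecycle":
        print(f"[lifecycle] {event.data}")  # e.g. a turn starting or ending
    elif event.type == "voice_stream_event_error":
        raise RuntimeError(f"pipeline error: {event.data}")

chunks: list = []
for ev in [
    Event("voice_stream_event_lifecycle", "turn_started"),
    Event("voice_stream_event_audio", b"\x00\x01"),
    Event("voice_stream_event_lifecycle", "turn_ended"),
]:
    handle(ev, chunks)
```

Filtering on `event.type` like this keeps the player from being fed non-audio payloads.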

