Added support for gpt-4o-realtime models for speech-to-speech interactions #659
base:main
…ions
- Added detailed documentation for the new `RealtimeVoicePipeline`, including usage examples and event handling for real-time audio interaction.
- Introduced a new example script demonstrating the `RealtimeVoicePipeline` with continuous audio streaming and tool execution.
8bcb389 to b8899f7

Thank you so much for the PR @sharananurag998! I'll try to look at the PR later this week. Thank you for your patience.
I haven't found a way to integrate native speech-to-speech with an agent, but we can define an agent and use it as a tool in the real-time speech pipeline, and it works! The agent-as-tool approach provides better latency than the STT/TTS-based `VoicePipeline`. Also, this branch has Juspay-specific MCP tool-handling changes because we're using the fork as a Python dependency; I'll move them to a separate branch so that main can be merged. @dkundel-openai, you can review the new pipeline and let me know of any changes; I'll be happy to work on them.
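The agent-as-tool approach can be sketched without any SDK: a text agent is wrapped in a function-tool definition that the realtime model can call, and a handler turns the model's JSON arguments into an agent run. This is a minimal, library-free illustration, not the PR's implementation. `SpecialistAgent`, `AgentTool`, `agent_as_tool`, and `ask_billing_agent` are hypothetical names, and the tool definition assumes the Realtime API's flat function-tool format (`type`, `name`, `description`, `parameters`). In the Agents SDK itself, `Agent.as_tool()` plays the wrapping role.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class SpecialistAgent:
    """Hypothetical stand-in for a text agent; a real one would call a model."""
    name: str

    def run(self, query: str) -> str:
        return f"[{self.name}] handled: {query}"


@dataclass
class AgentTool:
    spec: dict                      # function-tool definition registered with the realtime session
    handler: Callable[[str], str]   # JSON arguments string -> tool output string


def agent_as_tool(agent: SpecialistAgent, tool_name: str, description: str) -> AgentTool:
    """Expose a text agent as a single-argument function tool (assumed schema)."""
    spec = {
        "type": "function",
        "name": tool_name,
        "description": description,
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "The user's request, restated as text."}
            },
            "required": ["query"],
        },
    }

    def handler(arguments_json: str) -> str:
        # The realtime model sends tool arguments as a JSON-encoded string.
        args = json.loads(arguments_json)
        return agent.run(args["query"])

    return AgentTool(spec=spec, handler=handler)


billing_tool = agent_as_tool(
    SpecialistAgent("billing"), "ask_billing_agent", "Answer questions about invoices and refunds."
)
print(billing_tool.handler('{"query": "refund status"}'))  # → [billing] handled: refund status
```

In this pattern the realtime model handles audio in and out directly and only hands text to the agent when it decides to call the tool, so there is no separate transcription and synthesis pass around every agent turn.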
EmanueleTribi commented May 29, 2025
Hi everyone, any news on this pull request, or on the general timeline for integrating the Realtime API? I'm very interested in using it with the SDK agents, and I'm wondering whether to write my own code or wait for it to be integrated directly. Thanks!
This PR introduces real-time voice pipeline support for OpenAI's `gpt-4o-realtime-preview` model, enabling seamless, low-latency speech-to-speech interactions in the Speect framework. The update brings a modern, streaming audio interface, integrated tool execution, and robust event handling, while maintaining full compatibility with the existing STT/TTS pipeline.

Key Features & Changes
- `RealtimeVoicePipeline`:
- Integrated Tool Calls:
- Event Handling & Debugging: (`conversation.item.input_audio_transcription.delta` and `.completed`)
- Echo & Feedback Mitigation: `input_audio_noise_reduction` in the session config.
- Sample Rate Fixes:
- Backwards Compatibility:
- Documentation & Examples:
  - `docs/voice/pipeline.md` with new real-time usage, configuration, and troubleshooting sections.
  - `continuous_realtime_assistant.py` demonstrates push-to-talk, tool calls, and event handling.

🛠️ How to Use

See the new example and documentation for how to use `RealtimeVoicePipeline` with your OpenAI API key and tools. No changes are required: existing STT/TTS flows are unaffected.
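A streaming audio interface of this kind can be illustrated with a minimal sketch that slices raw 16-bit PCM into fixed-duration chunks and wraps each one in an `input_audio_buffer.append` client event, the Realtime API event that carries base64-encoded input audio. It assumes the API's 24 kHz mono `pcm16` input format; the 100 ms chunk size and the helper names `chunk_pcm16` and `audio_append_events` are illustrative, not taken from the PR.

```python
import base64
import json
from typing import Iterator

SAMPLE_RATE_HZ = 24_000   # assumed Realtime API pcm16 input rate (mono)
BYTES_PER_SAMPLE = 2      # 16-bit little-endian samples


def chunk_pcm16(pcm: bytes, chunk_ms: int = 100) -> Iterator[bytes]:
    """Split raw PCM16 audio into chunks of roughly chunk_ms milliseconds."""
    chunk_bytes = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * chunk_ms // 1000
    for start in range(0, len(pcm), chunk_bytes):
        yield pcm[start:start + chunk_bytes]


def audio_append_events(pcm: bytes, chunk_ms: int = 100) -> list[str]:
    """Build one JSON-encoded input_audio_buffer.append event per chunk."""
    return [
        json.dumps({
            "type": "input_audio_buffer.append",
            "audio": base64.b64encode(chunk).decode("ascii"),
        })
        for chunk in chunk_pcm16(pcm, chunk_ms)
    ]


one_second_of_silence = bytes(SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)
events = audio_append_events(one_second_of_silence)
print(len(events))  # → 10
```

With server-side turn detection the server decides when each user turn ends, so the client simply keeps appending. A push-to-talk client with turn detection disabled would instead append while the key is held and then send `input_audio_buffer.commit` followed by `response.create` on release.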
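On the configuration side, a `session.update` payload along the following lines would enable input transcription (the source of the `conversation.item.input_audio_transcription.*` events), turn on `input_audio_noise_reduction`, and register tools. This is a hedged sketch: the field names follow the Realtime API's `session.update` event as understood here, while `build_session_update`, the `whisper-1` transcription model, the `near_field` default, and the `get_weather` tool are illustrative assumptions rather than the PR's defaults.

```python
import json


def build_session_update(instructions: str, tools: list[dict],
                         noise_reduction: str = "near_field") -> dict:
    """Return a session.update client event (field names assumed from the Realtime API docs)."""
    return {
        "type": "session.update",
        "session": {
            "instructions": instructions,
            "input_audio_format": "pcm16",
            "output_audio_format": "pcm16",
            # Enables conversation.item.input_audio_transcription.* server events.
            "input_audio_transcription": {"model": "whisper-1"},
            # "near_field" targets close-talking mics; "far_field" targets laptop or room mics.
            "input_audio_noise_reduction": {"type": noise_reduction},
            # Server-side voice activity detection decides when the user's turn ends.
            "turn_detection": {"type": "server_vad"},
            "tools": tools,
            "tool_choice": "auto",
        },
    }


weather_tool = {  # hypothetical example tool
    "type": "function",
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

event = build_session_update("You are a concise voice assistant.", [weather_tool])
print(json.dumps(event, indent=2))
```

Noise reduction on the input side complements echo mitigation: it reduces how much speaker playback and background noise reach the model's voice activity detection.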
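The event handling and tool-call flow can be approximated with a small dispatcher: it accumulates `conversation.item.input_audio_transcription.delta` text per `item_id`, finalizes it on `.completed`, and, when a `response.function_call_arguments.done` event arrives, runs the named tool and returns the client events that send its result back (`conversation.item.create` with a `function_call_output` item, then `response.create`). This is a simplified sketch under those assumptions; `RealtimeEventRouter` and the `tools` mapping are hypothetical, and the PR's pipeline may route events differently.

```python
import json
from typing import Callable


class RealtimeEventRouter:
    """Hypothetical router for Realtime API server events (simplified sketch)."""

    def __init__(self, tools: dict[str, Callable[..., str]]):
        self.tools = tools
        self.partial: dict[str, str] = {}   # item_id -> transcript so far (e.g. for live captions)
        self.transcripts: list[str] = []    # finalized user transcripts

    def handle(self, event: dict) -> list[dict]:
        """Process one server event; return client events to send back, if any."""
        kind = event.get("type")
        if kind == "conversation.item.input_audio_transcription.delta":
            item = event["item_id"]
            self.partial[item] = self.partial.get(item, "") + event["delta"]
        elif kind == "conversation.item.input_audio_transcription.completed":
            # The completed event carries the full transcript, so drop the partial copy.
            self.partial.pop(event["item_id"], None)
            self.transcripts.append(event["transcript"])
        elif kind == "response.function_call_arguments.done":
            return self._run_tool(event)
        return []

    def _run_tool(self, event: dict) -> list[dict]:
        tool = self.tools.get(event["name"])
        if tool is None:
            output = json.dumps({"error": f"unknown tool {event['name']}"})
        else:
            output = tool(**json.loads(event["arguments"] or "{}"))
        return [
            {
                "type": "conversation.item.create",
                "item": {
                    "type": "function_call_output",
                    "call_id": event["call_id"],
                    "output": output,
                },
            },
            # Ask the model to continue the response now that the tool result is available.
            {"type": "response.create"},
        ]


router = RealtimeEventRouter({"get_weather": lambda city: f"Sunny in {city}"})
router.handle({"type": "conversation.item.input_audio_transcription.delta", "item_id": "i1", "delta": "Weather in "})
router.handle({"type": "conversation.item.input_audio_transcription.delta", "item_id": "i1", "delta": "Paris?"})
router.handle({"type": "conversation.item.input_audio_transcription.completed",
               "item_id": "i1", "transcript": "Weather in Paris?"})
replies = router.handle({"type": "response.function_call_arguments.done", "call_id": "call_1",
                         "name": "get_weather", "arguments": '{"city": "Paris"}'})
print(router.transcripts)            # → ['Weather in Paris?']
print(replies[0]["item"]["output"])  # → Sunny in Paris
```

Returning outgoing events instead of sending them directly keeps the router free of any network code, which makes the event logic easy to log and test in isolation.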