Added support for gpt-4o-realtime models for Speech to Speech interactions #659


Draft
sharananurag998 wants to merge 3 commits into openai:main from sharananurag998:main

Conversation

sharananurag998

This PR introduces real-time voice pipeline support for OpenAI's gpt-4o-realtime-preview model, enabling seamless, low-latency speech-to-speech interactions in the SDK's voice framework. The update brings a modern, streaming audio interface, integrated tool execution, and robust event handling, while maintaining full compatibility with the existing STT/TTS pipeline.


Key Features & Changes

  • RealtimeVoicePipeline:

    • New pipeline for direct, continuous audio-to-audio conversations with OpenAI’s real-time models.
    • Handles streaming microphone input and speaker output at 24 kHz, as required by the API.
    • Supports push-to-talk and half-duplex operation to prevent echo/feedback.
  • Integrated Tool Calls:

    • Tools are registered with the pipeline and executed automatically when the model requests a function call.
    • Tool results are sent back to the model using the correct OpenAI Realtime API protocol (see the wire-level sketch after this list).
  • Event Handling & Debugging:

    • Full support for all major OpenAI Realtime API events, including:
      • Audio and text deltas
      • Tool call arguments (streamed and completed)
      • Transcription events (conversation.item.input_audio_transcription.delta and .completed)
      • Session and rate limit updates
    • Example logs all transcription events for easy debugging of what the model “hears.”
  • Echo & Feedback Mitigation:

    • Implements a buffer window after assistant audio playback to prevent microphone echo from triggering new turns (a standalone capture/echo-guard sketch follows the usage notes below).
    • Optionally enables server-side noise/echo reduction via input_audio_noise_reduction in the session config.
  • Sample Rate Fixes:

    • Ensures both input and output audio are always 24 kHz PCM, as required by the OpenAI API (fixes the "slow motion" audio bug).
  • Backwards Compatibility:

    • All changes are fully compatible with the existing STT/TTS pipeline and configuration.
    • Legacy examples and workflows continue to work without modification.
  • Documentation & Examples:

    • Updated docs/voice/pipeline.md with new real-time usage, configuration, and troubleshooting sections.
    • New example: continuous_realtime_assistant.py demonstrates push-to-talk, tool calls, and event handling.
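
To make the tool-call and event-handling bullets above concrete, here is a minimal, illustrative wire-level sketch of the exchange the pipeline automates: configuring a 24 kHz PCM16 session, reading server events, and returning a function_call_output followed by response.create. Event and field names follow the public OpenAI Realtime API; the `get_weather` tool, the `run_tool` helper, and the omitted audio capture are placeholders for illustration, not code from this PR.

```python
# Illustrative sketch only; not the RealtimeVoicePipeline implementation.
import asyncio, base64, json, os

import websockets  # pip install websockets

URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"
HEADERS = {
    "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
    "OpenAI-Beta": "realtime=v1",
}


async def run_tool(name: str, arguments: str) -> str:
    """Placeholder for the pipeline's registered-tool dispatch."""
    return json.dumps({"ok": True, "tool": name, "args": json.loads(arguments)})


async def main() -> None:
    # On websockets < 13 the keyword is `extra_headers` instead of `additional_headers`.
    async with websockets.connect(URL, additional_headers=HEADERS) as ws:
        # Session config: 24 kHz PCM16 in/out, transcription, optional server-side
        # noise reduction, and one example function tool.
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "modalities": ["audio", "text"],
                "input_audio_format": "pcm16",     # 24 kHz, mono, little-endian
                "output_audio_format": "pcm16",
                "input_audio_transcription": {"model": "whisper-1"},
                "input_audio_noise_reduction": {"type": "near_field"},
                "tools": [{
                    "type": "function",
                    "name": "get_weather",
                    "description": "Return the weather for a city.",
                    "parameters": {"type": "object",
                                   "properties": {"city": {"type": "string"}},
                                   "required": ["city"]},
                }],
            },
        }))

        # Microphone streaming (capture code omitted) would look like:
        # await ws.send(json.dumps({"type": "input_audio_buffer.append",
        #                           "audio": base64.b64encode(chunk).decode()}))

        async for raw in ws:
            event = json.loads(raw)
            etype = event.get("type")
            if etype == "response.audio.delta":
                pcm = base64.b64decode(event["delta"])  # 24 kHz PCM16 to play back
            elif etype == "conversation.item.input_audio_transcription.completed":
                print("heard:", event.get("transcript"))
            elif etype == "response.function_call_arguments.done":
                # Run the registered tool, return its result, then request a new
                # response. Depending on API version the tool name may arrive here
                # or on the corresponding response.output_item event.
                name = event.get("name", "")
                output = await run_tool(name, event["arguments"])
                await ws.send(json.dumps({
                    "type": "conversation.item.create",
                    "item": {"type": "function_call_output",
                             "call_id": event["call_id"], "output": output},
                }))
                await ws.send(json.dumps({"type": "response.create"}))


if __name__ == "__main__":
    asyncio.run(main())
```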

🛠️ How to Use

  • Realtime Pipeline:
    See the new example and documentation for how to use RealtimeVoicePipeline with your OpenAI API key and tools.
  • Classic Pipeline:
    No changes required; existing STT/TTS flows are unaffected.
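
Independent of the pipeline code, the 24 kHz PCM16 capture and echo-mitigation points above can be reproduced with a few lines of standalone audio I/O. This is a minimal sketch assuming the third-party sounddevice package; the echo-guard window, chunk size, and helper names are illustrative values, not the defaults used by this PR.

```python
# Standalone sketch (not code from this PR): capture 24 kHz mono PCM16 microphone
# audio with a simple half-duplex gate, dropping mic frames for a short window
# after assistant playback so echo cannot re-trigger a turn.
import queue
import time

import sounddevice as sd  # pip install sounddevice

SAMPLE_RATE = 24_000      # the Realtime API expects 24 kHz PCM16
BLOCK_SIZE = 2_400        # 100 ms per chunk
ECHO_GUARD_S = 0.5        # buffer window after assistant playback ends

mic_chunks: "queue.Queue[bytes]" = queue.Queue()
last_playback_end = 0.0   # the playback side would update this when audio finishes


def on_mic_audio(indata, frames, time_info, status) -> None:
    """sounddevice callback: queue raw PCM16 unless we're inside the echo guard."""
    if status:
        print("mic status:", status)
    if time.monotonic() - last_playback_end < ECHO_GUARD_S:
        return                          # half-duplex: ignore our own echo
    mic_chunks.put(bytes(indata))       # raw little-endian int16 frames


def main() -> None:
    with sd.RawInputStream(samplerate=SAMPLE_RATE, blocksize=BLOCK_SIZE,
                           channels=1, dtype="int16", callback=on_mic_audio):
        print("capturing 24 kHz PCM16; Ctrl+C to stop")
        while True:
            chunk = mic_chunks.get()    # base64-encode and send this chunk upstream
            print(f"captured {len(chunk)} bytes")


if __name__ == "__main__":
    main()
```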

…ions
- Added detailed documentation for the new `RealtimeVoicePipeline`, including usage examples and event handling for real-time audio interaction.
- Introduced a new example script demonstrating the `RealtimeVoicePipeline` with continuous audio streaming and tool execution.
@sharananurag998 force-pushed the main branch 3 times, most recently from 8bcb389 to b8899f7 on May 7, 2025 11:06
@sharananurag998 marked this pull request as draft on May 7, 2025 14:29
@dkundel-openai
Contributor

Thank you so much for the PR @sharananurag998! I'll try to look at the PR later this week. Thank you for your patience.

@sharananurag998
Author

@dkundel-openai @rm-openai

I haven't found a way to do native speech-to-speech integration with an agent, but we can define an agent and use it as a tool in the real-time speech pipeline, and it works! (See the sketch at the end of this comment.)

The agent-as-tool approach provides better latency than the STT/TTS-based VoicePipeline.

Also, this branch has some Juspay-specific MCP tool handling changes, since we're using the fork as a Python dependency; I'll move them to a separate branch so that main can be merged.

@dkundel-openai you can review the new pipeline and let me know of any changes; I'll be happy to work on them.
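
For reference, here is a minimal sketch of the agent-as-a-tool pattern, using the existing Agents SDK text API (Agent, Runner, function_tool). The agent name, instructions, and the ask_research_agent wrapper are made up for illustration, and how the resulting tool gets registered with the realtime pipeline depends on this PR's final interface, so that step is only noted in a comment.

```python
# Sketch under stated assumptions: wrap a text agent as a callable tool.
import asyncio

from agents import Agent, Runner, function_tool

research_agent = Agent(
    name="Research agent",
    instructions="Answer factual questions concisely.",
)


@function_tool
async def ask_research_agent(question: str) -> str:
    """Delegate a question to the text agent and return its final answer."""
    result = await Runner.run(research_agent, question)
    return result.final_output


async def main() -> None:
    # Standalone check of the underlying agent; in the realtime pipeline the
    # ask_research_agent tool would instead go into the pipeline's tool list
    # (exact registration interface defined by this PR).
    result = await Runner.run(research_agent, "What sample rate does the Realtime API use?")
    print(result.final_output)


if __name__ == "__main__":
    asyncio.run(main())
```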

@EmanueleTribi

EmanueleTribi commented May 29, 2025 (edited)

Hi everyone, any news on this pull request, or a general timeline for integrating the Realtime API? I'm very interested in using it with the SDK agent and was wondering whether to write my own code or wait for it to be integrated directly. Thanks!
@dkundel-openai @sharananurag998

Reviewers: @dkundel-openai (awaiting requested review). At least 1 approving review is required to merge this pull request.
Assignees: none
Labels: none
Projects: none
Milestone: none
3 participants: @sharananurag998, @dkundel-openai, @EmanueleTribi
