Added support for gpt-4o-realtime models for Speech to Speech interactions #659


Draft
sharananurag998 wants to merge 3 commits into openai:main from sharananurag998:main

Conversation

sharananurag998

This PR introduces real-time voice pipeline support for OpenAI's gpt-4o-realtime-preview model, enabling seamless, low-latency speech-to-speech interactions in the SDK's voice framework. The update brings a modern, streaming audio interface, integrated tool execution, and robust event handling, while maintaining full compatibility with the existing STT/TTS pipeline.


Key Features & Changes

  • RealtimeVoicePipeline:

    • New pipeline for direct, continuous audio-to-audio conversations with OpenAI’s real-time models.
    • Handles streaming microphone input and speaker output at 24 kHz, as required by the API.
    • Supports push-to-talk and half-duplex operation to prevent echo/feedback.
  • Integrated Tool Calls:

    • Tools are registered with the pipeline and executed automatically when the model requests a function call.
    • Tool results are sent back to the model using the correct OpenAI Realtime API protocol (see the wire-level sketch after this list).
  • Event Handling & Debugging:

    • Full support for all major OpenAI Realtime API events, including:
      • Audio and text deltas
      • Tool call arguments (streamed and completed)
      • Transcription events (conversation.item.input_audio_transcription.delta and .completed)
      • Session and rate limit updates
    • Example logs all transcription events for easy debugging of what the model “hears.”
  • Echo & Feedback Mitigation:

    • Implements a buffer window after assistant audio playback to prevent microphone echo from triggering new turns (a standalone capture/echo-guard sketch follows the usage notes below).
    • Optionally enables server-side noise/echo reduction via input_audio_noise_reduction in the session config.
  • Sample Rate Fixes:

    • Ensures both input and output audio are always 24 kHz PCM, as required by the OpenAI API (fixes the "slow motion" audio bug).
  • Backwards Compatibility:

    • All changes are fully compatible with the existing STT/TTS pipeline and configuration.
    • Legacy examples and workflows continue to work without modification.
  • Documentation & Examples:

    • Updated docs/voice/pipeline.md with new real-time usage, configuration, and troubleshooting sections.
    • New example: continuous_realtime_assistant.py demonstrates push-to-talk, tool calls, and event handling.
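
To make the tool-call and event-handling bullets above concrete, here is a minimal, illustrative wire-level sketch of the exchange the pipeline automates: configuring a 24 kHz PCM16 session, reading server events, and returning a function_call_output followed by response.create. Event and field names follow the public OpenAI Realtime API; the `get_weather` tool, the `run_tool` helper, and the omitted audio capture are placeholders for illustration, not code from this PR.

```python
# Illustrative sketch only; not the RealtimeVoicePipeline implementation.
import asyncio, base64, json, os

import websockets  # pip install websockets

URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"
HEADERS = {
    "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
    "OpenAI-Beta": "realtime=v1",
}


async def run_tool(name: str, arguments: str) -> str:
    """Placeholder for the pipeline's registered-tool dispatch."""
    return json.dumps({"ok": True, "tool": name, "args": json.loads(arguments)})


async def main() -> None:
    # On websockets < 13 the keyword is `extra_headers` instead of `additional_headers`.
    async with websockets.connect(URL, additional_headers=HEADERS) as ws:
        # Session config: 24 kHz PCM16 in/out, transcription, optional server-side
        # noise reduction, and one example function tool.
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "modalities": ["audio", "text"],
                "input_audio_format": "pcm16",     # 24 kHz, mono, little-endian
                "output_audio_format": "pcm16",
                "input_audio_transcription": {"model": "whisper-1"},
                "input_audio_noise_reduction": {"type": "near_field"},
                "tools": [{
                    "type": "function",
                    "name": "get_weather",
                    "description": "Return the weather for a city.",
                    "parameters": {"type": "object",
                                   "properties": {"city": {"type": "string"}},
                                   "required": ["city"]},
                }],
            },
        }))

        # Microphone streaming (capture code omitted) would look like:
        # await ws.send(json.dumps({"type": "input_audio_buffer.append",
        #                           "audio": base64.b64encode(chunk).decode()}))

        async for raw in ws:
            event = json.loads(raw)
            etype = event.get("type")
            if etype == "response.audio.delta":
                pcm = base64.b64decode(event["delta"])  # 24 kHz PCM16 to play back
            elif etype == "conversation.item.input_audio_transcription.completed":
                print("heard:", event.get("transcript"))
            elif etype == "response.function_call_arguments.done":
                # Run the registered tool, return its result, then request a new
                # response. Depending on API version the tool name may arrive here
                # or on the corresponding response.output_item event.
                name = event.get("name", "")
                output = await run_tool(name, event["arguments"])
                await ws.send(json.dumps({
                    "type": "conversation.item.create",
                    "item": {"type": "function_call_output",
                             "call_id": event["call_id"], "output": output},
                }))
                await ws.send(json.dumps({"type": "response.create"}))


if __name__ == "__main__":
    asyncio.run(main())
```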

🛠️ How to Use

  • Realtime Pipeline:
    See the new example and documentation for how to use RealtimeVoicePipeline with your OpenAI API key and tools.
  • Classic Pipeline:
    No changes required; existing STT/TTS flows are unaffected.
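
Independent of the pipeline code, the 24 kHz PCM16 capture and echo-mitigation points above can be reproduced with a few lines of standalone audio I/O. This is a minimal sketch assuming the third-party sounddevice package; the echo-guard window, chunk size, and helper names are illustrative values, not the defaults used by this PR.

```python
# Standalone sketch (not code from this PR): capture 24 kHz mono PCM16 microphone
# audio with a simple half-duplex gate, dropping mic frames for a short window
# after assistant playback so echo cannot re-trigger a turn.
import queue
import time

import sounddevice as sd  # pip install sounddevice

SAMPLE_RATE = 24_000      # the Realtime API expects 24 kHz PCM16
BLOCK_SIZE = 2_400        # 100 ms per chunk
ECHO_GUARD_S = 0.5        # buffer window after assistant playback ends

mic_chunks: "queue.Queue[bytes]" = queue.Queue()
last_playback_end = 0.0   # the playback side would update this when audio finishes


def on_mic_audio(indata, frames, time_info, status) -> None:
    """sounddevice callback: queue raw PCM16 unless we're inside the echo guard."""
    if status:
        print("mic status:", status)
    if time.monotonic() - last_playback_end < ECHO_GUARD_S:
        return                          # half-duplex: ignore our own echo
    mic_chunks.put(bytes(indata))       # raw little-endian int16 frames


def main() -> None:
    with sd.RawInputStream(samplerate=SAMPLE_RATE, blocksize=BLOCK_SIZE,
                           channels=1, dtype="int16", callback=on_mic_audio):
        print("capturing 24 kHz PCM16; Ctrl+C to stop")
        while True:
            chunk = mic_chunks.get()    # base64-encode and send this chunk upstream
            print(f"captured {len(chunk)} bytes")


if __name__ == "__main__":
    main()
```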

…ions
- Added detailed documentation for the new `RealtimeVoicePipeline`, including usage examples and event handling for real-time audio interaction.
- Introduced a new example script demonstrating the `RealtimeVoicePipeline` with continuous audio streaming and tool execution.
@sharananurag998 force-pushed the main branch 3 times, most recently from 8bcb389 to b8899f7 on May 7, 2025 11:06
@sharananurag998 marked this pull request as draft on May 7, 2025 14:29
@dkundel-openai
Contributor

Thank you so much for the PR @sharananurag998! I'll try to look at the PR later this week. Thank you for your patience.

@sharananurag998
Author

@dkundel-openai @rm-openai

I haven't found a way to do native speech-to-speech integration with an agent, but we can define an agent and use it as a tool in the real-time speech pipeline, and it works! (See the sketch at the end of this comment.)

The agent-as-tool approach provides better latency than the STT/TTS-based VoicePipeline.

Also, this branch has some Juspay-specific MCP tool handling changes, since we're using the fork as a Python dependency; I'll move them to a separate branch so that main can be merged.

@dkundel-openai you can review the new pipeline and let me know of any changes; I'll be happy to work on them.
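
For reference, here is a minimal sketch of the agent-as-a-tool pattern, using the existing Agents SDK text API (Agent, Runner, function_tool). The agent name, instructions, and the ask_research_agent wrapper are made up for illustration, and how the resulting tool gets registered with the realtime pipeline depends on this PR's final interface, so that step is only noted in a comment.

```python
# Sketch under stated assumptions: wrap a text agent as a callable tool.
import asyncio

from agents import Agent, Runner, function_tool

research_agent = Agent(
    name="Research agent",
    instructions="Answer factual questions concisely.",
)


@function_tool
async def ask_research_agent(question: str) -> str:
    """Delegate a question to the text agent and return its final answer."""
    result = await Runner.run(research_agent, question)
    return result.final_output


async def main() -> None:
    # Standalone check of the underlying agent; in the realtime pipeline the
    # ask_research_agent tool would instead go into the pipeline's tool list
    # (exact registration interface defined by this PR).
    result = await Runner.run(research_agent, "What sample rate does the Realtime API use?")
    print(result.final_output)


if __name__ == "__main__":
    asyncio.run(main())
```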

@EmanueleTribi

EmanueleTribi commented May 29, 2025 (edited)

Hi everyone, any news on this pull request, or a general timeline for integrating the Realtime API? I'm very interested in using it with the SDK agent and was wondering whether to write my own code or wait for it to be integrated directly. Thanks!
@dkundel-openai @sharananurag998

Reviewers: @dkundel-openai (awaiting requested review). At least 1 approving review is required to merge this pull request.
Assignees: none
Labels: none
Projects: none
Milestone: none
3 participants: @sharananurag998, @dkundel-openai, @EmanueleTribi
