Getting Started with Flux
Flux is the first conversational speech recognition model built specifically for voice agents. Unlike traditional STT that just transcribes words, Flux understands conversational flow and automatically handles turn-taking.
Flux tackles the most critical challenges for voice agents today: knowing when to listen, when to think, and when to speak. The model features first-of-its-kind model-integrated end-of-turn detection, configurable turn-taking dynamics, and ultra-low latency optimized for voice agent pipelines, all with Nova-3 level accuracy.
Flux is Perfect for: turn-based voice agents, customer service bots, phone assistants, and real-time conversation tools.
Key Benefits:
- Smart turn detection — Knows when speakers finish talking
- Ultra-low latency — ~260ms end-of-turn detection
- Early LLM responses — `EagerEndOfTurn` events for faster replies
- Turn-based transcripts — Clean conversation structure
- Natural interruptions — Built-in barge-in handling
- Nova-3 accuracy — Best-in-class transcription quality
For more information on how Flux manages turns, see the Flux State Machine Guide.
Important: Flux Connection Requirements
Flux requires the `/v2/listen` endpoint — using `/v1/listen` will not work with Flux.
When connecting to Flux, you must use:
- Endpoint: `/v2/listen` (not `/v1/listen`)
- Model: `flux-general-en`
- Audio Format: See the Audio Format Requirements table below
- Chunk Size: 80ms audio chunks strongly recommended for optimal model performance and latency
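The 80ms recommendation maps to a concrete byte count for raw PCM. A quick sketch of the arithmetic (the helper name is illustrative, not part of the SDK):

```python
def chunk_size_bytes(chunk_ms: int, sample_rate: int = 16000,
                     bytes_per_sample: int = 2, channels: int = 1) -> int:
    """Bytes of raw PCM covering chunk_ms milliseconds of audio."""
    return sample_rate * bytes_per_sample * channels * chunk_ms // 1000

# 80 ms of 16 kHz mono linear16 (2 bytes per sample):
print(chunk_size_bytes(80))  # 2560 bytes per chunk
```

So for 16 kHz mono linear16, reading and sending in 2560-byte chunks keeps you at the recommended 80ms cadence.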
Audio Format Requirements
| Audio Type | Encoding | Container | encoding param | sample_rate param | Supported Sample Rates |
|---|---|---|---|---|---|
| Raw | linear16, linear32, mulaw, alaw, opus, ogg-opus | None | Required | Required (16000 recommended) | 8000, 16000, 24000, 44100, 48000 |
| Containerized | linear16 | WAV | Omit | Omit | Auto-detected from container |
| Containerized | opus | Ogg | Omit | Omit | Auto-detected from container |
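The Required/Omit rule above can be expressed as URL construction: include `encoding` and `sample_rate` only for raw audio, and omit both for containerized audio. A sketch (the `flux_url` helper is illustrative; the query-parameter names come from the table):

```python
from typing import Optional
from urllib.parse import urlencode

BASE = "wss://api.deepgram.com/v2/listen"

def flux_url(encoding: Optional[str] = None,
             sample_rate: Optional[int] = None) -> str:
    """Build a Flux WebSocket URL. Pass encoding/sample_rate for raw audio;
    omit both for containerized audio (WAV, Ogg Opus)."""
    params = {"model": "flux-general-en"}
    if encoding is not None:
        params["encoding"] = encoding
    if sample_rate is not None:
        params["sample_rate"] = str(sample_rate)
    return f"{BASE}?{urlencode(params)}"

# Raw linear16 at 16 kHz:
print(flux_url("linear16", 16000))
# Containerized (WAV / Ogg Opus) — format auto-detected:
print(flux_url())
```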
WebSocket URL Format:
wss://api.deepgram.com/v2/listen?model=flux-general-en
When using the Deepgram SDK, use `client.listen.v2.connect()` to access the v2 endpoint. For direct WebSocket connections, ensure you're using `/v2/listen` in your URL.
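For the direct-WebSocket route, here is a hedged sketch using the third-party `websockets` package and Deepgram's `Token` authorization scheme. Note the header keyword argument is `additional_headers` in recent `websockets` releases but `extra_headers` in older ones, so check your installed version:

```python
import os
from typing import Dict

FLUX_URL = ("wss://api.deepgram.com/v2/listen"
            "?model=flux-general-en&encoding=linear16&sample_rate=16000")

def auth_headers(api_key: str) -> Dict[str, str]:
    # Deepgram WebSocket auth: "Token" scheme in the Authorization header
    return {"Authorization": f"Token {api_key}"}

async def connect_direct() -> None:
    import websockets  # third-party: pip install websockets
    headers = auth_headers(os.environ["DEEPGRAM_API_KEY"])
    async with websockets.connect(FLUX_URL, additional_headers=headers) as ws:
        # Send 80ms binary audio chunks with await ws.send(chunk);
        # receive JSON result messages with await ws.recv()
        ...
```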
Configurable Parameters
Flux provides three key parameters to control end-of-turn detection behavior and optimize your voice agent’s conversational flow:
End-of-Turn Detection Parameters
| Parameter | Range | Default | Description |
|---|---|---|---|
| `eot_threshold` | 0.5 - 0.9 | 0.7 | Confidence required to trigger an `EndOfTurn` event. Higher values = more reliable turn detection but slightly increased latency. |
| `eager_eot_threshold` | 0.3 - 0.9 | None | Confidence required to trigger an `EagerEndOfTurn` event. Required to enable early response generation. Lower values = earlier triggers but more false starts. |
| `eot_timeout_ms` | 500 - 10000 | 5000 | Maximum milliseconds of silence before forcing an `EndOfTurn`, regardless of confidence. |
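All three parameters travel as query parameters on the connection URL. A sketch of a tuned direct-connection URL (the specific values here are illustrative, not recommendations):

```python
from urllib.parse import urlencode

params = {
    "model": "flux-general-en",
    "encoding": "linear16",
    "sample_rate": "16000",
    "eot_threshold": "0.7",        # default; raise for more reliable turn detection
    "eager_eot_threshold": "0.5",  # setting this enables EagerEndOfTurn events
    "eot_timeout_ms": "5000",      # force EndOfTurn after 5s of silence
}
url = f"wss://api.deepgram.com/v2/listen?{urlencode(params)}"
print(url)
```

If you connect through the SDK instead, verify against your SDK version's reference how these options are passed to `connect()`.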
When to Configure These Parameters
For most use cases, the default `eot_threshold=0.7` works well. You only need to configure these parameters if:
- You want faster responses: Set `eager_eot_threshold` to enable `EagerEndOfTurn` events and start LLM processing before the user fully finishes speaking
- Your users speak with long pauses: Increase `eot_timeout_ms` to avoid cutting off turns prematurely
- You need more reliable turn detection: Increase `eot_threshold` to reduce false positives (at the cost of slightly higher latency)
- You want more aggressive turn detection: Lower `eot_threshold` to trigger turns earlier
Important: Setting `eager_eot_threshold` enables `EagerEndOfTurn` and `TurnResumed` events. These events allow you to start preparing LLM responses early, reducing end-to-end latency by hundreds of milliseconds. See the Eager End-of-Turn Optimization Guide for implementation strategies.
Cost Consideration: Using `EagerEndOfTurn` can increase LLM API calls by 50-70% due to speculative response generation. The `TurnResumed` event signals when to cancel a draft response because the user continued speaking.
For comprehensive parameter documentation and tuning guidance, see the End-of-Turn Configuration guide.
Using Flux: SDK vs Direct WebSocket
```python
from deepgram import AsyncDeepgramClient

client = AsyncDeepgramClient()

# SDK automatically uses the /v2/listen endpoint
async with client.listen.v2.connect(
    model="flux-general-en",
    encoding="linear16",
    sample_rate="16000"
) as connection:
    # Your code here
    pass
```
Common Mistakes to Avoid:
- ❌ Using `/v1/listen` instead of `/v2/listen`
- ❌ Using `model=flux` instead of `model=flux-general-en`
- ❌ Using the `language=en` parameter (use `model=flux-general-en` instead)
- ❌ Specifying `encoding` or `sample_rate` when sending containerized audio (omit these for containerized formats)
Let’s Build!
This guide walks you through building a basic streaming transcription application powered by Deepgram Flux and the Deepgram SDK.
By the end of this guide, you’ll have:
- A real-time streaming transcription application with sub-second response times, using the BBC Real Time Live Stream as your audio source.
- Natural conversation flow with Flux’s advanced turn detection model
- Voice Activity Detection based interruption handling for responsive interactions
- A working demo you can build on!
Audio Stream
To handle the audio stream, we'll use FFmpeg to convert it to linear16 PCM before sending it to Flux.
1. Install the Deepgram SDK
```shell
# Install the Deepgram Python SDK
# https://github.com/deepgram/deepgram-python-sdk
pip install deepgram-sdk
```
2. Add Dependencies
Install the additional dependencies:
```shell
# Install python-dotenv to protect your API key
pip install python-dotenv
```
3. Install FFmpeg on your machine
You will need the actual FFmpeg binary installed to run this demo:
- macOS: `brew install ffmpeg`
- Ubuntu/Debian: `sudo apt install ffmpeg`
- Windows: Download from https://ffmpeg.org/
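Because the demo shells out to FFmpeg, it's worth failing fast if the binary is missing. A small illustrative check (not part of the SDK):

```python
import shutil

def require_ffmpeg() -> str:
    """Return the path to the ffmpeg binary, or raise a clear error if missing."""
    path = shutil.which("ffmpeg")
    if path is None:
        raise RuntimeError("FFmpeg not found on PATH - install it before running the demo.")
    return path
```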
4. Create a `.env` file
Create a `.env` file in your project root with your Deepgram API key:
```shell
touch .env
```

Then add your key to `.env`:

```
DEEPGRAM_API_KEY="your_deepgram_api_key"
```
Replace `your_deepgram_api_key` with your actual Deepgram API key.
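A small illustrative guard that fails fast when the key didn't load (it assumes `load_dotenv()` has already run, as in the code in the following steps; `get_api_key` is not an SDK function):

```python
import os

def get_api_key() -> str:
    """Return the Deepgram API key, with a clear error if it isn't set."""
    key = os.getenv("DEEPGRAM_API_KEY")
    if not key:
        raise RuntimeError("DEEPGRAM_API_KEY is not set - check your .env file.")
    return key
```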
5. Set Imports and Audio Stream Colors
Core Dependencies:
- `asyncio` — Handles concurrent audio streaming and the Deepgram connection
- `subprocess` — Manages the FFmpeg process for audio conversion
- `dotenv` — Loads the Deepgram API key from the `.env` file
Deepgram SDK:
- `AsyncDeepgramClient` — Main client for the Flux API connection
- `EventType` — WebSocket event constants (OPEN, MESSAGE, CLOSE, ERROR)
- `ListenV2SocketClientResponse` — Type hints for incoming transcription messages
Configuration:
- `STREAM_URL` — BBC World Service streaming audio endpoint
Visual Feedback System:
- `Colors` class — ANSI terminal color codes for confidence visualization
- `get_confidence_color()` — Maps confidence scores to colors:
  - Green (0.90-1.00): High confidence
  - Yellow (0.80-0.90): Good confidence
  - Orange (0.70-0.80): Lower confidence
  - Red (≤0.69): Low confidence
Purpose: Sets up the foundation for real-time streaming transcription with visual quality indicators, making it easy to spot transcription accuracy at a glance.
```python
import asyncio
import subprocess
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

from deepgram import AsyncDeepgramClient
from deepgram.core.events import EventType
from deepgram.extensions.types.sockets import ListenV2SocketClientResponse

# URL for the realtime streaming audio to transcribe
STREAM_URL = "http://stream.live.vc.bbcmedia.co.uk/bbc_world_service"

# Terminal color codes
class Colors:
    GREEN = '\033[92m'   # 0.90-1.00
    YELLOW = '\033[93m'  # 0.80-0.90
    ORANGE = '\033[91m'  # 0.70-0.80 (using red as orange isn't standard)
    RED = '\033[31m'     # <=0.69
    RESET = '\033[0m'    # Reset to default

def get_confidence_color(confidence: float) -> str:
    """Return the appropriate color code based on confidence score"""
    if confidence >= 0.90:
        return Colors.GREEN
    elif confidence >= 0.80:
        return Colors.YELLOW
    elif confidence >= 0.70:
        return Colors.ORANGE
    else:
        return Colors.RED
```
6. Connect to Flux and Process Audio
The main function orchestrates real-time transcription of streaming audio URLs:
- Initialize: Creates `AsyncDeepgramClient` and connects to Flux with the required linear16 format
- Event Handling: Sets up a message handler that displays transcriptions with color-coded confidence scores
- Audio Pipeline: Launches an FFmpeg subprocess to convert the compressed stream to `linear16` PCM format
- Streaming Loop: Reads converted audio chunks and pipes them to the Deepgram Flux connection
- Concurrent Tasks: Runs the Deepgram listener and audio conversion simultaneously using asyncio
- Error Handling: Manages FFmpeg errors and connection timeouts (60s default)
The function handles both the audio conversion requirement (this demo sends raw `linear16` PCM, so the compressed stream must be converted first) and real-time streaming coordination between multiple async processes.
```python
async def main():
    """Main async function to handle URL streaming to Deepgram Flux"""

    # Create the Deepgram async client
    # (API key retrieval happens automatically in the constructor)
    client = AsyncDeepgramClient()

    try:
        # Connect to Flux for streaming audio
        # SDK connects to: wss://api.deepgram.com/v2/listen?model=flux-general-en&encoding=linear16&sample_rate=16000
        async with client.listen.v2.connect(
            model="flux-general-en",
            encoding="linear16",
            sample_rate="16000"
        ) as connection:

            # Define message handler function
            def on_message(message: ListenV2SocketClientResponse) -> None:
                msg_type = getattr(message, "type", "Unknown")

                # Show transcription results
                if hasattr(message, 'transcript') and message.transcript:
                    print(f"🎤 {message.transcript}")

                    # Show word-level confidence with color coding
                    if hasattr(message, 'words') and message.words:
                        colored_words = []
                        for word in message.words:
                            color = get_confidence_color(word.confidence)
                            colored_words.append(f"{color}{word.word}({word.confidence:.2f}){Colors.RESET}")
                        words_info = " | ".join(colored_words)
                        print(f"   📝 {words_info}")
                elif msg_type == "Connected":
                    print("✅ Connected to Deepgram Flux - Ready for audio!")

            # Set up event handlers
            connection.on(EventType.OPEN, lambda _: print("Connection opened"))
            connection.on(EventType.MESSAGE, on_message)
            connection.on(EventType.CLOSE, lambda _: print("Connection closed"))
            connection.on(EventType.ERROR, lambda error: print(f"Caught: {error}"))

            # Start the connection listening in the background (it's already async)
            deepgram_task = asyncio.create_task(connection.start_listening())

            # Convert BBC stream to linear16 PCM using ffmpeg
            print(f"Starting to stream and convert audio from: {STREAM_URL}")

            # Use ffmpeg to convert the compressed BBC stream to linear16 PCM at 16kHz
            ffmpeg_cmd = [
                'ffmpeg',
                '-i', STREAM_URL,  # Input: BBC World Service stream
                '-f', 's16le',     # Output format: 16-bit little-endian PCM (linear16)
                '-ar', '16000',    # Sample rate: 16kHz
                '-ac', '1',        # Channels: mono
                '-'                # Output to stdout
            ]

            try:
                # Start ffmpeg process
                process = await asyncio.create_subprocess_exec(
                    *ffmpeg_cmd,
                    stdout=asyncio.subprocess.PIPE,
                    stderr=asyncio.subprocess.PIPE
                )

                print("✅ Audio conversion started (BBC → linear16 PCM)")

                # Read converted PCM data and send to Deepgram
                # Note: 1024 bytes = ~32ms of audio at 16kHz linear16
                # For optimal performance, consider using ~2560 bytes (~80ms at 16kHz)
                while True:
                    chunk = await process.stdout.read(1024)
                    if not chunk:
                        break

                    # Send converted linear16 PCM data to Flux
                    await connection._send(chunk)

                await process.wait()

            except Exception as e:
                print(f"Error during audio conversion: {e}")
                if 'process' in locals():
                    stderr = await process.stderr.read()
                    print(f"FFmpeg error: {stderr.decode()}")

            # Wait for Deepgram task to complete (or cancel after timeout)
            try:
                await asyncio.wait_for(deepgram_task, timeout=60)
            except asyncio.TimeoutError:
                print("Stream timeout after 60 seconds")
                deepgram_task.cancel()

    except Exception as e:
        print(f"Caught: {e}")

if __name__ == "__main__":
    asyncio.run(main())
```
7. Complete Code Example
Here’s the complete working example that combines all the steps. You can also find this code on GitHub.
```python
import asyncio
import subprocess
from dotenv import load_dotenv

# Load environment variables from .env file
load_dotenv()

from deepgram import AsyncDeepgramClient
from deepgram.core.events import EventType
from deepgram.extensions.types.sockets import ListenV2SocketClientResponse

# URL for the realtime streaming audio to transcribe
STREAM_URL = "http://stream.live.vc.bbcmedia.co.uk/bbc_world_service"

# Terminal color codes
class Colors:
    GREEN = '\033[92m'   # 0.90-1.00
    YELLOW = '\033[93m'  # 0.80-0.90
    ORANGE = '\033[91m'  # 0.70-0.80 (using red as orange isn't standard)
    RED = '\033[31m'     # <=0.69
    RESET = '\033[0m'    # Reset to default

def get_confidence_color(confidence: float) -> str:
    """Return the appropriate color code based on confidence score"""
    if confidence >= 0.90:
        return Colors.GREEN
    elif confidence >= 0.80:
        return Colors.YELLOW
    elif confidence >= 0.70:
        return Colors.ORANGE
    else:
        return Colors.RED

async def main():
    """Main async function to handle URL streaming to Deepgram Flux"""

    # Create the Deepgram async client
    client = AsyncDeepgramClient()

    try:
        # Connect to Flux for streaming audio
        # SDK connects to: wss://api.deepgram.com/v2/listen?model=flux-general-en&encoding=linear16&sample_rate=16000
        async with client.listen.v2.connect(
            model="flux-general-en",
            encoding="linear16",
            sample_rate="16000"
        ) as connection:

            # Define message handler function
            def on_message(message: ListenV2SocketClientResponse) -> None:
                msg_type = getattr(message, "type", "Unknown")

                # Show transcription results
                if hasattr(message, 'transcript') and message.transcript:
                    print(f"🎤 {message.transcript}")

                    # Show word-level confidence with color coding
                    if hasattr(message, 'words') and message.words:
                        colored_words = []
                        for word in message.words:
                            color = get_confidence_color(word.confidence)
                            colored_words.append(f"{color}{word.word}({word.confidence:.2f}){Colors.RESET}")
                        words_info = " | ".join(colored_words)
                        print(f"   📝 {words_info}")
                elif msg_type == "Connected":
                    print("✅ Connected to Deepgram Flux - Ready for audio!")

            # Set up event handlers
            connection.on(EventType.OPEN, lambda _: print("Connection opened"))
            connection.on(EventType.MESSAGE, on_message)
            connection.on(EventType.CLOSE, lambda _: print("Connection closed"))
            connection.on(EventType.ERROR, lambda error: print(f"Caught: {error}"))

            # Start the connection listening in the background (it's already async)
            deepgram_task = asyncio.create_task(connection.start_listening())

            # Convert BBC stream to linear16 PCM using ffmpeg
            print(f"Starting to stream and convert audio from: {STREAM_URL}")

            # Use ffmpeg to convert the compressed BBC stream to linear16 PCM at 16kHz
            ffmpeg_cmd = [
                'ffmpeg',
                '-i', STREAM_URL,  # Input: BBC World Service stream
                '-f', 's16le',     # Output format: 16-bit little-endian PCM (linear16)
                '-ar', '16000',    # Sample rate: 16kHz
                '-ac', '1',        # Channels: mono
                '-'                # Output to stdout
            ]

            try:
                # Start ffmpeg process
                process = await asyncio.create_subprocess_exec(
                    *ffmpeg_cmd,
                    stdout=asyncio.subprocess.PIPE,
                    stderr=asyncio.subprocess.PIPE
                )

                print("✅ Audio conversion started (BBC → linear16 PCM)")

                # Read converted PCM data and send to Deepgram
                # Note: 1024 bytes = ~32ms of audio at 16kHz linear16
                # For optimal performance, consider using ~2560 bytes (~80ms at 16kHz)
                while True:
                    chunk = await process.stdout.read(1024)
                    if not chunk:
                        break

                    # Send converted linear16 PCM data to Flux
                    await connection._send(chunk)

                await process.wait()

            except Exception as e:
                print(f"Error during audio conversion: {e}")
                if 'process' in locals():
                    stderr = await process.stderr.read()
                    print(f"FFmpeg error: {stderr.decode()}")

            # Wait for Deepgram task to complete (or cancel after timeout)
            try:
                await asyncio.wait_for(deepgram_task, timeout=60)
            except asyncio.TimeoutError:
                print("Stream timeout after 60 seconds")
                deepgram_task.cancel()

    except Exception as e:
        print(f"Caught: {e}")

if __name__ == "__main__":
    asyncio.run(main())
```
Additional Flux Demos
For additional demos showcasing Flux, check out the following repositories:
| Demo Link | Repository | Tech Stack | Use Case |
|---|---|---|---|
| Demo Link | Repository | Node, JS, HTML, CSS | Flux Streaming Transcription |
| N/A | Repository | Rust | Flux Streaming Transcription |
Building a Voice Agent with Flux
Are you ready to build a voice agent with Flux? See our Build a Flux-enabled Voice Agent Guide to get started.