Streaming API responses
Learn how to stream model responses from the OpenAI API using server-sent events.
By default, when you make a request to the OpenAI API, we generate the model’s entire output before sending it back in a single HTTP response. When generating long outputs, waiting for a response can take time. Streaming responses lets you start printing or processing the beginning of the model’s output while it continues generating the full response.
Enable streaming
To start streaming responses, set `stream=True` in your request to the Responses endpoint:
```javascript
import { OpenAI } from "openai";

const client = new OpenAI();

const stream = await client.responses.create({
  model: "gpt-5",
  input: [
    {
      role: "user",
      content: "Say 'double bubble bath' ten times fast.",
    },
  ],
  stream: true,
});

for await (const event of stream) {
  console.log(event);
}
```

```python
from openai import OpenAI

client = OpenAI()

stream = client.responses.create(
    model="gpt-5",
    input=[
        {
            "role": "user",
            "content": "Say 'double bubble bath' ten times fast.",
        },
    ],
    stream=True,
)

for event in stream:
    print(event)
```

```csharp
using OpenAI.Responses;

string key = Environment.GetEnvironmentVariable("OPENAI_API_KEY")!;
OpenAIResponseClient client = new(model: "gpt-5", apiKey: key);

var responses = client.CreateResponseStreamingAsync([
    ResponseItem.CreateUserMessageItem([
        ResponseContentPart.CreateInputTextPart("Say 'double bubble bath' ten times fast."),
    ]),
]);

await foreach (var response in responses)
{
    if (response is StreamingResponseOutputTextDeltaUpdate delta)
    {
        Console.Write(delta.Delta);
    }
}
```

The Responses API uses semantic events for streaming. Each event is typed with a predefined schema, so you can listen for the events you care about.
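For example, to print only the generated text as it arrives, you can filter the stream on each event's `type`. A minimal Python sketch, assuming the `response.output_text.delta` event shape described in the API reference:

```python
from openai import OpenAI

client = OpenAI()

stream = client.responses.create(
    model="gpt-5",
    input="Say 'double bubble bath' ten times fast.",
    stream=True,
)

# Ignore lifecycle events and print only the text deltas.
for event in stream:
    if event.type == "response.output_text.delta":
        print(event.delta, end="", flush=True)
```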
For a full list of event types, see the API reference for streaming. Here are a few examples:
```typescript
type StreamingEvent =
  | ResponseCreatedEvent
  | ResponseInProgressEvent
  | ResponseFailedEvent
  | ResponseCompletedEvent
  | ResponseOutputItemAdded
  | ResponseOutputItemDone
  | ResponseContentPartAdded
  | ResponseContentPartDone
  | ResponseOutputTextDelta
  | ResponseOutputTextAnnotationAdded
  | ResponseTextDone
  | ResponseRefusalDelta
  | ResponseRefusalDone
  | ResponseFunctionCallArgumentsDelta
  | ResponseFunctionCallArgumentsDone
  | ResponseFileSearchCallInProgress
  | ResponseFileSearchCallSearching
  | ResponseFileSearchCallCompleted
  | ResponseCodeInterpreterInProgress
  | ResponseCodeInterpreterCallCodeDelta
  | ResponseCodeInterpreterCallCodeDone
  | ResponseCodeInterpreterCallInterpreting
  | ResponseCodeInterpreterCallCompleted
  | Error;
```

Streaming Chat Completions is fairly straightforward. However, we recommend using the Responses API for streaming, as we designed it with streaming in mind. The Responses API uses semantic events for streaming and is type-safe.
Stream a chat completion
To stream completions, set `stream=True` when calling the Chat Completions or legacy Completions endpoints. This returns an object that streams back the response as data-only server-sent events.
The response is sent back incrementally in chunks with an event stream. You can iterate over the event stream with a `for` loop, like this:
```javascript
import OpenAI from "openai";

const openai = new OpenAI();

const stream = await openai.chat.completions.create({
  model: "gpt-5",
  messages: [
    {
      role: "user",
      content: "Say 'double bubble bath' ten times fast.",
    },
  ],
  stream: true,
});

for await (const chunk of stream) {
  console.log(chunk);
  console.log(chunk.choices[0].delta);
  console.log("****************");
}
```

```python
from openai import OpenAI

client = OpenAI()

stream = client.chat.completions.create(
    model="gpt-5",
    messages=[
        {
            "role": "user",
            "content": "Say 'double bubble bath' ten times fast.",
        },
    ],
    stream=True,
)

for chunk in stream:
    print(chunk)
    print(chunk.choices[0].delta)
    print("****************")
```

Read the responses
If you’re using our SDK, every event is a typed instance. You can also identify individual events using the `type` property of the event.
Some key lifecycle events are emitted only once, while others are emitted multiple times as the response is generated. Common events to listen for when streaming text are:
- `response.created`
- `response.output_text.delta`
- `response.completed`
- `error`

For a full list of events you can listen for, see the API reference for streaming.
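As a rough sketch of how you might branch on these lifecycle events in Python (the handling in each branch is placeholder logic, not a prescribed pattern):

```python
from openai import OpenAI

client = OpenAI()

stream = client.responses.create(
    model="gpt-5",
    input="Say 'double bubble bath' ten times fast.",
    stream=True,
)

text_parts = []
for event in stream:
    if event.type == "response.created":
        # Emitted once, when generation starts.
        print("started response:", event.response.id)
    elif event.type == "response.output_text.delta":
        # Emitted many times, once per text fragment.
        text_parts.append(event.delta)
    elif event.type == "response.completed":
        # Emitted once, when the full response is done.
        print("final text:", "".join(text_parts))
    elif event.type == "error":
        raise RuntimeError("the stream reported an error")
```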
When you stream a chat completion, the response has a `delta` field rather than a `message` field. The `delta` field can hold a role, a content token, or nothing.
```text
{ role: 'assistant', content: '', refusal: null }
****************
{ content: 'Why' }
****************
{ content: " don't" }
****************
{ content: ' scientists' }
****************
{ content: ' trust' }
****************
{ content: ' atoms' }
****************
{ content: '?\n\n' }
****************
{ content: 'Because' }
****************
{ content: ' they' }
****************
{ content: ' make' }
****************
{ content: ' up' }
****************
{ content: ' everything' }
****************
{ content: '!' }
****************
{}
****************
```

To stream only the text response of your chat completion, your code would look like this:
```javascript
import OpenAI from "openai";

const client = new OpenAI();

const stream = await client.chat.completions.create({
  model: "gpt-5",
  messages: [
    {
      role: "user",
      content: "Say 'double bubble bath' ten times fast.",
    },
  ],
  stream: true,
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || "");
}
```

```python
from openai import OpenAI

client = OpenAI()

stream = client.chat.completions.create(
    model="gpt-5",
    messages=[
        {
            "role": "user",
            "content": "Say 'double bubble bath' ten times fast.",
        },
    ],
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")
```
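If you also need the complete message once the stream finishes (for logging or a follow-up request), you can accumulate the deltas as they arrive. A minimal Python sketch of that pattern:

```python
from openai import OpenAI

client = OpenAI()

stream = client.chat.completions.create(
    model="gpt-5",
    messages=[{"role": "user", "content": "Say 'double bubble bath' ten times fast."}],
    stream=True,
)

# Print each fragment as it arrives and keep it for later.
parts = []
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta is not None:
        print(delta, end="")
        parts.append(delta)

full_reply = "".join(parts)
```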
Advanced use cases
For more advanced use cases, like streaming tool calls, check out the dedicated guides in the API documentation.
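To give a flavor of what those guides cover, here is a rough sketch (not the official example) of collecting streamed tool-call arguments with Chat Completions; the `get_weather` tool definition is hypothetical:

```python
from openai import OpenAI

client = OpenAI()

# A hypothetical tool, defined only for this sketch.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

stream = client.chat.completions.create(
    model="gpt-5",
    messages=[{"role": "user", "content": "What's the weather in Paris?"}],
    tools=tools,
    stream=True,
)

# Tool-call arguments stream in as JSON fragments; concatenate them
# per tool-call index until the stream ends.
calls = {}
for chunk in stream:
    if not chunk.choices:
        continue
    for tc in chunk.choices[0].delta.tool_calls or []:
        call = calls.setdefault(tc.index, {"name": "", "arguments": ""})
        if tc.function and tc.function.name:
            call["name"] = tc.function.name
        if tc.function and tc.function.arguments:
            call["arguments"] += tc.function.arguments

print(calls)
```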
Moderation risk
Note that streaming the model’s output in a production application makes it more difficult to moderate the content of completions, because partial completions can be harder to evaluate than full ones. This may have implications for approved usage.
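One possible mitigation, sketched here as an illustration rather than an official pattern, is to moderate the accumulated text at intervals while it streams, using the Moderations endpoint:

```python
from openai import OpenAI

client = OpenAI()

stream = client.chat.completions.create(
    model="gpt-5",
    messages=[{"role": "user", "content": "Tell me a story."}],
    stream=True,
)

CHECK_EVERY = 20  # arbitrary interval chosen for this sketch
buffer = ""

for i, chunk in enumerate(stream):
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta.content or ""
    buffer += delta
    print(delta, end="")
    # Periodically re-check everything generated so far, since a
    # partial completion can turn problematic only in context.
    if i and i % CHECK_EVERY == 0:
        result = client.moderations.create(
            model="omni-moderation-latest",
            input=buffer,
        )
        if result.results[0].flagged:
            break  # stop consuming (and displaying) the stream
```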