How to stream responses from an LLM

All LLMs implement the Runnable interface, which comes with default implementations of standard runnable methods (i.e. ainvoke, batch, abatch, stream, astream, astream_events).

The default streaming implementations provide an Iterator (or AsyncIterator for asynchronous streaming) that yields a single value: the final output from the underlying model provider.

The ability to stream the output token-by-token depends on whether the provider has implemented proper streaming support.

See which integrations support token-by-token streaming here.

note

The default implementation does not provide support for token-by-token streaming, but it ensures that the model can be swapped in for any other model, as it supports the same standard interface.
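
To see this fallback behavior concretely, here is a minimal sketch using FakeListLLM from langchain_core, which does not implement native streaming; the default stream() implementation yields the entire completion as a single chunk:

import asyncio

from langchain_core.language_models import FakeListLLM

# FakeListLLM has no native token streaming, so stream() falls back to
# the default implementation and yields the full response as one chunk.
llm = FakeListLLM(responses=["Sparkling water, oh so clear"])

for chunk in llm.stream("Write me a song."):
    print(chunk, end="|", flush=True)
# Prints the whole response once: Sparkling water, oh so clear|

The same code still works unchanged if you swap in a provider with real streaming support; you simply get many small chunks instead of one large one.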

Sync stream

Below we use a | to help visualize the delimiter between tokens.

from langchain_openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo-instruct", temperature=0, max_tokens=512)
for chunk in llm.stream("Write me a 1 verse song about sparkling water."):
    print(chunk, end="|", flush=True)
API Reference: OpenAI


|Spark|ling| water|,| oh| so clear|
|Bubbles dancing|,| without| fear|
|Refreshing| taste|,| a| pure| delight|
|Spark|ling| water|,| my| thirst|'s| delight||
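
Because a string LLM yields plain string chunks, you can also accumulate the stream back into the full completion. A small sketch under that assumption:

from langchain_openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo-instruct", temperature=0, max_tokens=512)

# String chunks concatenate directly, so joining the stream
# reconstructs the complete response.
full = "".join(llm.stream("Write me a 1 verse song about sparkling water."))
print(full)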

Async streaming

Let's see how to stream in an async setting using astream.

from langchain_openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo-instruct", temperature=0, max_tokens=512)
async for chunk in llm.astream("Write me a 1 verse song about sparkling water."):
    print(chunk, end="|", flush=True)
API Reference: OpenAI


|Spark|ling| water|,| oh| so clear|
|Bubbles dancing|,| without| fear|
|Refreshing| taste|,| a| pure| delight|
|Spark|ling| water|,| my| thirst|'s| delight||
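
The async for loop above must run inside a coroutine (notebooks provide a running event loop automatically). In a plain Python script, a sketch would wrap the loop in a coroutine and drive it with asyncio.run:

import asyncio

from langchain_openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo-instruct", temperature=0, max_tokens=512)

async def main() -> None:
    # Stream tokens as they arrive from the provider.
    async for chunk in llm.astream("Write me a 1 verse song about sparkling water."):
        print(chunk, end="|", flush=True)

asyncio.run(main())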

Async event streaming

LLMs also support the standard astream_events method.

tip

astream_events is most useful when implementing streaming in a larger LLM application that contains multiple steps (e.g., an application that involves an agent).

from langchain_openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo-instruct", temperature=0, max_tokens=512)

idx = 0

async for event in llm.astream_events(
    "Write me a 1 verse song about goldfish on the moon", version="v1"
):
    idx += 1
    if idx >= 5:  # Truncate the output
        print("...Truncated")
        break
    print(event)
API Reference: OpenAI
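
The event stream interleaves lifecycle events (start, stream, end). To recover just the generated tokens, you can filter on the event name; a sketch assuming the on_llm_stream event type with a chunk field in its data payload:

from langchain_openai import OpenAI

llm = OpenAI(model="gpt-3.5-turbo-instruct", temperature=0, max_tokens=512)

async for event in llm.astream_events(
    "Write me a 1 verse song about goldfish on the moon", version="v1"
):
    # Keep only token events; skip on_llm_start / on_llm_end bookkeeping.
    if event["event"] == "on_llm_stream":
        print(event["data"]["chunk"], end="|", flush=True)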
