# RunPod Chat Model
Get started with RunPod chat models.
## Overview

This guide covers how to use the LangChain `ChatRunPod` class to interact with chat models hosted on RunPod Serverless.
## Setup

- **Install the package:**

  ```bash
  pip install -qU langchain-runpod
  ```

- **Deploy a Chat Model Endpoint:** Follow the setup steps in the RunPod Provider Guide to deploy a compatible chat model endpoint on RunPod Serverless and get its Endpoint ID.

- **Set Environment Variables:** Make sure `RUNPOD_API_KEY` and `RUNPOD_ENDPOINT_ID` (or a specific `RUNPOD_CHAT_ENDPOINT_ID`) are set.
```python
import getpass
import os

# Make sure environment variables are set (or pass them directly to ChatRunPod)
if "RUNPOD_API_KEY" not in os.environ:
    os.environ["RUNPOD_API_KEY"] = getpass.getpass("Enter your RunPod API Key: ")

if "RUNPOD_ENDPOINT_ID" not in os.environ:
    os.environ["RUNPOD_ENDPOINT_ID"] = input(
        "Enter your RunPod Endpoint ID (used if RUNPOD_CHAT_ENDPOINT_ID is not set): "
    )

# Optionally use a different endpoint ID specifically for chat models
# if "RUNPOD_CHAT_ENDPOINT_ID" not in os.environ:
#     os.environ["RUNPOD_CHAT_ENDPOINT_ID"] = input("Enter your RunPod Chat Endpoint ID (Optional): ")

chat_endpoint_id = os.environ.get(
    "RUNPOD_CHAT_ENDPOINT_ID", os.environ.get("RUNPOD_ENDPOINT_ID")
)
if not chat_endpoint_id:
    raise ValueError(
        "No RunPod Endpoint ID found. Please set RUNPOD_ENDPOINT_ID or RUNPOD_CHAT_ENDPOINT_ID."
    )
```
## Instantiation

Initialize the `ChatRunPod` class. You can pass model-specific parameters via `model_kwargs` and configure polling behavior.
```python
from langchain_runpod import ChatRunPod

chat = ChatRunPod(
    runpod_endpoint_id=chat_endpoint_id,  # Specify the correct endpoint ID
    model_kwargs={
        "max_new_tokens": 512,
        "temperature": 0.7,
        "top_p": 0.9,
        # Add other parameters supported by your endpoint handler
    },
    # Optional: Adjust polling
    # poll_interval=0.2,
    # max_polling_attempts=150
)
```
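Under the hood, the integration submits jobs to the RunPod serverless HTTP API and polls for results, which is what the polling parameters above control. For orientation, here is a rough sketch of that raw flow using `requests`. The `/run` endpoint and `Bearer` auth follow RunPod's documented API, but the shape of the `input` payload (the `messages` key and sampling parameters below) is a hypothetical example; your endpoint handler defines the real schema.

```python
import os

import requests

endpoint_id = os.environ["RUNPOD_ENDPOINT_ID"]
headers = {"Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}"}

# Submit a job; RunPod queues it and returns a job ID immediately.
run_resp = requests.post(
    f"https://api.runpod.ai/v2/{endpoint_id}/run",
    headers=headers,
    json={
        "input": {
            # Hypothetical payload -- your handler defines the real schema.
            "messages": [{"role": "user", "content": "Hello!"}],
            "max_new_tokens": 512,
            "temperature": 0.7,
        }
    },
    timeout=30,
)
job_id = run_resp.json()["id"]
print(f"Submitted job {job_id}; poll /status/{job_id} for the result.")
```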
## Invocation

Use the standard LangChain `.invoke()` and `.ainvoke()` methods to call the model. Streaming is also supported via `.stream()` and `.astream()` (simulated by polling the RunPod `/stream` endpoint).
```python
from langchain_core.messages import HumanMessage, SystemMessage

messages = [
    SystemMessage(content="You are a helpful AI assistant."),
    HumanMessage(content="What is the RunPod Serverless API flow?"),
]

# Invoke (Sync)
try:
    response = chat.invoke(messages)
    print("--- Sync Invoke Response ---")
    print(response.content)
except Exception as e:
    print(
        f"Error invoking Chat Model: {e}. Ensure endpoint ID/API key are correct and endpoint is active/compatible."
    )

# Stream (Sync, simulated via polling /stream)
print("\n--- Sync Stream Response ---")
try:
    for chunk in chat.stream(messages):
        print(chunk.content, end="", flush=True)
    print()  # Newline
except Exception as e:
    print(
        f"\nError streaming Chat Model: {e}. Ensure endpoint handler supports streaming output format."
    )
```
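To make the simulated-streaming mechanics concrete, below is a rough sketch of the kind of polling loop this involves, continuing the raw `/run` example above. It assumes the handler populates the `stream` list with chunks shaped like `{"output": "token"}` (see the feature table below); the exact chunk shape varies by handler.

```python
import time

import requests

# endpoint_id, headers, and job_id come from the raw /run sketch above.
terminal_states = {"COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"}
status = "IN_QUEUE"
while status not in terminal_states:
    resp = requests.get(
        f"https://api.runpod.ai/v2/{endpoint_id}/stream/{job_id}",
        headers=headers,
        timeout=30,
    ).json()
    status = resp.get("status", "")
    # Each poll may return zero or more newly accumulated chunks.
    for chunk in resp.get("stream", []):
        print(chunk.get("output", ""), end="", flush=True)
    time.sleep(0.2)  # analogous to ChatRunPod's poll_interval
print()
```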
### Async Usage
```python
# AInvoke (Async)
try:
    async_response = await chat.ainvoke(messages)
    print("--- Async Invoke Response ---")
    print(async_response.content)
except Exception as e:
    print(f"Error invoking Chat Model asynchronously: {e}.")

# AStream (Async)
print("\n--- Async Stream Response ---")
try:
    async for chunk in chat.astream(messages):
        print(chunk.content, end="", flush=True)
    print()  # Newline
except Exception as e:
    print(
        f"\nError streaming Chat Model asynchronously: {e}. Ensure endpoint handler supports streaming output format.\n"
    )
```
## Chaining
The chat model integrates seamlessly with LangChain Expression Language (LCEL) chains.
```python
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful assistant."),
        ("human", "{input}"),
    ]
)
parser = StrOutputParser()
chain = prompt | chat | parser

try:
    chain_response = chain.invoke(
        {"input": "Explain the concept of serverless computing in simple terms."}
    )
    print("--- Chain Response ---")
    print(chain_response)
except Exception as e:
    print(f"Error running chain: {e}")

# Async chain
try:
    async_chain_response = await chain.ainvoke(
        {"input": "What are the benefits of using RunPod for AI/ML workloads?"}
    )
    print("--- Async Chain Response ---")
    print(async_chain_response)
except Exception as e:
    print(f"Error running async chain: {e}")
```
## Model Features (Endpoint Dependent)

The availability of advanced features depends heavily on the specific implementation of your RunPod endpoint handler. The `ChatRunPod` integration provides the basic framework, but the handler must support the underlying functionality.
| Feature | Integration Support | Endpoint Dependent? | Notes |
| --- | --- | --- | --- |
| Tool calling | ❌ | ✅ | Requires handler to process tool definitions and return tool calls (e.g., OpenAI format). Integration needs parsing logic. |
| Structured output | ❌ | ✅ | Requires handler support for forcing structured output (JSON mode, function calling). Integration needs parsing logic. |
| JSON mode | ❌ | ✅ | Requires handler to accept a `json_mode` parameter (or similar) and guarantee JSON output. |
| Image input | ❌ | ✅ | Requires multimodal handler accepting image data (e.g., base64). Integration does not support multimodal messages. |
| Audio input | ❌ | ✅ | Requires handler accepting audio data. Integration does not support audio messages. |
| Video input | ❌ | ✅ | Requires handler accepting video data. Integration does not support video messages. |
| Token-level streaming | ✅ (Simulated) | ✅ | Polls `/stream`. Requires handler to populate the `stream` list in the status response with token chunks (e.g., `[{"output": "token"}]`). True low-latency streaming is not built in. |
| Native async | ✅ | ✅ | Core `ainvoke`/`astream` implemented. Relies on endpoint handler performance. |
| Token usage | ❌ | ✅ | Requires handler to return `prompt_tokens` and `completion_tokens` in the final response. Integration currently does not parse this. |
| Logprobs | ❌ | ✅ | Requires handler to return log probabilities. Integration currently does not parse this. |
**Key Takeaway:** Standard chat invocation and simulated streaming work if the endpoint follows basic RunPod API conventions. Advanced features require specific handler implementations and potentially extending or customizing this integration package.
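For endpoint authors, the token-level streaming row above implies a generator-style handler. Here is a minimal sketch using the `runpod` Python SDK's generator-handler support; the `prompt` key and the echo logic are placeholders, not what any real model worker does.

```python
# handler.py -- runs on the RunPod worker, not in your LangChain app.
import runpod


def handler(job):
    # Hypothetical input schema; real handlers define their own.
    prompt = job["input"].get("prompt", "")
    # Yielding chunks is what populates the "stream" list that
    # ChatRunPod's simulated streaming polls for.
    for token in f"Echo: {prompt}".split():
        yield {"output": token + " "}


runpod.serverless.start({
    "handler": handler,
    # Also aggregate the yielded chunks into the final /status output.
    "return_aggregate_stream": True,
})
```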
## API reference

For detailed documentation of the `ChatRunPod` class, parameters, and methods, refer to the source code or the generated API reference (if available).

Link to source code: https://github.com/runpod/langchain-runpod/blob/main/langchain_runpod/chat_models.py
## Related

- Chat model conceptual guide
- Chat model how-to guides