vLLM Chat

vLLM can be deployed as a server that mimics the OpenAI API protocol. This allows vLLM to be used as a drop-in replacement for applications using OpenAI API. This server can be queried in the same format as OpenAI API.

Overview

This will help you get started with vLLMchat models, which leverages thelangchain-openai package. For detailed documentation of allChatOpenAI features and configurations head to theAPI reference.

Integration details

Class	Package	Local	Serializable	JS support	Package downloads	Package latest
ChatOpenAI	langchain_openai	✅	beta	❌

Model features

Specific model features, such as tool calling, support for multi-modal inputs, support for token-level streaming, etc., will depend on the hosted model.

Setup

See the vLLM docshere.

To access vLLM models through LangChain, you'll need to install thelangchain-openai integration package.

Credentials

Authentication will depend on specifics of the inference server.

To enable automated tracing of your model calls, set yourLangSmith API key:

# os.environ["LANGSMITH_TRACING"] = "true"
# os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key: ")

Installation

The LangChain vLLM integration can be accessed via thelangchain-openai package:

%pip install-qU langchain-openai

Instantiation

Now we can instantiate our model object and generate chat completions:

from langchain_core.messagesimport HumanMessage, SystemMessage
from langchain_core.prompts.chatimport(
    ChatPromptTemplate,
    HumanMessagePromptTemplate,
    SystemMessagePromptTemplate,
)
from langchain_openaiimport ChatOpenAI

inference_server_url="http://localhost:8000/v1"

llm= ChatOpenAI(
    model="mosaicml/mpt-7b",
    openai_api_key="EMPTY",
    openai_api_base=inference_server_url,
    max_tokens=5,
    temperature=0,
)

Invocation

messages=[
    SystemMessage(
        content="You are a helpful assistant that translates English to Italian."
),
    HumanMessage(
        content="Translate the following sentence from English to Italian: I love programming."
),
]
llm.invoke(messages)

AIMessage(content=' Io amo programmare', additional_kwargs={}, example=False)

Chaining

We canchain our model with a prompt template like so:

from langchain_core.promptsimport ChatPromptTemplate

prompt= ChatPromptTemplate(
[
(
"system",
"You are a helpful assistant that translates {input_language} to {output_language}.",
),
("human","{input}"),
]
)

chain= prompt| llm
chain.invoke(
{
"input_language":"English",
"output_language":"German",
"input":"I love programming.",
}
)

API Reference:ChatPromptTemplate

API reference

For detailed documentation of all features and configurations exposed vialangchain-openai, head to the API reference:https://python.langchain.com/api_reference/openai/chat_models/langchain_openai.chat_models.base.ChatOpenAI.html

Refer to the vLLMdocumentation as well.

Chat modelconceptual guide
Chat modelhow-to guides

Movatterモバイル変換

vLLM Chat

Overview

Integration details

Model features

Setup

Credentials

Installation

Instantiation

Invocation

Chaining

API reference

Related

Movatterモバイル変換

Overview​

Integration details​

Model features​

Setup​

Credentials​

Installation​

Instantiation​

Invocation​

Chaining​

API reference​

Related​

Overview

Integration details

Model features

Setup

Credentials

Installation

Instantiation

Invocation

Chaining

API reference

Related