
ChatXinference

Xinference is a powerful and versatile library designed to serve LLMs, speech recognition models, and multimodal models, even on your laptop. It supports a variety of models compatible with GGML, such as chatglm, baichuan, whisper, vicuna, orca, and many others.

Overview

Integration details

Class | Package | Local | Serializable | JS support | Package downloads | Package latest
ChatXinference | langchain-xinference | | | | |

Model features

Tool calling | Structured output | JSON mode | Image input | Audio input | Video input | Token-level streaming | Native async | Token usage | Logprobs

Setup

Install Xinference through PyPI:

%pip install --upgrade --quiet "xinference[all]"

Deploy Xinference Locally or in a Distributed Cluster.

For local deployment, run xinference.

To deploy Xinference in a cluster, first start an Xinference supervisor using the xinference-supervisor command. You can also use the option -p to specify the port and -H to specify the host. The default port is 8080 and the default host is 0.0.0.0.

Then, start the Xinference workers using xinference-worker on each server you want to run them on.

You can consult the README file from Xinference for more information.
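
For example, a minimal distributed setup might look like the following sketch (the host, port, and the worker's endpoint flag are illustrative assumptions, not prescriptive; check the Xinference README for the exact options in your version):

# on the supervisor node: bind the host with -H and the port with -p (defaults noted above)
xinference-supervisor -H 0.0.0.0 -p 8080

# on each worker node: point the worker at the supervisor endpoint
# (the -e flag is an assumption here; verify against the Xinference README)
xinference-worker -e "http://<supervisor_host>:8080"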

Wrapper

To use Xinference with LangChain, you need to first launch a model. You can use the command line interface (CLI) to do so:

%xinference launch -n vicuna-v1.3 -f ggmlv3 -q q4_0
Model uid: 7167b2b0-2a04-11ee-83f0-d29396a3f064

A model UID is returned for you to use. Now you can use Xinference with LangChain:

Installation

The LangChain Xinference integration lives in the langchain-xinference package:

%pip install -qU langchain-xinference

Make sure you're using the latest Xinference version for structured outputs.
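
If your model and Xinference version support it, structured output can be requested through LangChain's standard with_structured_output interface. This is a minimal sketch, assuming ChatXinference implements that interface; the Joke schema, server URL, and model UID below are placeholders:

from pydantic import BaseModel

from langchain_xinference.chat_models import ChatXinference


class Joke(BaseModel):
    setup: str
    punchline: str


llm = ChatXinference(server_url="your_server_url", model_uid="your_model_uid")

# Bind the schema to the model and get a Joke instance back
structured_llm = llm.with_structured_output(Joke)
structured_llm.invoke("Tell me a joke about programming.")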

Instantiation

Now we can instantiate our model object and generate chat completions:

from langchain_xinference.chat_models import ChatXinference

llm = ChatXinference(
    server_url="your_server_url", model_uid="7167b2b0-2a04-11ee-83f0-d29396a3f064"
)

llm.invoke(
    "Q: where can we visit in the capital of France?",
    config={"max_tokens": 1024},
)

Invocation

from langchain_core.messages import HumanMessage, SystemMessage
from langchain_xinference.chat_models import ChatXinference

llm = ChatXinference(
    server_url="your_server_url", model_uid="7167b2b0-2a04-11ee-83f0-d29396a3f064"
)

system_message = "You are a helpful assistant that translates English to French. Translate the user sentence."
human_message = "I love programming."

llm.invoke([SystemMessage(content=system_message), HumanMessage(content=human_message)])
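
If the backing model supports token-level streaming, output can also be consumed incrementally through the standard .stream() interface. A minimal sketch, reusing the placeholder server URL and model UID from above:

from langchain_xinference.chat_models import ChatXinference

llm = ChatXinference(
    server_url="your_server_url", model_uid="7167b2b0-2a04-11ee-83f0-d29396a3f064"
)

# stream() yields message chunks; print the text as it arrives
for chunk in llm.stream("Q: where can we visit in the capital of France?"):
    print(chunk.content, end="", flush=True)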

Chaining

We can chain our model with a prompt template like so:

from langchain.prompts import PromptTemplate
from langchain_xinference.chat_models import ChatXinference

prompt = PromptTemplate(
    input_variables=["country"], template="Q: where can we visit in the capital of {country}? A:"
)

llm = ChatXinference(
    server_url="your_server_url", model_uid="7167b2b0-2a04-11ee-83f0-d29396a3f064"
)

chain = prompt | llm
chain.invoke(input={"country": "France"})

# stream() returns a generator of message chunks; iterate to consume it
for chunk in chain.stream(input={"country": "France"}):
    print(chunk.content, end="", flush=True)

API Reference: PromptTemplate

API reference

For detailed documentation of all ChatXinference features and configurations, head to the API reference: https://github.com/TheSongg/langchain-xinference
