
Xorbits Inference (Xinference)

This page demonstrates how to use Xinference with LangChain.

Xinference is a powerful and versatile library designed to serve LLMs, speech recognition models, and multimodal models, even on your laptop. With Xorbits Inference, you can effortlessly deploy and serve your own or state-of-the-art built-in models using just a single command.

Installation and Setup

Xinference can be installed via pip from PyPI:

pip install "xinference[all]"

LLM

Xinference supports various models compatible with GGML, including chatglm, baichuan, whisper, vicuna, and orca. To view the built-in models, run the command:

xinference list --all

Wrapper for Xinference

You can start a local instance of Xinference by running:

xinference

You can also deploy Xinference in a distributed cluster. To do so, first start an Xinference supervisor on the server where you want to run it:

xinference-supervisor -H "${supervisor_host}"

Then, start the Xinference workers on each of the other servers where you want to run them:

xinference-worker -e "http://${supervisor_host}:9997"

Once Xinference is running, an endpoint will be accessible for model management via CLI or Xinference client.

For local deployment, the endpoint will be http://localhost:9997.

For cluster deployment, the endpoint will be http://${supervisor_host}:9997.

Then, you need to launch a model. You can specify the model names and other attributes, including model_size_in_billions and quantization. You can use the command line interface (CLI) to do it. For example:

xinference launch -n orca -s 3 -q q4_0

A model uid will be returned.
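
Alternatively, you can launch a model programmatically with the Xinference Python client. A minimal sketch, assuming the default local endpoint and the same orca model as above:

from xinference.client import Client

# Connect to the running Xinference endpoint
client = Client("http://localhost:9997")

# Launch the model; the returned value is the model UID used below
model_uid = client.launch_model(
    model_name="orca",
    model_size_in_billions=3,
    quantization="q4_0",
)
print(model_uid)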

Example usage:

from langchain_community.llms import Xinference

llm = Xinference(
    server_url="http://0.0.0.0:9997",
    model_uid={model_uid},  # replace {model_uid} with the model UID returned from launching the model
)

llm(
    prompt="Q: where can we visit in the capital of France? A:",
    generate_config={"max_tokens": 1024, "stream": True},
)

API Reference: Xinference

Usage

For more information and detailed examples, refer to the example for Xinference LLMs.

Embeddings

Xinference also supports embedding queries and documents. See the example for Xinference embeddings for a more detailed demo.
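
As a quick illustration, here is a minimal sketch using the XinferenceEmbeddings class from langchain_community, assuming an embedding model has already been launched and its UID is at hand:

from langchain_community.embeddings import XinferenceEmbeddings

embeddings = XinferenceEmbeddings(
    server_url="http://localhost:9997",
    model_uid=model_uid,  # UID of a launched embedding model
)

# Embed a single query string and a list of documents
query_vector = embeddings.embed_query("What is Xorbits Inference?")
document_vectors = embeddings.embed_documents(
    ["Xinference serves LLMs.", "It also serves embedding models."]
)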

Xinference LangChain partner package install

Install the integration package with:

pip install langchain-xinference

Chat Models

from langchain_xinference.chat_models import ChatXinference
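
A minimal usage sketch, assuming ChatXinference accepts the same server_url and model_uid parameters as the community integration and that a chat model has been launched:

chat = ChatXinference(
    server_url="http://localhost:9997",
    model_uid=model_uid,  # UID of a launched chat model
)

# invoke() is the standard LangChain Runnable entry point
response = chat.invoke("What can we visit in the capital of France?")
print(response.content)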

LLM

from langchain_xinference.llms import Xinference
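
Usage mirrors the community wrapper shown earlier; a minimal sketch under the same assumptions:

llm = Xinference(
    server_url="http://localhost:9997",
    model_uid=model_uid,  # UID of a launched model
)

print(llm.invoke("Q: where can we visit in the capital of France? A:"))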
