LlamaEdge

LlamaEdge allows you to chat with LLMs of GGUF format both locally and via a chat service.

  • LlamaEdgeChatService provides developers an OpenAI-API-compatible service to chat with LLMs via HTTP requests.

  • LlamaEdgeChatLocal enables developers to chat with LLMs locally (coming soon).

Both LlamaEdgeChatService and LlamaEdgeChatLocal run on infrastructure driven by the WasmEdge Runtime, which provides a lightweight and portable WebAssembly container environment for LLM inference tasks.
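
To run the examples below, you will need the langchain-community package, which is where the import path used in this guide suggests the integration lives. A minimal setup sketch; adapt it to your own environment:

%pip install -qU langchain-community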

Chat via API Service

LlamaEdgeChatService works on top of the llama-api-server. Following the steps in the llama-api-server quick-start, you can host your own API service and chat with any model you like from any device, anywhere, as long as an internet connection is available.
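
If you host llama-api-server yourself, point LlamaEdgeChatService at your own endpoint instead of the demo URL used below. A minimal sketch; the http://localhost:8080 address is an assumption, so use whatever host and port your server actually listens on:

from langchain_community.chat_models.llama_edge import LlamaEdgeChatService

# point the client at a self-hosted llama-api-server endpoint
# (replace the URL with the address your server exposes)
chat = LlamaEdgeChatService(service_url="http://localhost:8080")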

from langchain_community.chat_models.llama_edge import LlamaEdgeChatService
from langchain_core.messages import HumanMessage, SystemMessage

Chat with LLMs in the non-streaming mode

# service url
service_url = "https://b008-54-186-154-209.ngrok-free.app"

# create wasm-chat service instance
chat = LlamaEdgeChatService(service_url=service_url)

# create message sequence
system_message = SystemMessage(content="You are an AI assistant")
user_message = HumanMessage(content="What is the capital of France?")
messages = [system_message, user_message]

# chat with wasm-chat service
response = chat.invoke(messages)

print(f"[Bot] {response.content}")
[Bot] Hello! The capital of France is Paris.

Chat with LLMs in the streaming mode

# service url
service_url = "https://b008-54-186-154-209.ngrok-free.app"

# create wasm-chat service instance
chat = LlamaEdgeChatService(service_url=service_url, streaming=True)

# create message sequence
system_message = SystemMessage(content="You are an AI assistant")
user_message = HumanMessage(content="What is the capital of Norway?")
messages = [
    system_message,
    user_message,
]

output = ""
for chunk in chat.stream(messages):
    # print(chunk.content, end="", flush=True)
    output += chunk.content

print(f"[Bot] {output}")
[Bot]   Hello! I'm happy to help you with your question. The capital of Norway is Oslo.
