# LlamaEdge

LlamaEdge is the easiest and fastest way to run customized and fine-tuned LLMs locally or on the edge.
- Lightweight inference apps. LlamaEdge is in MBs instead of GBs
- Native and GPU accelerated performance
- Supports many GPU and hardware accelerators
- Supports many optimized inference libraries
- Wide selection of AI / LLM models
## Installation and Setup

See the installation instructions.
## Chat models

See a usage example.
```python
from langchain_community.chat_models.llama_edge import LlamaEdgeChatService
```
API Reference: LlamaEdgeChatService