EmbeddedLLM/embeddedllm
Run local LLMs on iGPU, APU and CPU (AMD, Intel, and Qualcomm (coming soon)). The easiest way to launch an OpenAI-API-compatible server on Windows, Linux and macOS.
| Support matrix | Supported now | Under Development | On the roadmap |
|---|---|---|---|
| Model architectures | Gemma<br>Llama *<br>Mistral +<br>Phi | | |
| Platform | Linux<br>Windows | | |
| Architecture | x86<br>x64 | Arm64 | |
| Hardware Acceleration | CUDA<br>DirectML<br>IpexLLM | QNN<br>ROCm | OpenVINO |
* The Llama model architecture supports similar model families such as CodeLlama, Vicuna, Yi, and more.
+ The Mistral model architecture supports similar model families such as Zephyr.
- [2024/06] Support Phi-3 (mini, small, medium), Phi-3-Vision-Mini, Llama-2, Llama-3, Gemma (v1), Mistral v0.3, Starling-LM, Yi-1.5.
- [2024/06] Support vision/chat inference on iGPU, APU, CPU and CUDA.
- Supported Models
- Getting Started
- Compile OpenAI-API Compatible Server into Windows Executable
- Prebuilt Binary (Alpha)
- Acknowledgements
Supported Models

| Models | Parameters | Context Length | Link |
|---|---|---|---|
| Gemma-2b-Instruct v1 | 2B | 8192 | EmbeddedLLM/gemma-2b-it-onnx |
| Llama-2-7b-chat | 7B | 4096 | EmbeddedLLM/llama-2-7b-chat-int4-onnx-directml |
| Llama-2-13b-chat | 13B | 4096 | EmbeddedLLM/llama-2-13b-chat-int4-onnx-directml |
| Llama-3-8b-chat | 8B | 8192 | EmbeddedLLM/mistral-7b-instruct-v0.3-onnx |
| Mistral-7b-v0.3-instruct | 7B | 32768 | EmbeddedLLM/mistral-7b-instruct-v0.3-onnx |
| Phi-3-mini-4k-instruct-062024 | 3.8B | 4096 | EmbeddedLLM/Phi-3-mini-4k-instruct-062024-onnx |
| Phi3-mini-4k-instruct | 3.8B | 4096 | microsoft/Phi-3-mini-4k-instruct-onnx |
| Phi3-mini-128k-instruct | 3.8B | 128k | microsoft/Phi-3-mini-128k-instruct-onnx |
| Phi3-medium-4k-instruct | 14B | 4096 | microsoft/Phi-3-medium-4k-instruct-onnx-directml |
| Phi3-medium-128k-instruct | 14B | 128k | microsoft/Phi-3-medium-128k-instruct-onnx-directml |
| Openchat-3.6-8b | 8B | 8192 | EmbeddedLLM/openchat-3.6-8b-20240522-onnx |
| Yi-1.5-6b-chat | 6B | 32k | EmbeddedLLM/01-ai_Yi-1.5-6B-Chat-onnx |
| Phi-3-vision-128k-instruct | 4.2B | 128k | EmbeddedLLM/Phi-3-vision-128k-instruct-onnx |
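The ONNX weights linked above are hosted on Hugging Face. As a minimal sketch of one way to fetch them locally (assuming `pip install huggingface_hub`; the repo id is taken from the table, and the local folder name is arbitrary), you can download a model and later point `--model_path` at the resulting directory, or at the relevant quantization subfolder inside it:

```python
# Download one of the ONNX model repositories listed in the table above.
# Assumption: the `huggingface_hub` package is installed and the repo id is still published.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="EmbeddedLLM/Phi-3-mini-4k-instruct-062024-onnx",
    local_dir="./Phi-3-mini-4k-instruct-062024-onnx",  # arbitrary local folder
)
print("Model downloaded to:", local_dir)
```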
Getting Started

Windows
- Custom Setup:
  - IPEX(XPU): Requires an Anaconda environment. `conda create -n ellm python=3.10 libuv; conda activate ellm`
  - DirectML: If you are using a Conda environment, install additional dependencies: `conda install conda-forge::vs2015_runtime`
- Install the embeddedllm package: `$env:ELLM_TARGET_DEVICE='directml'; pip install -e .`. Note: currently supports `cpu`, `directml` and `cuda`.
  - DirectML: `$env:ELLM_TARGET_DEVICE='directml'; pip install -e .[directml]`
  - CPU: `$env:ELLM_TARGET_DEVICE='cpu'; pip install -e .[cpu]`
  - CUDA: `$env:ELLM_TARGET_DEVICE='cuda'; pip install -e .[cuda]`
  - IPEX: `$env:ELLM_TARGET_DEVICE='ipex'; python setup.py develop`
  - OpenVINO: `$env:ELLM_TARGET_DEVICE='openvino'; pip install -e .[openvino]`
  - With Web UI:
    - DirectML: `$env:ELLM_TARGET_DEVICE='directml'; pip install -e .[directml,webui]`
    - CPU: `$env:ELLM_TARGET_DEVICE='cpu'; pip install -e .[cpu,webui]`
    - CUDA: `$env:ELLM_TARGET_DEVICE='cuda'; pip install -e .[cuda,webui]`
    - IPEX: `$env:ELLM_TARGET_DEVICE='ipex'; python setup.py develop; pip install -r requirements-webui.txt`
    - OpenVINO: `$env:ELLM_TARGET_DEVICE='openvino'; pip install -e .[openvino,webui]`
Linux
- Custom Setup:
  - IPEX(XPU): Requires an Anaconda environment. `conda create -n ellm python=3.10 libuv; conda activate ellm`
  - DirectML: If you are using a Conda environment, install additional dependencies: `conda install conda-forge::vs2015_runtime`
- Install the embeddedllm package: `ELLM_TARGET_DEVICE='directml' pip install -e .`. Note: currently supports `cpu`, `directml` and `cuda`.
  - DirectML: `ELLM_TARGET_DEVICE='directml' pip install -e .[directml]`
  - CPU: `ELLM_TARGET_DEVICE='cpu' pip install -e .[cpu]`
  - CUDA: `ELLM_TARGET_DEVICE='cuda' pip install -e .[cuda]`
  - IPEX: `ELLM_TARGET_DEVICE='ipex' python setup.py develop`
  - OpenVINO: `ELLM_TARGET_DEVICE='openvino' pip install -e .[openvino]`
  - With Web UI:
    - DirectML: `ELLM_TARGET_DEVICE='directml' pip install -e .[directml,webui]`
    - CPU: `ELLM_TARGET_DEVICE='cpu' pip install -e .[cpu,webui]`
    - CUDA: `ELLM_TARGET_DEVICE='cuda' pip install -e .[cuda,webui]`
    - IPEX: `ELLM_TARGET_DEVICE='ipex' python setup.py develop; pip install -r requirements-webui.txt`
    - OpenVINO: `ELLM_TARGET_DEVICE='openvino' pip install -e .[openvino,webui]`
Custom Setup:

Ipex

For Intel iGPU:

```
set SYCL_CACHE_PERSISTENT=1
set BIGDL_LLM_XMX_DISABLED=1
```

For Intel Arc™ A-Series Graphics:

```
set SYCL_CACHE_PERSISTENT=1
```
Launch OpenAI API Compatible Server

```
ellm_server --model_path <path/to/model/weight>
```

Example code to connect to the API server can be found in `scripts/python`. Note: run `ellm_server --help` to see all supported arguments.
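Once the server is up, any OpenAI-compatible client can talk to it. A minimal sketch using the `openai` Python package (assumptions: `pip install openai`, the server was started with `--port 5555` and exposes the standard `/v1` routes, the model name below is a placeholder for whatever id your server reports, and the API key is a dummy value because the local server does not check it):

```python
# Minimal chat-completion request against a local ellm_server instance.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:5555/v1",  # assumed local endpoint and port
    api_key="not-needed",                 # placeholder; the local server ignores it
)

response = client.chat.completions.create(
    model="phi3-mini-int4",  # hypothetical served model name; use the id your server reports
    messages=[{"role": "user", "content": "Explain what an iGPU is in one sentence."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```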
Launch Chatbot Web UI

```
ellm_chatbot --port 7788 --host localhost --server_port <ellm_server_port> --server_host localhost
```

Note: run `ellm_chatbot --help` to see all supported arguments.
Launch Model Management UI

It is an interface that allows you to download and deploy models to an OpenAI API compatible server. You can find the disk space required to download each model in the UI.

```
ellm_modelui --port 6678
```

Note: run `ellm_modelui --help` to see all supported arguments.
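The news section above also mentions vision/chat inference, and the model table includes Phi-3-vision-128k-instruct. The following is only a hedged sketch of what a multimodal request could look like, assuming the server is running on port 5555 with a vision model loaded and accepts the OpenAI-style `image_url` content parts (whether this exact payload format is supported is an assumption, not confirmed by this README):

```python
# Send an image plus a question to a vision model served by ellm_server.
# Assumptions: server at http://localhost:5555/v1, a vision model loaded,
# and support for OpenAI-style image_url content parts.
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:5555/v1", api_key="not-needed")

# Encode a local image as a data URL ("example.jpg" is a hypothetical file name).
with open("example.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="phi-3-vision-128k-instruct",  # placeholder; use the id your server reports
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }
    ],
    max_tokens=128,
)
print(response.choices[0].message.content)
```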
Compile OpenAI-API Compatible Server into Windows Executable

NOTE: OpenVINO packaging currently uses `torch==2.4.0`. The compiled executable will not run out of the box because of a missing dependency, `libomp`. Make sure to install `libomp` and add the `libomp-xxxxxxx.dll` to `C:\Windows\System32`.
1. Install `embeddedllm`.
2. Install PyInstaller: `pip install pyinstaller==6.9.0`.
3. Compile the Windows executable: `pyinstaller .\ellm_api_server.spec`.
4. You can find the executable in the `dist\ellm_api_server` directory. Use it like `ellm_server`: `.\ellm_api_server.exe --model_path <path/to/model/weight>`.
Powershell/Terminal Usage:

```
ellm_server --model_path <path/to/model/weight>

# DirectML
ellm_server --model_path 'EmbeddedLLM/Phi-3-mini-4k-instruct-onnx-directml' --port 5555

# IPEX-LLM
ellm_server --model_path '.\meta-llama_Meta-Llama-3.1-8B-Instruct\' --backend 'ipex' --device 'xpu' --port 5555 --served_model_name 'meta-llama_Meta/Llama-3.1-8B-Instruct'

# OpenVINO
ellm_server --model_path '.\meta-llama_Meta-Llama-3.1-8B-Instruct\' --backend 'openvino' --device 'gpu' --port 5555 --served_model_name 'meta-llama_Meta/Llama-3.1-8B-Instruct'
```
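To confirm which model id the running server actually exposes (for example, the value passed to `--served_model_name` above) before pointing clients at it, you can query the model list. A minimal sketch, assuming the server is running on port 5555 and mirrors the standard OpenAI `/v1/models` route:

```python
# List the models exposed by a local ellm_server instance (assumed port 5555).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:5555/v1", api_key="not-needed")
for model in client.models.list():
    print(model.id)  # use this id as the `model` field in chat requests
```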
Prebuilt Binary (Alpha)

You can find the prebuilt OpenAI API Compatible Windows Executable in the Releases page.

Powershell/Terminal Usage (use it like `ellm_server`):

```
.\ellm_api_server.exe --model_path <path/to/model/weight>

# DirectML
.\ellm_api_server.exe --model_path 'EmbeddedLLM_Phi-3-mini-4k-instruct-062024-onnx\onnx\directml\Phi-3-mini-4k-instruct-062024-int4' --port 5555

# IPEX-LLM
.\ellm_api_server.exe --model_path '.\meta-llama_Meta-Llama-3.1-8B-Instruct\' --backend 'ipex' --device 'xpu' --port 5555 --served_model_name 'meta-llama_Meta/Llama-3.1-8B-Instruct'

# OpenVINO
.\ellm_api_server.exe --model_path '.\meta-llama_Meta-Llama-3.1-8B-Instruct\' --backend 'openvino' --device 'gpu' --port 5555 --served_model_name 'meta-llama_Meta/Llama-3.1-8B-Instruct'
```
Acknowledgements

- Excellent open-source projects: vLLM, onnxruntime-genai, Ipex-LLM and many others.