EmbeddedLLM: API server for Embedded Device Deployment. Currently supports CUDA/OpenVINO/IpexLLM/DirectML/CPU.


Run local LLMs on iGPU, APU and CPU (AMD, Intel, and Qualcomm (coming soon)). The easiest way to launch an OpenAI API compatible server on Windows, Linux and macOS.

| Support matrix | Supported now | Under Development | On the roadmap |
|---|---|---|---|
| Model architectures | Gemma, Llama *, Mistral +, Phi | | |
| Platform | Linux, Windows | | |
| Architecture | x86, x64 | Arm64 | |
| Hardware Acceleration | CUDA, DirectML, IpexLLM, OpenVINO | QNN | ROCm |

* The Llama model architecture supports similar model families such as CodeLlama, Vicuna, Yi, and more.

+ The Mistral model architecture supports similar model families such as Zephyr.

🚀 Latest News

  • [2024/06] Support Phi-3 (mini, small, medium), Phi-3-Vision-Mini, Llama-2, Llama-3, Gemma (v1), Mistral v0.3, Starling-LM, Yi-1.5.
  • [2024/06] Support vision/chat inference on iGPU, APU, CPU and CUDA.
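
Because the server exposes an OpenAI-compatible chat endpoint, vision models such as Phi-3-Vision-Mini are queried by attaching an image to the message content. The snippet below is only a hedged sketch: it assumes a vision model is already being served on port 5555 (see Launch OpenAI API Compatible Server below), that the `openai` Python package is installed, and that the server accepts OpenAI-style image_url content parts; the examples in scripts/python are authoritative.

```python
# Hedged sketch: vision chat through the OpenAI-compatible endpoint.
# Assumes a Phi-3-vision model is served locally and that OpenAI-style image_url
# parts with base64 data URLs are accepted (verify against scripts/python).
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:5555/v1", api_key="EMPTY")
model_id = client.models.list().data[0].id

with open("photo.jpg", "rb") as f:  # any local image
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model=model_id,
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
        ],
    }],
    max_tokens=256,
)
print(response.choices[0].message.content)
```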

Table of Contents

  • Supported Models (Quick Start)
  • Getting Started
  • Launch OpenAI API Compatible Server
  • Launch Chatbot Web UI
  • Launch Model Management UI
  • Compile OpenAI-API Compatible Server into Windows Executable
  • Prebuilt OpenAI API Compatible Windows Executable (Alpha)
  • Acknowledgements

Supported Models (Quick Start)

| Models | Parameters | Context Length | Link |
|---|---|---|---|
| Gemma-2b-Instruct v1 | 2B | 8192 | EmbeddedLLM/gemma-2b-it-onnx |
| Llama-2-7b-chat | 7B | 4096 | EmbeddedLLM/llama-2-7b-chat-int4-onnx-directml |
| Llama-2-13b-chat | 13B | 4096 | EmbeddedLLM/llama-2-13b-chat-int4-onnx-directml |
| Llama-3-8b-chat | 8B | 8192 | EmbeddedLLM/mistral-7b-instruct-v0.3-onnx |
| Mistral-7b-v0.3-instruct | 7B | 32768 | EmbeddedLLM/mistral-7b-instruct-v0.3-onnx |
| Phi-3-mini-4k-instruct-062024 | 3.8B | 4096 | EmbeddedLLM/Phi-3-mini-4k-instruct-062024-onnx |
| Phi3-mini-4k-instruct | 3.8B | 4096 | microsoft/Phi-3-mini-4k-instruct-onnx |
| Phi3-mini-128k-instruct | 3.8B | 128k | microsoft/Phi-3-mini-128k-instruct-onnx |
| Phi3-medium-4k-instruct | 17B | 4096 | microsoft/Phi-3-medium-4k-instruct-onnx-directml |
| Phi3-medium-128k-instruct | 17B | 128k | microsoft/Phi-3-medium-128k-instruct-onnx-directml |
| Openchat-3.6-8b | 8B | 8192 | EmbeddedLLM/openchat-3.6-8b-20240522-onnx |
| Yi-1.5-6b-chat | 6B | 32k | EmbeddedLLM/01-ai_Yi-1.5-6B-Chat-onnx |
| Phi-3-vision-128k-instruct | | 128k | EmbeddedLLM/Phi-3-vision-128k-instruct-onnx |
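
The Link column refers to Hugging Face repositories. One way to fetch a repository locally so its folder can be passed to `ellm_server --model_path` is sketched below (assumes the `huggingface_hub` package is installed; the allow_patterns filter mirrors the onnx/directml layout used in the prebuilt-executable example further down and may need adjusting per repository):

```python
# Download an ONNX model repository (or a subfolder of it) to use as --model_path.
# Assumes: pip install huggingface_hub. Adjust allow_patterns to the repo's layout.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="EmbeddedLLM/Phi-3-mini-4k-instruct-062024-onnx",
    allow_patterns=["onnx/directml/*"],  # keep only the DirectML variant
    local_dir="./Phi-3-mini-4k-instruct-062024-onnx",
)
print("Point ellm_server --model_path at the folder containing the ONNX files under:", local_dir)
```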

Getting Started

Installation

From Source

  • Windows

    1. Custom setup:

       • IPEX (XPU): requires an Anaconda environment: `conda create -n ellm python=3.10 libuv; conda activate ellm`.
       • DirectML: if you are using a Conda environment, install the additional dependency: `conda install conda-forge::vs2015_runtime`.

    2. Install the embeddedllm package, e.g. `$env:ELLM_TARGET_DEVICE='directml'; pip install -e .`. Note: currently supports cpu, directml and cuda.

       • DirectML: `$env:ELLM_TARGET_DEVICE='directml'; pip install -e .[directml]`
       • CPU: `$env:ELLM_TARGET_DEVICE='cpu'; pip install -e .[cpu]`
       • CUDA: `$env:ELLM_TARGET_DEVICE='cuda'; pip install -e .[cuda]`
       • IPEX: `$env:ELLM_TARGET_DEVICE='ipex'; python setup.py develop`
       • OpenVINO: `$env:ELLM_TARGET_DEVICE='openvino'; pip install -e .[openvino]`
       • With Web UI:
         • DirectML: `$env:ELLM_TARGET_DEVICE='directml'; pip install -e .[directml,webui]`
         • CPU: `$env:ELLM_TARGET_DEVICE='cpu'; pip install -e .[cpu,webui]`
         • CUDA: `$env:ELLM_TARGET_DEVICE='cuda'; pip install -e .[cuda,webui]`
         • IPEX: `$env:ELLM_TARGET_DEVICE='ipex'; python setup.py develop; pip install -r requirements-webui.txt`
         • OpenVINO: `$env:ELLM_TARGET_DEVICE='openvino'; pip install -e .[openvino,webui]`

  • Linux

    1. Custom setup:

       • IPEX (XPU): requires an Anaconda environment: `conda create -n ellm python=3.10 libuv; conda activate ellm`.
       • DirectML: if you are using a Conda environment, install the additional dependency: `conda install conda-forge::vs2015_runtime`.

    2. Install the embeddedllm package, e.g. `ELLM_TARGET_DEVICE='directml' pip install -e .`. Note: currently supports cpu, directml and cuda.

       • DirectML: `ELLM_TARGET_DEVICE='directml' pip install -e .[directml]`
       • CPU: `ELLM_TARGET_DEVICE='cpu' pip install -e .[cpu]`
       • CUDA: `ELLM_TARGET_DEVICE='cuda' pip install -e .[cuda]`
       • IPEX: `ELLM_TARGET_DEVICE='ipex' python setup.py develop`
       • OpenVINO: `ELLM_TARGET_DEVICE='openvino' pip install -e .[openvino]`
       • With Web UI:
         • DirectML: `ELLM_TARGET_DEVICE='directml' pip install -e .[directml,webui]`
         • CPU: `ELLM_TARGET_DEVICE='cpu' pip install -e .[cpu,webui]`
         • CUDA: `ELLM_TARGET_DEVICE='cuda' pip install -e .[cuda,webui]`
         • IPEX: `ELLM_TARGET_DEVICE='ipex' python setup.py develop; pip install -r requirements-webui.txt`
         • OpenVINO: `ELLM_TARGET_DEVICE='openvino' pip install -e .[openvino,webui]`
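
To sanity-check an install from either platform list above, one option (a small sketch, not part of the repository) is to confirm that the `ellm_server` entry point is on the PATH and prints its help text:

```python
# Post-install smoke test (hypothetical helper script, not shipped with embeddedllm).
# Verifies the ellm_server console script is reachable from the active environment.
import shutil
import subprocess

exe = shutil.which("ellm_server")
if exe is None:
    raise SystemExit("ellm_server not found on PATH -- activate the environment you installed into.")

# --help lists the supported arguments, as noted in the launch instructions below.
print(subprocess.run([exe, "--help"], capture_output=True, text=True).stdout)
```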

Launch OpenAI API Compatible Server

  1. Custom setup:

     • IPEX

       • For Intel iGPU:

         set SYCL_CACHE_PERSISTENT=1
         set BIGDL_LLM_XMX_DISABLED=1

       • For Intel Arc™ A-Series Graphics:

         set SYCL_CACHE_PERSISTENT=1

  2. `ellm_server --model_path <path/to/model/weight>`

  3. Example code to connect to the API server can be found in scripts/python; a minimal sketch is also shown below. Note: run `ellm_server --help` to see all supported arguments.
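
A minimal client sketch in the spirit of scripts/python (assumptions: the server was launched on port 5555 as in the examples further down, and the `openai` Python package is installed; the model id is read back from the server rather than hard-coded):

```python
# Minimal OpenAI-compatible client sketch (assumes `pip install openai` and a server
# started with, e.g.: ellm_server --model_path <path/to/model/weight> --port 5555).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:5555/v1", api_key="EMPTY")  # key is unused locally

# Ask the server which model it serves (the name set via --served_model_name, if any).
model_id = client.models.list().data[0].id

response = client.chat.completions.create(
    model=model_id,
    messages=[{"role": "user", "content": "Tell me a joke about embedded devices."}],
    max_tokens=128,
    temperature=0.7,
)
print(response.choices[0].message.content)
```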

Launch Chatbot Web UI

  1. `ellm_chatbot --port 7788 --host localhost --server_port <ellm_server_port> --server_host localhost`. Note: run `ellm_chatbot --help` to see all supported arguments.

(Demo video: asset/ellm_chatbot_vid.webp)

Launch Model Management UI

The Model Management UI lets you download models and deploy the OpenAI API compatible server. It also shows the disk space required to download each model.

  1. `ellm_modelui --port 6678`. Note: run `ellm_modelui --help` to see all supported arguments.

(Screenshot: Model Management UI)

Compile OpenAI-API Compatible Server into Windows Executable

NOTE: OpenVINO packaging currently uses torch==2.4.0. The resulting executable will not run out of the box because a dependency, libomp, is missing. Make sure to install libomp and add the libomp-xxxxxxx.dll to C:\Windows\System32.

  1. Install embeddedllm.

  2. Install PyInstaller: `pip install pyinstaller==6.9.0`.

  3. Compile the Windows executable: `pyinstaller .\ellm_api_server.spec`.

  4. You can find the executable in dist\ellm_api_server.

  5. Use it like `ellm_server`: `.\ellm_api_server.exe --model_path <path/to/model/weight>`.

     PowerShell/Terminal usage:

```
ellm_server --model_path <path/to/model/weight>

# DirectML
ellm_server --model_path 'EmbeddedLLM/Phi-3-mini-4k-instruct-onnx-directml' --port 5555

# IPEX-LLM
ellm_server --model_path '.\meta-llama_Meta-Llama-3.1-8B-Instruct\' --backend 'ipex' --device 'xpu' --port 5555 --served_model_name 'meta-llama_Meta/Llama-3.1-8B-Instruct'

# OpenVINO
ellm_server --model_path '.\meta-llama_Meta-Llama-3.1-8B-Instruct\' --backend 'openvino' --device 'gpu' --port 5555 --served_model_name 'meta-llama_Meta/Llama-3.1-8B-Instruct'
```

Prebuilt OpenAI API Compatible Windows Executable (Alpha)

You can find the prebuilt OpenAI API Compatible Windows Executable on the Releases page.

PowerShell/Terminal usage (use it like `ellm_server`):

```
.\ellm_api_server.exe --model_path <path/to/model/weight>

# DirectML
.\ellm_api_server.exe --model_path 'EmbeddedLLM_Phi-3-mini-4k-instruct-062024-onnx\onnx\directml\Phi-3-mini-4k-instruct-062024-int4' --port 5555

# IPEX-LLM
.\ellm_api_server.exe --model_path '.\meta-llama_Meta-Llama-3.1-8B-Instruct\' --backend 'ipex' --device 'xpu' --port 5555 --served_model_name 'meta-llama_Meta/Llama-3.1-8B-Instruct'

# OpenVINO
.\ellm_api_server.exe --model_path '.\meta-llama_Meta-Llama-3.1-8B-Instruct\' --backend 'openvino' --device 'gpu' --port 5555 --served_model_name 'meta-llama_Meta/Llama-3.1-8B-Instruct'
```
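
Whether compiled locally or downloaded prebuilt, the executable serves the same HTTP API, so readiness can be checked programmatically. A sketch (assumptions: port 5555 as in the examples above, the `requests` package installed, and a real model path substituted for the placeholder):

```python
# Launch the packaged server and wait until its OpenAI-compatible API answers.
# Assumes: pip install requests. Adjust executable path, model path and port.
import subprocess
import time

import requests

proc = subprocess.Popen(
    [r".\ellm_api_server.exe", "--model_path", r"<path\to\model\weight>", "--port", "5555"]
)

base_url = "http://localhost:5555/v1"
for _ in range(120):  # model loading can take a while on iGPU/APU
    try:
        models = requests.get(f"{base_url}/models", timeout=2).json()
        print("Server is up, serving:", [m["id"] for m in models["data"]])
        break
    except requests.RequestException:
        time.sleep(1)
else:
    proc.terminate()
    raise SystemExit("Server did not become ready in time.")
```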

Acknowledgements
