
Documentation | Blog | Discord

Machine Learning Compilation for Large Language Models (MLC LLM) is a high-performance universal deployment solution that allows native deployment of any large language model with native APIs and compiler acceleration. The mission of this project is to enable everyone to develop, optimize, and deploy AI models natively on everyone's devices with ML compilation techniques.

Universal deployment. MLC LLM supports the following platforms and hardware:

|              | AMD GPU | NVIDIA GPU | Apple GPU | Intel GPU |
|--------------|---------|------------|-----------|-----------|
| Linux / Win  | ✅ Vulkan, ROCm | ✅ Vulkan, CUDA | N/A | ✅ Vulkan |
| macOS        | ✅ Metal (dGPU) | N/A | ✅ Metal | ✅ Metal (iGPU) |
| Web Browser  | ✅ WebGPU and WASM | | | |
| iOS / iPadOS | ✅ Metal on Apple A-series GPU | | | |
| Android      | ✅ OpenCL on Adreno GPU | ✅ OpenCL on Mali GPU | | |

Quick Start

Here we introduce quick start examples of the chat CLI, Python API, and REST server for using MLC LLM. We use the 4-bit quantized 8B Llama-3 model for demonstration purposes. The pre-quantized Llama-3 weights are available at https://huggingface.co/mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC. You can also try out the unquantized Llama-3 model by replacing q4f16_1 with q0f16 in the examples below. Please visit our documentation for a detailed quick start and introduction.

Installation

MLC LLM is available via pip. It is always recommended to install it in an isolated conda virtual environment.

To verify the installation, activate your virtual environment and run

python -c "import mlc_llm; print(mlc_llm.__path__)"

You are expected to see the installation path of the MLC LLM Python package.

Chat CLI

We can try out the chat CLI in MLC LLM with the 4-bit quantized 8B Llama-3 model.

mlc_llm chat HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC

It may take 1-2 minutes the first time you run this command. Afterwards, the command launches a chat interface where you can enter your prompt and chat with the model.

```
You can use the following special commands:
  /help               print the special commands
  /exit               quit the cli
  /stats              print out the latest stats (token/sec)
  /reset              restart a fresh chat
  /set [overrides]    override settings in the generation config. For example,
                      `/set temperature=0.5;max_gen_len=100;stop=end,stop`
                      Note: Separate stop words in the `stop` option with commas (,).
  Multi-line input: Use escape+enter to start a new line.

user: What's the meaning of life
assistant:
What a profound and intriguing question! While there's no one definitive answer, I'd be happy to help you explore some perspectives on the meaning of life.

The concept of the meaning of life has been debated and...
```

Python API

We can run the Llama-3 model with the chat completion Python API of MLC LLM. You can save the code below into a Python file and run it.

```python
from mlc_llm import LLMEngine

# Create engine
model = "HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC"
engine = LLMEngine(model)

# Run chat completion in OpenAI API.
for response in engine.chat.completions.create(
    messages=[{"role": "user", "content": "What is the meaning of life?"}],
    model=model,
    stream=True,
):
    for choice in response.choices:
        print(choice.delta.content, end="", flush=True)
print("\n")

engine.terminate()
```

The Python API of mlc_llm.LLMEngine fully aligns with the OpenAI API. You can use LLMEngine in the same way as OpenAI's Python package for both synchronous and asynchronous generation.
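As a minimal sketch of the synchronous, non-streaming path implied by that alignment (assuming the response object mirrors OpenAI's ChatCompletion shape, with the full message under choices[0].message.content):

```python
from mlc_llm import LLMEngine

model = "HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC"
engine = LLMEngine(model)

# Non-streaming request: the whole completion is returned in one response
# object rather than as incremental delta chunks.
response = engine.chat.completions.create(
    messages=[{"role": "user", "content": "What is the meaning of life?"}],
    model=model,
    stream=False,
)
print(response.choices[0].message.content)

engine.terminate()
```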

If you would like to do concurrent asynchronous generation, you can use mlc_llm.AsyncLLMEngine instead.
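Below is a minimal sketch of asynchronous streaming, assuming AsyncLLMEngine exposes the same chat.completions.create interface as LLMEngine above but returns an async iterator of chunks; concurrency can then be achieved by gathering several such coroutines.

```python
import asyncio

from mlc_llm import AsyncLLMEngine

model = "HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC"


async def main():
    engine = AsyncLLMEngine(model)
    # Assumed to mirror the synchronous streaming loop: each response chunk
    # carries delta content for the choices.
    async for response in await engine.chat.completions.create(
        messages=[{"role": "user", "content": "What is the meaning of life?"}],
        model=model,
        stream=True,
    ):
        for choice in response.choices:
            print(choice.delta.content, end="", flush=True)
    print()
    engine.terminate()


asyncio.run(main())
```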

REST Server

We can launch a REST server to serve the 4-bit quantized Llama-3 model for OpenAI chat completion requests. The server provides full OpenAI API compatibility.

mlc_llm serve HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC

The server is hosted at http://127.0.0.1:8000 by default, and you can use --host and --port to set a different host and port. When the server is ready (showing INFO: Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)), we can open a new shell and send a cURL request via the following command:

```bash
curl -X POST \
  -H "Content-Type: application/json" \
  -d '{
        "model": "HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC",
        "messages": [
            {"role": "user", "content": "Hello! Our project is MLC LLM. What is the name of our project?"}
        ]
  }' \
  http://127.0.0.1:8000/v1/chat/completions
```
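Because the server speaks the OpenAI chat completions protocol, the official openai Python client (a separate third-party package, not part of MLC LLM) can also be pointed at it. A sketch assuming openai>=1.0 is installed and the server above is running locally; the api_key value is a placeholder since the local server does not check it:

```python
from openai import OpenAI

# Point the OpenAI client at the local MLC LLM server. The client requires a
# non-empty api_key even though the local server ignores it.
client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="HF://mlc-ai/Llama-3-8B-Instruct-q4f16_1-MLC",
    messages=[
        {"role": "user", "content": "Hello! Our project is MLC LLM. What is the name of our project?"}
    ],
)
print(response.choices[0].message.content)
```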

Universal Deployment APIs

MLC LLM provides multiple sets of APIs across platforms and environments, including the Python API and OpenAI-compatible REST server shown above, as well as native runtimes for iOS/iPadOS, Android, and web browsers.

Citation

Please consider citing our project if you find it useful:

```bibtex
@software{mlc-llm,
    author = {MLC team},
    title = {{MLC-LLM}},
    url = {https://github.com/mlc-ai/mlc-llm},
    year = {2023}
}
```

The underlying techniques of MLC LLM include:

References (Click to expand)
```bibtex
@inproceedings{tensorir,
    author = {Feng, Siyuan and Hou, Bohan and Jin, Hongyi and Lin, Wuwei and Shao, Junru and Lai, Ruihang and Ye, Zihao and Zheng, Lianmin and Yu, Cody Hao and Yu, Yong and Chen, Tianqi},
    title = {TensorIR: An Abstraction for Automatic Tensorized Program Optimization},
    year = {2023},
    isbn = {9781450399166},
    publisher = {Association for Computing Machinery},
    address = {New York, NY, USA},
    url = {https://doi.org/10.1145/3575693.3576933},
    doi = {10.1145/3575693.3576933},
    booktitle = {Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2},
    pages = {804--817},
    numpages = {14},
    keywords = {Tensor Computation, Machine Learning Compiler, Deep Neural Network},
    location = {Vancouver, BC, Canada},
    series = {ASPLOS 2023}
}

@inproceedings{metaschedule,
    author = {Shao, Junru and Zhou, Xiyou and Feng, Siyuan and Hou, Bohan and Lai, Ruihang and Jin, Hongyi and Lin, Wuwei and Masuda, Masahiro and Yu, Cody Hao and Chen, Tianqi},
    booktitle = {Advances in Neural Information Processing Systems},
    editor = {S. Koyejo and S. Mohamed and A. Agarwal and D. Belgrave and K. Cho and A. Oh},
    pages = {35783--35796},
    publisher = {Curran Associates, Inc.},
    title = {Tensor Program Optimization with Probabilistic Programs},
    url = {https://proceedings.neurips.cc/paper_files/paper/2022/file/e894eafae43e68b4c8dfdacf742bcbf3-Paper-Conference.pdf},
    volume = {35},
    year = {2022}
}

@inproceedings{tvm,
    author = {Tianqi Chen and Thierry Moreau and Ziheng Jiang and Lianmin Zheng and Eddie Yan and Haichen Shen and Meghan Cowan and Leyuan Wang and Yuwei Hu and Luis Ceze and Carlos Guestrin and Arvind Krishnamurthy},
    title = {{TVM}: An Automated {End-to-End} Optimizing Compiler for Deep Learning},
    booktitle = {13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18)},
    year = {2018},
    isbn = {978-1-939133-08-3},
    address = {Carlsbad, CA},
    pages = {578--594},
    url = {https://www.usenix.org/conference/osdi18/presentation/chen},
    publisher = {USENIX Association},
    month = oct,
}
```
