LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance. LightLLM harnesses the strengths of numerous well-regarded open-source implementations, including but not limited to FasterTransformer, TGI, vLLM, and FlashAttention.
- [2025/09] 🔥 LightLLM v1.1.0 released!
- [2025/08] Pre³ received the Outstanding Paper Award at ACL 2025.
- [2025/05] LightLLM paper on constrained decoding accepted by ACL 2025 (Pre³: Enabling Deterministic Pushdown Automata for Faster Structured LLM Generation). For a more accessible overview of the research, with key insights and examples, check out our blog post: LightLLM Blog.
- [2025/04] LightLLM paper on the request scheduler published at ASPLOS '25 (Past-Future Scheduler for LLM Serving under SLA Guarantees).
- [2025/02] 🔥 LightLLM v1.0.0 released, achieving the fastest DeepSeek-R1 serving performance on a single H200 machine. Learn more in the release blog: v1.0.0 blog.
Please refer to the FAQ for more information.
We welcome any cooperation and contribution. If your project requires LightLLM's support, please contact us via email or create a pull request.
Projects based on LightLLM or referenced LightLLM components:
- LoongServe, Peking University
- vLLM (uses some of LightLLM's kernels)
- SGLang (uses some of LightLLM's kernels)
- ParrotServe, Microsoft
- Aphrodite (uses some of LightLLM's kernels)
- S-LoRA
- OmniKV, Ant Group
- Lab4AI LightLLM+LlamaIndex, Lab4AI LightLLM+Qwen3-8B
- LazyLLM
Also, LightLLM's pure-Python design and token-level KV Cache management make it easy to use as the basis for research projects.
Academic works based on or using parts of LightLLM:
- ParrotServe (OSDI’24)
- SLoRA (MLSys’24)
- LoongServe (SOSP’24)
- ByteDance’s CXL (Eurosys’24)
- VTC (OSDI’24)
- OmniKV (ICLR’25)
- CaraServe, LoRATEE, FastSwitch, ...
For further information and discussion, join our Discord server. We welcome you as a member and look forward to your contributions!
This repository is released under the Apache-2.0 license.
We learned a lot from the following projects when developing LightLLM.
We have published a number of papers on components and features of LightLLM. If you use LightLLM in your work, please consider citing the relevant paper.
Constrained decoding: accepted by ACL 2025 and recipient of the Outstanding Paper Award.

    @inproceedings{anonymous2025pre,
      title={Pre$^3$: Enabling Deterministic Pushdown Automata for Faster Structured {LLM} Generation},
      author={Anonymous},
      booktitle={Submitted to ACL Rolling Review - February 2025},
      year={2025},
      url={https://openreview.net/forum?id=g1aBeiyZEi},
      note={under review}
    }
Request scheduler: accepted by ASPLOS '25.

    @inproceedings{gong2025past,
      title={Past-Future Scheduler for LLM Serving under SLA Guarantees},
      author={Gong, Ruihao and Bai, Shihao and Wu, Siyu and Fan, Yunqian and Wang, Zaijun and Li, Xiuhong and Yang, Hailong and Liu, Xianglong},
      booktitle={Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2},
      pages={798--813},
      year={2025}
    }