LMCache

Supercharge Your LLM with the Fastest KV Cache Layer


Blog | Documentation | Join Slack | Interest Form | Roadmap

🔥 NEW: For enterprise-scale deployment of LMCache and vLLM, please check out vLLM Production Stack. LMCache is also officially supported in llm-d and KServe!

Summary

LMCache is an LLM serving engine extension to reduce TTFT and increase throughput, especially under long-context scenarios. By storing the KV caches of reusable texts across various locations, including GPU, CPU DRAM, and local disk, LMCache reuses the KV caches of any reused text (not necessarily a prefix) in any serving engine instance. Thus, LMCache saves precious GPU cycles and reduces user response delay.

By combining LMCache with vLLM, developers achieve 3-10x delay savings and GPU cycle reduction in many LLM use cases, including multi-round QA and RAG.
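To make the multi-round QA pattern concrete, here is a minimal sketch of two requests that share the same long context, where the second can hit the cached KV and see a lower TTFT. It assumes an OpenAI-compatible vLLM server with LMCache enabled is already listening at localhost:8000; the URL, model name, and document file are illustrative placeholders.

```python
# Minimal sketch: the second request over the same long context can reuse
# its KV cache, cutting TTFT. Server URL, model, and document are placeholders.
import time
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="dummy")
long_context = open("shared_document.txt").read()  # a long, reused document

def ttft(question: str) -> float:
    """Return seconds until the first streamed token arrives."""
    start = time.perf_counter()
    stream = client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",
        messages=[{"role": "user", "content": long_context + "\n\n" + question}],
        stream=True,
    )
    next(iter(stream))  # block until the first token
    return time.perf_counter() - start

print(f"cold TTFT: {ttft('Summarize the document.'):.2f}s")
print(f"warm TTFT: {ttft('List the key points.'):.2f}s")  # KV cache hit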

(Figure: performance comparison of vLLM with and without LMCache.)

Features

  • 🔥 Integration with vLLM v1 (see the sketch after this list), with the following features:
    • High-performance CPU KV cache offloading
    • Disaggregated prefill
    • P2P KV cache sharing
  • LMCache is supported in the vLLM Production Stack, llm-d, and KServe
  • Stable support for non-prefix KV caches
  • Storage support across multiple backends, including CPU DRAM and local disk
  • Installation support through pip and the latest vLLM
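The sketch below shows how the vLLM v1 integration with CPU offloading might be wired up through vLLM's offline API. The connector name, environment variables, and model are taken from LMCache's published examples, but treat them as assumptions and check the docs for the exact spelling in your LMCache and vLLM versions.

```python
# Hedged sketch: enabling the LMCache connector in vLLM v1 (offline API).
import os
from vllm import LLM, SamplingParams
from vllm.config import KVTransferConfig

# CPU offloading knobs (names follow LMCache's example configs; assumptions).
os.environ["LMCACHE_LOCAL_CPU"] = "True"          # enable CPU DRAM offloading
os.environ["LMCACHE_MAX_LOCAL_CPU_SIZE"] = "5.0"  # CPU cache budget in GB
os.environ["LMCACHE_CHUNK_SIZE"] = "256"          # tokens per KV cache chunk

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",
    kv_transfer_config=KVTransferConfig(
        kv_connector="LMCacheConnectorV1",  # route KV caches through LMCache
        kv_role="kv_both",                  # this instance both saves and loads
    ),
)
out = llm.generate(["Hello"], SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)
```

With `kv_role="kv_both"`, a single instance both produces and consumes cached KV; disaggregated prefill instead splits these roles across a prefill and a decode instance.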

Installation

To use LMCache, simply install lmcache from your package manager, e.g. pip:

pip install lmcache

LMCache works on Linux with NVIDIA GPUs.

More detailed installation instructions are available in the docs.
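As a quick post-install sanity check (nothing LMCache-specific, just standard Python packaging), you can confirm the package imports and print its version:

```python
# Verify the installation by importing the package and printing its version.
import importlib.metadata

import lmcache  # raises ImportError if the install failed
print("lmcache", importlib.metadata.version("lmcache"))
```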

Getting started

The best way to get started is to check out the Quickstart Examples in the docs.

Documentation

Check out the LMCache documentation, which is available online.

We also post regularly on the LMCache blog.

Examples

Go hands-on with our examples, demonstrating how to address different use cases with LMCache.

Interested in Connecting?

Fill out the interest form, sign up for our newsletter, join LMCache Slack, check out the LMCache website, or drop an email, and our team will reach out to you!

Community meeting

The community meeting for LMCache is hosted weekly. All are welcome to join!

Meetings alternate weekly between two time slots.

We keep notes from each meeting in this document, with summaries of standups, discussions, and action items.

Recordings of meetings are available on the LMCache YouTube channel.

Contributing

We welcome and value all contributions and collaborations. Please check out the Contributing Guide on how to contribute.

Citation

If you use LMCache for your research, please cite our papers:

@inproceedings{liu2024cachegen,
  title={Cachegen: Kv cache compression and streaming for fast large language model serving},
  author={Liu, Yuhan and Li, Hanchen and Cheng, Yihua and Ray, Siddhant and Huang, Yuyang and Zhang, Qizheng and Du, Kuntai and Yao, Jiayi and Lu, Shan and Ananthanarayanan, Ganesh and others},
  booktitle={Proceedings of the ACM SIGCOMM 2024 Conference},
  pages={38--56},
  year={2024}
}

@article{cheng2024large,
  title={Do Large Language Models Need a Content Delivery Network?},
  author={Cheng, Yihua and Du, Kuntai and Yao, Jiayi and Jiang, Junchen},
  journal={arXiv preprint arXiv:2409.13761},
  year={2024}
}

@inproceedings{10.1145/3689031.3696098,
  author={Yao, Jiayi and Li, Hanchen and Liu, Yuhan and Ray, Siddhant and Cheng, Yihua and Zhang, Qizheng and Du, Kuntai and Lu, Shan and Jiang, Junchen},
  title={CacheBlend: Fast Large Language Model Serving for RAG with Cached Knowledge Fusion},
  year={2025},
  url={https://doi.org/10.1145/3689031.3696098},
  doi={10.1145/3689031.3696098},
  booktitle={Proceedings of the Twentieth European Conference on Computer Systems},
  pages={94--109}
}

Socials

LinkedIn | Twitter | YouTube

License

The LMCache codebase is licensed under Apache License 2.0. See the LICENSE file for details.
