# llm-d

llm-d enables high-performance distributed inference in production on Kubernetes.

llm-d is a well-lit path for serving large language models at scale with the fastest time-to-value and competitive performance per dollar. Built on vLLM, Kubernetes, and Inference Gateway, llm-d provides modular solutions for distributed inference with features like KV-cache aware routing and disaggregated serving.
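Because llm-d builds on vLLM behind the Inference Gateway, a deployment is typically reached through an OpenAI-compatible HTTP endpoint. A minimal client sketch follows; the gateway hostname and model name are illustrative assumptions, not values defined by llm-d itself:

```bash
# Minimal request against an llm-d deployment's OpenAI-compatible endpoint.
# The host and model below are placeholders; substitute your gateway address
# and a model actually served by your deployment.
curl http://llm-d-gateway.example.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "messages": [{"role": "user", "content": "Briefly explain KV-cache aware routing."}]
      }'
```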
- 📖 Documentation: llm-d.ai
- 🏗️ Architecture: llm-d architecture docs
- 📖 Project Details: PROJECT.md
- 📦 Releases: GitHub Releases
- 💬 Slack: Join our development discussions at llm-d.slack.com
- 📧 Google Group: Subscribe to llm-d-contributors for architecture docs and meeting invites
- 🗓️ Weekly Standup: Wednesdays at 12:30 ET - Public Calendar
## Contributing

- Read Guidelines: Review our Code of Conduct and contribution process
- Sign Commits: All commits require DCO sign-off (`git commit -s`); see the example below
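As a concrete example of the sign-off step, the `-s` flag appends a `Signed-off-by` trailer built from your configured git identity (the commit message and identity below are hypothetical):

```bash
# Commit with a DCO sign-off; -s appends a Signed-off-by trailer
# derived from your git user.name and user.email settings.
git commit -s -m "Fix typo in routing docs"

# The recorded commit message then ends with a trailer like:
#   Signed-off-by: Jane Developer <jane@example.com>
```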
Ways to contribute:

- 🐛 Bug fixes and small features - Submit PRs directly to component repos
- 🚀 New features with APIs - Require project proposals
- 📚 Documentation - Help improve guides and examples
- 🧪 Testing & Benchmarking - Contribute to our test coverage
- 💡 Experimental features - Start in the llm-d-incubation org
License: Apache 2.0