# llm-d

llm-d enables high-performance distributed inference in production on Kubernetes.

llm-d is a well-lit path for serving large language models at scale with the fastest time-to-value and competitive performance per dollar. Built on vLLM, Kubernetes, and Inference Gateway, llm-d provides modular solutions for distributed inference with features like KV-cache aware routing and disaggregated serving.
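Because llm-d builds on vLLM behind the Inference Gateway, a deployment is typically reached through an OpenAI-compatible HTTP endpoint. A minimal client sketch follows; the gateway hostname and model name are illustrative assumptions, not values defined by llm-d itself:

```bash
# Minimal request against an llm-d deployment's OpenAI-compatible endpoint.
# The host and model below are placeholders; substitute your gateway address
# and a model actually served by your deployment.
curl http://llm-d-gateway.example.com/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "messages": [{"role": "user", "content": "Briefly explain KV-cache aware routing."}]
      }'
```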
- 📖 Documentation: llm-d.ai
- 🏗️ Architecture: llm-d architecture docs
- 📖 Project Details: PROJECT.md
- 📦 Releases: GitHub Releases
- 💬 Slack: Join our development discussions at llm-d.slack.com
- 📧 Google Group: Subscribe to llm-d-contributors for architecture docs and meeting invites
- 🗓️ Weekly Standup: Wednesdays at 12:30 ET - Public Calendar
## Contributing

- Read Guidelines: Review our Code of Conduct and contribution process
- Sign Commits: All commits require DCO sign-off (`git commit -s`); see the example below
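As a concrete example of the sign-off step, the `-s` flag appends a `Signed-off-by` trailer built from your configured git identity (the commit message and identity below are hypothetical):

```bash
# Commit with a DCO sign-off; -s appends a Signed-off-by trailer
# derived from your git user.name and user.email settings.
git commit -s -m "Fix typo in routing docs"

# The recorded commit message then ends with a trailer like:
#   Signed-off-by: Jane Developer <jane@example.com>
```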
Ways to contribute:

- 🐛 Bug fixes and small features - Submit PRs directly to component repos
- 🚀 New features with APIs - Require project proposals
- 📚 Documentation - Help improve guides and examples
- 🧪 Testing & Benchmarking - Contribute to our test coverage
- 💡 Experimental features - Start in the llm-d-incubation org
License: Apache 2.0