📦 llm-d v0.4.0 Release Notes
This release of the llm-d repo captures the release for the entire project: guides, components, and all.
Release Date: 2025-11-26
🧩 Component Summary
| Component | Version | Previous Version | Type |
|---|---|---|---|
| llm-d/llm-d-inference-scheduler | v0.4.0-rc.1 | v0.3.1 | Image |
| llm-d-incubation/llm-d-modelservice | v0.3.8 | v0.2.10 | Helm Chart |
| llm-d/llm-d-routing-sidecar | v0.4.0-rc.1 | v0.3.1 | Image |
| llm-d/llm-d-cuda | v0.4.0 | v0.3.1 | Image |
| llm-d/llm-d-aws | v0.4.0 | v0.3.1 | Image |
| llm-d/llm-d-xpu | v0.4.0 | v0.3.1 | Image |
| llm-d/llm-d-cpu | v0.4.0 | v0.3.1 | Image (New) |
| llm-d-incubation/llm-d-infra | v1.3.4 | v1.3.3 | Helm Chart |
| kubernetes-sigs/gateway-api-inference-extension | v1.2.0-rc.1 | v1.0.1 | Helm Chart |
| llm-d/llm-d-workload-variant-autoscaler | v0.0.8 | NA (new) | Helm Chart + Image |
🔹 llm-d/llm-d-inference-scheduler
- Description: A scheduler that makes optimized routing decisions for inference requests to the llm-d inference framework.
- Diff: v0.3.1 → v0.4.0-rc.1
🔹 llm-d-incubation/llm-d-modelservice
- Description: modelservice is a Helm chart that simplifies LLM deployment on llm-d by declaratively managing Kubernetes resources for serving base models. It enables reproducible, scalable, and tunable model deployments through modular presets and clean integration with llm-d ecosystem components (including vLLM, Gateway API Inference Extension, and LeaderWorkerSet).
- Diff: v0.2.10 → v0.3.8
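As a sketch of how such a chart is typically consumed, the commands below install the chart at this release's version. The chart repository URL and release name are assumptions, not taken from the release notes; check the llm-d-modelservice repo for the published location and supported values.

```shell
# Hypothetical install of the modelservice chart at this release's version.
# The repo URL and release name are assumptions; consult the chart's README.
helm repo add llm-d-modelservice https://llm-d-incubation.github.io/llm-d-modelservice/
helm repo update
helm upgrade -i my-model llm-d-modelservice/llm-d-modelservice --version 0.3.8
```

Model selection and preset tuning happen through chart values; see the chart's values file for the actual knobs.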
🔹 llm-d/llm-d-routing-sidecar
- Description: A reverse proxy redirecting incoming requests to the prefill worker specified in the x-prefiller-host-port HTTP request header.
- Diff: v0.3.1 → v0.4.0-rc.1
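The header contract described above can be illustrated with a small sketch; `parse_prefiller_target` is a hypothetical helper for illustration, not the sidecar's actual code:

```python
# Minimal sketch (not the sidecar's implementation): extract the prefill
# worker target that the x-prefiller-host-port request header names.
def parse_prefiller_target(headers):
    """Return (host, port) from the x-prefiller-host-port header, or None."""
    value = headers.get("x-prefiller-host-port")
    if value is None:
        return None  # no override: the proxy falls back to its default upstream
    # rpartition splits on the last ":" so IPv6-style hosts keep their colons
    host, _, port = value.rpartition(":")
    return host, int(port)
```

A reverse proxy would then forward the request to that host:port pair instead of its default upstream.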
🔹 llm-d/llm-d
- Description: A midstream image of vllm-project/vllm for inferencing, supporting features such as P/D disaggregation, KV cache awareness, and more.
- Diff: v0.3.1 → v0.4.0
- Image Variants:
  - XPU: ghcr.io/llm-d/llm-d-xpu:v0.4.0
  - AWS: ghcr.io/llm-d/llm-d-aws:v0.4.0
  - CUDA: ghcr.io/llm-d/llm-d-cuda:v0.4.0
  - CPU: ghcr.io/llm-d/llm-d-cpu:v0.4.0
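Consuming one of these variants is a plain image pull from GHCR; the CUDA tag is shown here only as an example:

```shell
# Pull the CUDA variant of the v0.4.0 midstream vLLM image from GHCR.
docker pull ghcr.io/llm-d/llm-d-cuda:v0.4.0
```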
🔹 llm-d-incubation/llm-d-infra
- Description: A Helm chart for deploying the gateway and gateway-related infrastructure assets for llm-d.
- Diff: v1.3.3 → v1.3.4
🔹 kubernetes-sigs/gateway-api-inference-extension
- Description: A Helm chart to deploy an InferencePool, a corresponding EndpointPicker (epp) deployment, and any other related assets.
- Diff: v1.0.1 → v1.2.0-rc.1
🔹 llm-d/llm-d-workload-variant-autoscaler (New - Experimental)
- Description: [TODO: Add description of the workload variant autoscaler]
- History (new): v0.0.8
- Note: This is an experimental component being included in this release for early testing and feedback.
For more information on any of the component projects or versions, please check out their repos directly. For information on installing and using the new release, refer to our guides. Thank you to all contributors who helped make this happen. Automated release notes are included below, but note that they only track work in the main repo and do not fully reflect a changelog across the project.
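Per the clone-and-checkout instructions added this cycle (#477), getting the guides that match this release is a matter of checking out the release tag; the tag name v0.4.0 is an assumption based on this release's version:

```shell
# Clone the main llm-d repo and check out this release's tag so the
# guides match the component versions listed above (tag name assumed).
git clone https://github.com/llm-d/llm-d.git
cd llm-d
git checkout v0.4.0
```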
What's Changed
- Add umbrella kv cache offloading well-lit path folder structure by @liu-cong in #401
- Correct wide-ep resource requirements. by @liu-cong in #373
- add information about component testing by @Gregory-Pereira in #361
- doc(guides): Introduce standardized recipes for Gateway, InferencePool, and vLLM by @zetxqx in #444
- Fix a broken link in the cpu prefix cache readme by @smarterclayton in #451
- Add more GKE specific workarounds and known issues by @smarterclayton in #419
- Update SIGs documentation to remove outdated schedule details. by @petecheslock in #431
- Update links to deploying vLLM multi-host in stable docs by @smarterclayton in #436
- fix kutomization error and model flag error in cpu offloading. by @zetxqx in #453
- Add GKE B200 readme notes by @smarterclayton in #454
- doc: enrich the prefix-cache-storage vllm cpu native offloading with benchmark results by @zetxqx in #438
- Add CPU for llm-d Inference Scheduling by @ZhengHongming888 in #428
- Add cpu offloading example for GKE + LMCache by @dannawang0221 in #318
- Add tab format for better UX on the website by @liu-cong in #452
- Rename prefix-cache-storage to tiered-prefix-cache by @vMaroon in #468
- Remove the dockerfile.gke as it is no longer used by @smarterclayton in #462
- Token credentials fix + vLLM v0.11.1 by @Gregory-Pereira in #456
- guides: Make vLLM log more useful in inference-scheduling by @russellb in #439
- Inference scheduling support for Intel Gaudi accelerator by @poussa in #374
- Add JIT directories and model directories by @smarterclayton in #418
- Use markdown comments for Tabs support on docusaurus by @petecheslock in #474
- Highlight P/D benefits with throughput-interactivity tradeoff by @liu-cong in #472
- add benchmark results lmcache results and tuned epp scorers by @zetxqx in #457
- Add step by step guide for setting up p/d with TPU on GKE by @yangligt2 in #443
- refactor: restructure vllm recipe with base and overlay pattern by @diego-torres in #475
- [Build] Add FI JIT Cache to Image by @robertgshaw2-redhat in #482
- Add instructions to clone git repo and checkout the release by @liu-cong in #477
- Create CPU dockefile for PD and Inference Scheduling by @ZhengHongming888 in #465
- guides/prereq/client-setup/install-deps.sh - increment HELMFILE_VERSION to 1.2.1 by @herbertkb in #492
- docs: Addresses CPU support added in PR #428 by @aneeshkp in #466
- Infra, MS and GAIE bumps + istio change compat by @Gregory-Pereira in #459
- Update release version for cpu offloading guide by @liu-cong in #495
- enable TLS in monitoring for prom by @Gregory-Pereira in #496
- helmfile and supporting artifacts for wva by @clubanderson in #464
- updating LMCACHe to be non fork by @Gregory-Pereira in #501
- component bumps for WVA guide by @Gregory-Pereira in #502
- Build vLLM 0.11.2 + patches for 0.4 by @smarterclayton in #461
- Avoid defining LMCACHE_COMMIT_SHA in multiple places by @terrytangyuan in #503
- WVA guide integration targeting v0.4 by @mamy-CS in #470
- fixing AWS image by @Gregory-Pereira in #506
- remove pre-passing values for VLLM by @Gregory-Pereira in #507
New Contributors
- @zetxqx made their first contribution in #444
- @ZhengHongming888 made their first contribution in #428
- @dannawang0221 made their first contribution in #318
- @russellb made their first contribution in #439
- @poussa made their first contribution in #374
- @yangligt2 made their first contribution in #443
- @diego-torres made their first contribution in #475
- @herbertkb made their first contribution in #492
- @aneeshkp made their first contribution in #466
- @mamy-CS made their first contribution in #470
Full Changelog: v0.3.1...v0.4.0