📦 llm-d v0.4.0 Release Notes
This release of the llm-d repo captures the release for the entire project: guides, components, and all.
Release Date: 2025-11-26
🧩 Component Summary
| Component | Version | Previous Version | Type |
|---|---|---|---|
| llm-d/llm-d-inference-scheduler | v0.4.0-rc.1 | v0.3.1 | Image |
| llm-d-incubation/llm-d-modelservice | v0.3.8 | v0.2.10 | Helm Chart |
| llm-d/llm-d-routing-sidecar | v0.4.0-rc.1 | v0.3.1 | Image |
| llm-d/llm-d-cuda | v0.4.0 | v0.3.1 | Image |
| llm-d/llm-d-aws | v0.4.0 | v0.3.1 | Image |
| llm-d/llm-d-xpu | v0.4.0 | v0.3.1 | Image |
| llm-d/llm-d-cpu | v0.4.0 | v0.3.1 | Image (New) |
| llm-d-incubation/llm-d-infra | v1.3.4 | v1.3.3 | Helm Chart |
| kubernetes-sigs/gateway-api-inference-extension | v1.2.0-rc.1 | v1.0.1 | Helm Chart |
| llm-d/llm-d-workload-variant-autoscaler | v0.0.8 | NA (new) | Helm Chart + Image |
🔹 llm-d/llm-d-inference-scheduler
- Description: A scheduler that makes optimized routing decisions for inference requests to the llm-d inference framework.
- Diff: v0.3.1 → v0.4.0-rc.1
🔹 llm-d-incubation/llm-d-modelservice
- Description: modelservice is a Helm chart that simplifies LLM deployment on llm-d by declaratively managing Kubernetes resources for serving base models. It enables reproducible, scalable, and tunable model deployments through modular presets and clean integration with llm-d ecosystem components (including vLLM, Gateway API Inference Extension, and LeaderWorkerSet).
- Diff: v0.2.10 → v0.3.8
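As a sketch of how such a chart is typically consumed, the commands below install the chart at this release's version. The chart repository URL and release name are assumptions, not taken from the release notes; check the llm-d-modelservice repo for the published location and supported values.

```shell
# Hypothetical install of the modelservice chart at this release's version.
# The repo URL and release name are assumptions; consult the chart's README.
helm repo add llm-d-modelservice https://llm-d-incubation.github.io/llm-d-modelservice/
helm repo update
helm upgrade -i my-model llm-d-modelservice/llm-d-modelservice --version 0.3.8
```

Model selection and preset tuning happen through chart values; see the chart's values file for the actual knobs.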
🔹 llm-d/llm-d-routing-sidecar
- Description: A reverse proxy redirecting incoming requests to the prefill worker specified in the x-prefiller-host-port HTTP request header.
- Diff: v0.3.1 → v0.4.0-rc.1
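The header contract described above can be illustrated with a small sketch; `parse_prefiller_target` is a hypothetical helper for illustration, not the sidecar's actual code:

```python
# Minimal sketch (not the sidecar's implementation): extract the prefill
# worker target that the x-prefiller-host-port request header names.
def parse_prefiller_target(headers):
    """Return (host, port) from the x-prefiller-host-port header, or None."""
    value = headers.get("x-prefiller-host-port")
    if value is None:
        return None  # no override: the proxy falls back to its default upstream
    # rpartition splits on the last ":" so IPv6-style hosts keep their colons
    host, _, port = value.rpartition(":")
    return host, int(port)
```

A reverse proxy would then forward the request to that host:port pair instead of its default upstream.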
🔹 llm-d/llm-d
- Description: A midstream image of vllm-project/vllm for inferencing, supporting features such as P/D disaggregation, KV cache awareness, and more.
- Diff: v0.3.1 → v0.4.0
- Image Variants:
  - XPU: ghcr.io/llm-d/llm-d-xpu:v0.4.0
  - AWS: ghcr.io/llm-d/llm-d-aws:v0.4.0
  - CUDA: ghcr.io/llm-d/llm-d-cuda:v0.4.0
  - CPU: ghcr.io/llm-d/llm-d-cpu:v0.4.0
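Consuming one of these variants is a plain image pull from GHCR; the CUDA tag is shown here only as an example:

```shell
# Pull the CUDA variant of the v0.4.0 midstream vLLM image from GHCR.
docker pull ghcr.io/llm-d/llm-d-cuda:v0.4.0
```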
🔹 llm-d-incubation/llm-d-infra
- Description: A Helm chart for deploying the gateway and gateway-related infrastructure assets for llm-d.
- Diff: v1.3.3 → v1.3.4
🔹 kubernetes-sigs/gateway-api-inference-extension
- Description: A Helm chart to deploy an InferencePool, a corresponding EndpointPicker (epp) deployment, and any other related assets.
- Diff: v1.0.1 → v1.2.0-rc.1
🔹 llm-d/llm-d-workload-variant-autoscaler (New - Experimental)
- Description: [TODO: Add description of the workload variant autoscaler]
- History (new): v0.0.8
- Note: This is an experimental component being included in this release for early testing and feedback.
For more information on any of the component projects or versions, please check out their repos directly. For information on installing and using the new release, refer to our guides. Thank you to all contributors who helped make this happen. Automated release notes are included below, but note that they only track work in the main repo and do not fully reflect a changelog across the project.
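Per the clone-and-checkout instructions added this cycle (#477), getting the guides that match this release is a matter of checking out the release tag; the tag name v0.4.0 is an assumption based on this release's version:

```shell
# Clone the main llm-d repo and check out this release's tag so the
# guides match the component versions listed above (tag name assumed).
git clone https://github.com/llm-d/llm-d.git
cd llm-d
git checkout v0.4.0
```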
What's Changed
- Add umbrella kv cache offloading well-lit path folder structure by @liu-cong in #401
- Correct wide-ep resource requirements. by @liu-cong in #373
- add information about component testing by @Gregory-Pereira in #361
- doc(guides): Introduce standardized recipes for Gateway, InferencePool, and vLLM by @zetxqx in #444
- Fix a broken link in the cpu prefix cache readme by @smarterclayton in #451
- Add more GKE specific workarounds and known issues by @smarterclayton in #419
- Update SIGs documentation to remove outdated schedule details. by @petecheslock in #431
- Update links to deploying vLLM multi-host in stable docs by @smarterclayton in #436
- fix kutomization error and model flag error in cpu offloading. by @zetxqx in #453
- Add GKE B200 readme notes by @smarterclayton in #454
- doc: enrich the prefix-cache-storage vllm cpu native offloading with benchmark results by @zetxqx in #438
- Add CPU for llm-d Inference Scheduling by @ZhengHongming888 in #428
- Add cpu offloading example for GKE + LMCache by @dannawang0221 in #318
- Add tab format for better UX on the website by @liu-cong in #452
- Rename prefix-cache-storage to tiered-prefix-cache by @vMaroon in #468
- Remove the dockerfile.gke as it is no longer used by @smarterclayton in #462
- Token credentials fix + vLLM v0.11.1 by @Gregory-Pereira in #456
- guides: Make vLLM log more useful in inference-scheduling by @russellb in #439
- Inference scheduling support for Intel Gaudi accelerator by @poussa in #374
- Add JIT directories and model directories by @smarterclayton in #418
- Use markdown comments for Tabs support on docusaurus by @petecheslock in #474
- Highlight P/D benefits with throughput-interactivity tradeoff by @liu-cong in #472
- add benchmark results lmcache results and tuned epp scorers by @zetxqx in #457
- Add step by step guide for setting up p/d with TPU on GKE by @yangligt2 in #443
- refactor: restructure vllm recipe with base and overlay pattern by @diego-torres in #475
- [Build] Add FI JIT Cache to Image by @robertgshaw2-redhat in #482
- Add instructions to clone git repo and checkout the release by @liu-cong in #477
- Create CPU dockefile for PD and Inference Scheduling by @ZhengHongming888 in #465
- guides/prereq/client-setup/install-deps.sh - increment HELMFILE_VERSION to 1.2.1 by @herbertkb in #492
- docs: Addresses CPU support added in PR #428 by @aneeshkp in #466
- Infra, MS and GAIE bumps + istio change compat by @Gregory-Pereira in #459
- Update release version for cpu offloading guide by @liu-cong in #495
- enable TLS in monitoring for prom by @Gregory-Pereira in #496
- helmfile and supporting artifacts for wva by @clubanderson in #464
- updating LMCACHe to be non fork by @Gregory-Pereira in #501
- component bumps for WVA guide by @Gregory-Pereira in #502
- Build vLLM 0.11.2 + patches for 0.4 by @smarterclayton in #461
- Avoid defining LMCACHE_COMMIT_SHA in multiple places by @terrytangyuan in #503
- WVA guide integration targeting v0.4 by @mamy-CS in #470
- fixing AWS image by @Gregory-Pereira in #506
- remove pre-passing values for VLLM by @Gregory-Pereira in #507
New Contributors
- @zetxqx made their first contribution in #444
- @ZhengHongming888 made their first contribution in #428
- @dannawang0221 made their first contribution in #318
- @russellb made their first contribution in #439
- @poussa made their first contribution in #374
- @yangligt2 made their first contribution in #443
- @diego-torres made their first contribution in #475
- @herbertkb made their first contribution in #492
- @aneeshkp made their first contribution in #466
- @mamy-CS made their first contribution in #470
Full Changelog: v0.3.1...v0.4.0