
Intelligent Router for Mixture-of-Models


vllm-project/semantic-router


Innovations ✨

[architecture overview diagram]

Intelligent Routing 🧠

Auto-Selection of Models and LoRA Adapters

A Mixture-of-Models (MoM) router that intelligently directs OpenAI API requests to the most suitable models or LoRA adapters from a defined pool, based on a semantic understanding of the request's intent (complexity, task, tools).

[Mixture-of-Models overview diagram]

Conceptually similar to Mixture-of-Experts (MoE), which lives within a model, this system selects the best entire model for the nature of the task.

As such, overall inference accuracy improves because the pool contains models that are better suited to different types of tasks:

[model accuracy comparison chart]
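The routing concept can be sketched as follows. The model names and the keyword-based classifier below are purely illustrative stand-ins; the actual router performs semantic classification of the request rather than keyword matching.

```python
# Minimal sketch of Mixture-of-Models routing: classify the request's
# category, then dispatch to the model configured for that category.
# Keyword matching stands in for the router's real semantic classifier,
# and all model names here are hypothetical.
MODEL_POOL = {
    "math": "qwen2.5-math",
    "coding": "deepseek-coder",
    "general": "llama-3.1-8b",
}

CATEGORY_KEYWORDS = {
    "math": {"integral", "derivative", "equation", "solve"},
    "coding": {"python", "function", "bug", "compile"},
}

def classify(prompt: str) -> str:
    """Toy intent classifier: first category whose keywords appear wins."""
    words = set(prompt.lower().split())
    for category, keywords in CATEGORY_KEYWORDS.items():
        if words & keywords:
            return category
    return "general"

def route(prompt: str) -> str:
    """Return the model from the pool best suited to this prompt."""
    return MODEL_POOL[classify(prompt)]
```

With this sketch, a math prompt such as "solve this equation for x" is routed to the math model, while a prompt matching no specialty falls through to the general model.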

The router is implemented in two ways:

  • Golang (with Rust FFI based on the candle Rust ML framework)
  • Python

Benchmarking will be conducted to determine the best implementation.

Request Flow

[request-flow architecture diagram]

Auto-Selection of Tools

Select which tools to expose based on the prompt, filtering out tools that are not relevant to it. This reduces the number of prompt tokens and improves the LLM's tool-selection accuracy.
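A rough sketch of that filtering step is below. Token overlap between the prompt and each tool description stands in for the real semantic-similarity scoring, and the tool names and threshold are illustrative assumptions.

```python
# Sketch of tool auto-selection: score each tool's description against the
# prompt and keep only the relevant ones, shrinking the tool list sent to
# the LLM. Token overlap is a toy stand-in for semantic similarity.
def select_tools(prompt: str, tools: dict[str, str], threshold: int = 2) -> list[str]:
    prompt_words = set(prompt.lower().split())
    selected = []
    for name, description in tools.items():
        overlap = len(prompt_words & set(description.lower().split()))
        if overlap >= threshold:  # keep only tools relevant to the prompt
            selected.append(name)
    return selected

# Hypothetical tool registry: name -> description.
tools = {
    "get_weather": "fetch the current weather forecast for a city",
    "run_sql": "execute a sql query against the database",
}
```

A weather question would then carry only `get_weather` in its tool list, instead of every registered tool.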

Domain Aware System Prompts

Automatically inject specialized system prompts based on query classification, ensuring optimal model behavior for different domains (math, coding, business, etc.) without manual prompt engineering.
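The injection step can be sketched as below. The domains and prompt texts are invented for illustration; they are not the project's actual configuration.

```python
# Sketch of domain-aware system prompts: once the router has classified the
# query, prepend the system prompt configured for that domain. All prompt
# strings and domain names here are illustrative assumptions.
DOMAIN_PROMPTS = {
    "math": "You are a careful mathematician. Show your reasoning step by step.",
    "coding": "You are an expert programmer. Prefer idiomatic, tested code.",
    "general": "You are a helpful assistant.",
}

def build_messages(query: str, domain: str) -> list[dict]:
    """Build an OpenAI-style message list with the domain's system prompt."""
    system = DOMAIN_PROMPTS.get(domain, DOMAIN_PROMPTS["general"])
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": query},
    ]
```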

Domain Aware Similarity Caching ⚡️

Cache the semantic representation of the prompt to reduce the number of prompt tokens and lower overall inference latency.
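The caching idea can be sketched as a store of (embedding, response) pairs queried by similarity. A toy bag-of-words cosine similarity stands in for a real embedding model, and the threshold is an illustrative assumption.

```python
# Sketch of semantic similarity caching: return a cached response when a new
# prompt's embedding is close enough to a previously answered one, skipping
# the LLM call entirely. Bag-of-words cosine is a toy embedding stand-in.
import math
from collections import Counter

def embed(text: str) -> Counter:
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

class SemanticCache:
    def __init__(self, threshold: float = 0.8):
        self.entries: list[tuple[Counter, str]] = []
        self.threshold = threshold

    def get(self, prompt: str):
        query = embed(prompt)
        for vec, response in self.entries:
            if cosine(query, vec) >= self.threshold:
                return response  # cache hit: no LLM call needed
        return None

    def put(self, prompt: str, response: str) -> None:
        self.entries.append((embed(prompt), response))
```

Near-duplicate prompts then resolve from the cache, while unrelated prompts miss and fall through to the router.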

Enterprise Security 🔒

PII detection

Detect PII in the prompt and avoid sending it to the LLM, protecting user privacy.
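A minimal sketch of this screening gate is below. Real PII detection in a router of this kind would use a trained token classifier; the regexes here only illustrate the idea and are not exhaustive.

```python
# Sketch of PII screening: flag prompts that match common PII patterns
# before they reach the LLM. These regexes are illustrative stand-ins for
# a trained PII-detection model.
import re

PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def find_pii(prompt: str) -> list[str]:
    """Return the kinds of PII detected in the prompt (empty if clean)."""
    return [kind for kind, pattern in PII_PATTERNS.items() if pattern.search(prompt)]
```

A request whose `find_pii` result is non-empty would be rejected or redacted instead of being forwarded.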

Prompt guard

Detect jailbreak prompts and block them before they reach the LLM, preventing the model from misbehaving. The guard can be configured globally or at the category level for fine-grained security control.
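The global-vs-per-category configuration can be sketched like this. The detection function is a stub (the real guard uses a trained jailbreak classifier), and the config shape and category names are illustrative assumptions.

```python
# Sketch of the prompt guard: a jailbreak check that can be enabled globally
# and overridden per category. The substring check is a stub for a trained
# jailbreak-detection model; the config shape is an illustrative assumption.
GUARD_CONFIG = {"global": True, "categories": {"creative": False}}

def looks_like_jailbreak(prompt: str) -> bool:
    # Stub: real detection is a learned classifier, not substring matching.
    return "ignore all previous instructions" in prompt.lower()

def guard_enabled(category: str, config: dict = GUARD_CONFIG) -> bool:
    """Per-category setting wins; otherwise fall back to the global flag."""
    return config["categories"].get(category, config["global"])

def admit(prompt: str, category: str) -> bool:
    """Return True if the prompt may be forwarded to the LLM."""
    if guard_enabled(category) and looks_like_jailbreak(prompt):
        return False
    return True
```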

Quick Start 🚀

Get up and running in seconds with our interactive setup script:

bash ./scripts/quickstart.sh

This command will:

  • 🔍 Check all prerequisites automatically
  • 📦 Install HuggingFace CLI if needed
  • 📥 Download all required AI models (~1.5GB)
  • 🐳 Start all Docker services
  • ⏳ Wait for services to become healthy
  • 🌐 Show you all the endpoints and next steps

For detailed installation and configuration instructions, see the Complete Documentation.

Documentation 📖

For comprehensive documentation including detailed setup instructions, architecture guides, and API references, visit:

👉 Complete Documentation at Read the Docs


Community 👋

For questions, feedback, or to contribute, please join the #semantic-router channel in the vLLM Slack.

Community Meetings 📅

We host bi-weekly community meetings to sync up with contributors across different time zones:

Join us to discuss the latest developments, share ideas, and collaborate on the project!

Citation

If you find Semantic Router helpful in your research or projects, please consider citing it:

@misc{semanticrouter2025,
  title={vLLM Semantic Router},
  author={vLLM Semantic Router Team},
  year={2025},
  howpublished={\url{https://github.com/vllm-project/semantic-router}},
}

Star History 🔥

We open-sourced the project on Aug 31, 2025. We love open source and collaboration ❤️

[star history chart]

Sponsors 👋

We are grateful to our sponsors who support us:


AMD provides us with GPU resources and ROCm™ Software for training and researching frontier router models, enhancing e2e testing, and building an online model playground.



