radlab-dev-group/llm-router


LLM Router is a service that can be deployed on‑premises or in the cloud. It adds a layer between any application and the LLM provider. In real time it controls traffic, distributes load among providers of a specific LLM, and enables analysis of outgoing requests from a security perspective (masking, anonymization, prohibited content). It is an open‑source solution (Apache 2.0) that can be launched instantly by running a ready‑made image in your own infrastructure.

  • llm_router_api provides a unified REST proxy that can route requests to any supported LLM backend (OpenAI‑compatible, Ollama, vLLM, LM Studio, etc.), with built‑in load balancing, health checks, streaming responses and optional Prometheus metrics.
  • llm_router_lib is a Python SDK that wraps the API with typed request/response models, automatic retries, token handling and a rich exception hierarchy, letting developers focus on application logic rather than raw HTTP calls.
  • llm_router_web offers ready‑to‑use Flask UIs – an anonymizer UI that masks sensitive data and a configuration manager for model/user settings – demonstrating how to consume the router from a browser.
  • llm_router_plugins (e.g., the fast_masker plugin) deliver a rule‑based text anonymisation engine with a comprehensive set of Polish‑specific masking rules (emails, IPs, URLs, phone numbers, PESEL, NIP, KRS, REGON, monetary amounts, dates, etc.) and an extensible architecture for custom rules and validators.
  • llm_router_services provides HTTP services that implement the core functionality used by the LLM‑Router's plugin system. The services expose guardrail and masking capabilities through Flask applications.

All components run on Python 3.10+ using virtualenv and require only the listed dependencies, making the suite easy to install, extend, and deploy in both development and production environments.
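Since the SDK wraps a REST proxy, the router can also be called directly over HTTP. The endpoint path and field names below are assumptions for illustration (the real schema is defined by the project's Pydantic models); this is a minimal sketch, not the documented API:

```python
import json
import urllib.request  # used by the commented-out request below

# Assumed base URL; the "/api" prefix matches LLM_ROUTER_EP_PREFIX in the Docker example.
ROUTER_URL = "http://localhost:8080/api"

def build_chat_payload(model: str, user_message: str, stream: bool = True) -> dict:
    """Build a minimal chat request body (field names are illustrative)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "stream": stream,
    }

payload = build_chat_payload("llama3", "Hello!", stream=False)

# Hypothetical call -- requires a running router and the correct endpoint path:
# req = urllib.request.Request(
#     f"{ROUTER_URL}/chat",
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"},
# )
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

With `stream` left at its default of `true`, the proxy would instead forward chunked responses as they arrive.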


🧩 Boilerplates

For a detailed explanation of each example's purpose, structure, and how the boilerplates are organized, see the main project README:

  • Main README – Boilerplate Overview (examples)
  • LlamaIndex Boilerplate Details (README)

✨ Key Features

| Feature | Description |
|---|---|
| Unified REST interface | One endpoint schema works for OpenAI‑compatible, Ollama, vLLM and any future provider. |
| Provider‑agnostic streaming | The `stream` flag (default `true`) controls whether the proxy forwards chunked responses as they arrive or returns a single aggregated payload. |
| Built‑in prompt library | Language‑aware system prompts stored under `resources/prompts` can be referenced automatically. |
| Dynamic model configuration | JSON file (`models-config.json`) defines providers, model name, default options and per‑model overrides. |
| Request validation | Pydantic models guarantee correct payloads; errors are returned with clear messages. |
| Structured logging | Configurable log level, filename, and optional JSON formatting. |
| Health & metadata endpoints | `/ping` (simple 200 OK) and `/tags` (available model tags/metadata). |
| Simple deployment | One‑liner run script or `python -m llm_proxy_rest.rest_api`. |
| Extensible conversation formats | Basic chat, conversation with system prompt, and extended conversation with richer options (e.g., temperature, top‑k, custom system prompt). |
| Multi‑provider model support | Each model can be backed by multiple providers (vLLM, Ollama, OpenAI) defined in `models-config.json`. |
| Provider selection abstraction | `ProviderChooser` delegates to a configurable strategy, enabling easy swapping of load‑balancing, round‑robin, weighted‑random, etc. |
| Load‑balanced default strategy | `LoadBalancedStrategy` distributes requests evenly across providers using in‑memory usage counters. |
| Dynamic model handling | `ModelHandler` loads model definitions at runtime and resolves the appropriate provider per request. |
| Pluggable endpoint architecture | Automatic discovery and registration of all concrete `EndpointI` implementations via `EndpointAutoLoader`. |
| Prometheus metrics integration | Optional `/metrics` endpoint for latency, error counts, and provider usage statistics. |
| Docker ready | Dockerfile and scripts for containerised deployment. |
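The names `ProviderChooser` and `LoadBalancedStrategy` come from the table above, but their real implementations live in the repository; the sketch below only illustrates the described idea of a pluggable strategy that spreads requests evenly using in‑memory usage counters:

```python
from collections import Counter

class LoadBalancedStrategy:
    """Pick the least-used provider based on in-memory usage counters (sketch)."""

    def __init__(self) -> None:
        self._usage: Counter = Counter()

    def choose(self, providers: list[str]) -> str:
        # Least-used provider first; Counter returns 0 for unseen providers.
        chosen = min(providers, key=lambda p: self._usage[p])
        self._usage[chosen] += 1
        return chosen

class ProviderChooser:
    """Delegates selection to a configurable strategy, as the table describes."""

    def __init__(self, strategy) -> None:
        self._strategy = strategy

    def pick(self, providers: list[str]) -> str:
        return self._strategy.choose(providers)

chooser = ProviderChooser(LoadBalancedStrategy())
picks = [chooser.pick(["vllm-a", "ollama-b"]) for _ in range(4)]
# Requests alternate evenly across the two providers.
```

Swapping in a round‑robin or weighted‑random strategy would then only require passing a different object to `ProviderChooser`.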

📦 Quick Start

1️⃣ Create & activate a virtual environment

Base requirements

python3 -m venv .venv
source .venv/bin/activate

# Only the core library (llm-router-lib).
pip install .

# Core library + API wrapper (llm-router-api).
pip install .[api]

Prometheus Metrics

To enable Prometheus metrics collection you must install the optional metrics dependencies:

pip install .[api,metrics]

Then start the application with the environment variable set:

export LLM_ROUTER_USE_PROMETHEUS=1

When LLM_ROUTER_USE_PROMETHEUS is enabled, the router automatically registers a /metrics endpoint (under the API prefix, e.g. /api/metrics). This endpoint exposes Prometheus‑compatible metrics such as request counts, latencies, and any custom counters defined by the application. Prometheus servers can scrape this URL to collect runtime metrics for monitoring and alerting.
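A matching Prometheus scrape configuration might look as follows; the job name is arbitrary, and the metrics path and host port are assumptions based on the /api prefix and the 5555:8080 mapping used in the Docker example below:

```yaml
scrape_configs:
  - job_name: "llm-router"           # arbitrary job name
    metrics_path: "/api/metrics"     # assumes LLM_ROUTER_EP_PREFIX="/api"
    static_configs:
      - targets: ["localhost:5555"]  # host port from the Docker run example
```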

2️⃣ Minimum required environment variable

./run-rest-api.sh
# or
LLM_ROUTER_MINIMUM=1 python3 -m llm_router_api.rest_api

🔐 Auditing

The router can record request‑level events (guard‑rail checks, payload masking, custom logs) in a tamper‑evident, encrypted form.
All audit entries are written by the auditor module and stored under logs/auditor/ as GPG‑encrypted files.

For a complete guide (including key generation, encryption workflow, and decryption utilities), see the dedicated documentation:

➡️ Auditing subsystem documentation


📦 Docker

Run the container with the default configuration:

docker run -p 5555:8080 quay.io/radlab/llm-router:rc1

For more advanced usage you can use a custom launch script, for example:

#!/bin/bash
PWD=$(pwd)
docker run \
  -p 5555:8080 \
  -e LLM_ROUTER_TIMEOUT=500 \
  -e LLM_ROUTER_IN_DEBUG=1 \
  -e LLM_ROUTER_MINIMUM=1 \
  -e LLM_ROUTER_EP_PREFIX="/api" \
  -e LLM_ROUTER_SERVER_TYPE=gunicorn \
  -e LLM_ROUTER_SERVER_PORT=8080 \
  -e LLM_ROUTER_SERVER_WORKERS_COUNT=4 \
  -e LLM_ROUTER_DEFAULT_EP_LANGUAGE="pl" \
  -e LLM_ROUTER_LOG_FILENAME="llm-proxy-rest.log" \
  -e LLM_ROUTER_EXTERNAL_TIMEOUT=300 \
  -e LLM_ROUTER_BALANCE_STRATEGY=balanced \
  -e LLM_ROUTER_REDIS_HOST="192.168.100.67" \
  -e LLM_ROUTER_REDIS_PORT=6379 \
  -e LLM_ROUTER_MODELS_CONFIG=/srv/cfg.json \
  -e LLM_ROUTER_PROMPTS_DIR="/srv/prompts" \
  -v "${PWD}/resources/configs/models-config.json":/srv/cfg.json \
  -v "${PWD}/resources/prompts":/srv/prompts \
  quay.io/radlab/llm-router:rc1
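For declarative deployments, the same launch can be expressed as a Docker Compose service. This is only a hedged translation of the script above, not a file shipped with the repository:

```yaml
services:
  llm-router:
    image: quay.io/radlab/llm-router:rc1
    ports:
      - "5555:8080"
    environment:
      LLM_ROUTER_TIMEOUT: "500"
      LLM_ROUTER_IN_DEBUG: "1"
      LLM_ROUTER_MINIMUM: "1"
      LLM_ROUTER_EP_PREFIX: /api
      LLM_ROUTER_SERVER_TYPE: gunicorn
      LLM_ROUTER_SERVER_PORT: "8080"
      LLM_ROUTER_SERVER_WORKERS_COUNT: "4"
      LLM_ROUTER_DEFAULT_EP_LANGUAGE: pl
      LLM_ROUTER_LOG_FILENAME: llm-proxy-rest.log
      LLM_ROUTER_EXTERNAL_TIMEOUT: "300"
      LLM_ROUTER_BALANCE_STRATEGY: balanced
      LLM_ROUTER_REDIS_HOST: 192.168.100.67
      LLM_ROUTER_REDIS_PORT: "6379"
      LLM_ROUTER_MODELS_CONFIG: /srv/cfg.json
      LLM_ROUTER_PROMPTS_DIR: /srv/prompts
    volumes:
      - ./resources/configs/models-config.json:/srv/cfg.json
      - ./resources/prompts:/srv/prompts
```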

🛠️ Configuration (via environment)

A full list of environment variables is available at the link: env list


⚖️ Load Balancing Strategies

The current list of available strategies, the interface description, and an example extension can be found at the link: load balancing strategies


🛣️ Endpoints Overview

The list of endpoints (categorized into built‑in, provider‑dependent, and extended endpoints) and a description of the streaming mechanisms can be found at the link: load endpoints overview


⚙️ Configuration Details

| Config File / Variable | Meaning |
|---|---|
| `resources/configs/models-config.json` | JSON map of provider → model → default options (e.g., `keep_alive`, `options.num_ctx`). |
| `LLM_ROUTER_PROMPTS_DIR` | Directory containing prompt templates (`*.prompt`). Sub‑folders are language‑specific (`en/`, `pl/`). |
| `LLM_ROUTER_DEFAULT_EP_LANGUAGE` | Language code used when a prompt does not explicitly specify one. |
| `LLM_ROUTER_TIMEOUT` | Upper bound for any request to an upstream LLM (seconds). |
| `LLM_ROUTER_LOG_FILENAME` / `LLM_ROUTER_LOG_LEVEL` | Logging destination and verbosity. |
| `LLM_ROUTER_IN_DEBUG` | When set, enables DEBUG‑level logs and more verbose error payloads. |
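The exact schema of `models-config.json` is defined by the project; the fragment below is only an illustrative guess assembled from the fields this README mentions (multiple providers per model, `keep_alive`, `options.num_ctx`), not a copy of the shipped file:

```json
{
  "llama3": {
    "providers": [
      {
        "type": "ollama",
        "url": "http://192.168.100.10:11434",
        "keep_alive": "5m",
        "options": { "num_ctx": 8192 }
      },
      {
        "type": "vllm",
        "url": "http://192.168.100.11:8000"
      }
    ]
  }
}
```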

🔧 Development

  • Python 3.10+ (project is tested on 3.10.6)
  • All dependencies are listed in requirements.txt. Install them inside the virtualenv.
  • To add a new provider, create a class in llm_proxy_rest/core/api_types that implements the BaseProvider interface and register it in llm_proxy_rest/register/__init__.py.
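The real BaseProvider interface lives in the repository; to show the shape of the extension point, the sketch below defines a stand‑in with an illustrative method name, not the project's actual API:

```python
from abc import ABC, abstractmethod

class BaseProvider(ABC):
    """Stand-in for llm_proxy_rest's provider interface (illustrative only)."""

    @abstractmethod
    def chat(self, model: str, messages: list[dict]) -> dict:
        """Send a chat request to the backing LLM and return its response."""

class EchoProvider(BaseProvider):
    """Toy provider that echoes the last user message back."""

    def chat(self, model: str, messages: list[dict]) -> dict:
        last = messages[-1]["content"] if messages else ""
        return {"model": model, "content": last}

# In the real project the new class would then be registered in
# llm_proxy_rest/register/__init__.py so the router can discover it.
provider = EchoProvider()
reply = provider.chat("demo-model", [{"role": "user", "content": "ping"}])
```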

📜 License

See the LICENSE file.


📚 Changelog

See the CHANGELOG for a complete history of changes.
