Quick Start

Get Olla up and running with this quick start guide.

Prerequisites

Configuration Examples

Olla merges your YAML file on top of built-in defaults, so you only need to specify what you want to override. The shipped config/config.yaml shows all available options for reference.
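
For example, an override file that changes only the log level is valid on its own; every option you leave out keeps its built-in default. A minimal sketch (the "debug" value is just illustrative):

logging:
  level: "debug"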

Basic Setup

1. Create Configuration

Create a config.yaml for your setup.

Configuration Best Practice

Create a config/config.local.yaml containing only the settings you need to change. Built-in defaults cover everything else. This file takes priority over config.yaml and won't be committed to version control.

$ cp config/config.yaml config/config.local.yaml
$ vi config/config.local.yaml   # keep only the settings you need to override

See the configuration overview for merge behaviour details.

Here's a minimal configuration example, showing the most common changes users make:

server:
  host: "0.0.0.0"
  port: 40114
  request_logging: true

proxy:
  engine: "olla"            # or "sherpa" for small instances
  load_balancer: "priority"

discovery:
  type: "static"
  static:
    endpoints:
      - url: "http://localhost:11434"
        name: "local-ollama"
        type: "ollama"
        priority: 100

logging:
  level: "info"
  format: "json"

Settings like check_interval, check_timeout, and priority are optional -- Olla provides sensible defaults for each backend type via its profile system.

Everything else comes from the shipped defaults.
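
If you do want explicit health-check settings, they can be set per endpoint. A hedged sketch -- the placement of check_interval and check_timeout under each static endpoint is an assumption here, so confirm the layout against the shipped config/config.yaml:

discovery:
  static:
    endpoints:
      - url: "http://localhost:11434"
        name: "local-ollama"
        type: "ollama"
        priority: 100
        check_interval: 5s   # assumed key placement; otherwise supplied by the profile system
        check_timeout: 2s    # assumed key placement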

2. Start Olla

Start Olla with your configuration:

# Uses config/config.local.yaml automatically (if present)
olla

# Or specify a custom config
olla --config my-awesome-config.yaml

On startup, you'll see which configuration was loaded:

{"level":"INFO","msg":"Initialising","version":"v0.x.x","pid":123456}
{"level":"INFO","msg":"System Configuration","isContainerised":false,...}
{"level":"INFO","msg":"Loaded configuration","config":"config/config.local.yaml"}
{"level":"INFO","msg":"Initialising stats collector"}
...

3. Test the Proxy

Check that Olla is running:

curl http://localhost:40114/internal/health

List available models through the proxy:

# For Ollama endpoints
curl http://localhost:40114/olla/ollama/api/tags

# For OpenAI-compatible endpoints
curl http://localhost:40114/olla/ollama/v1/models

Example Requests

Chat Completion (OpenAI-compatible)

curl -X POST http://localhost:40114/olla/ollama/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [
      {"role": "user", "content": "Hello, how are you?"}
    ]
  }'

Ollama Generate

curl -X POST http://localhost:40114/olla/ollama/api/generate \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "prompt": "Why is the sky blue?"
  }'

Streaming Response

curl -X POST http://localhost:40114/olla/ollama/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama3.2",
    "messages": [
      {"role": "user", "content": "Tell me a story"}
    ],
    "stream": true
  }'

llama.cpp Endpoint

curl -X POST http://localhost:40114/olla/llamacpp/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "llama-3.2-3b-instruct-q4_k_m.gguf",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Multiple Endpoints Configuration

Configure multiple LLM endpoints with load balancing:

discovery:
  type: "static"
  static:
    endpoints:
      # High priority local Ollama
      - url: "http://localhost:11434"
        name: "local-ollama"
        type: "ollama"
        priority: 100

      # Medium priority LM Studio
      - url: "http://localhost:1234"
        name: "local-lm-studio"
        type: "lm-studio"
        priority: 50

      # llama.cpp endpoint
      - url: "http://localhost:8080"
        name: "local-llamacpp"
        type: "llamacpp"
        priority: 95

      # Low priority remote endpoint
      - url: "https://api.example.com"
        name: "remote-api"
        type: "openai"
        priority: 10

Monitoring

Monitor Olla's performance:

# Health status
curl http://localhost:40114/internal/health

# System status and statistics
curl http://localhost:40114/internal/status

Response headers provide request tracing:

curl -I http://localhost:40114/olla/ollama/v1/models

Look for these headers:

  • X-Olla-Endpoint: Which backend handled the request
  • X-Olla-Backend-Type: Type of backend (ollama/openai/lm-studio)
  • X-Olla-Request-ID: Unique request identifier
  • X-Olla-Response-Time: Total processing time
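
A quick way to pull out just these headers from the command line (a simple sketch using the same models route as above):

# Print only the X-Olla-* tracing headers from the response
curl -sI http://localhost:40114/olla/ollama/v1/models | grep -i '^x-olla-'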

Common Configuration Options

High-Performance Setup

For production environments, use the Olla engine:

proxy:
  engine: "olla"                      # High-performance engine
  load_balancer: "least-connections"
  connection_timeout: 30s
  # Note: Automatic retry on connection failures is built-in

Rate Limiting

Protect your endpoints with rate limiting:

server:
  rate_limits:
    global_requests_per_minute: 1000
    per_ip_requests_per_minute: 100
    burst_size: 50

Request Size Limits

Set appropriate request limits:

server:
  request_limits:
    max_body_size: 52428800    # 50MB
    max_header_size: 524288    # 512KB

Learn More

Core Concepts

Configuration

Next Steps

Troubleshooting

Endpoint Not Responding

Check your endpoint URLs and ensure the services are running:

# Test direct access to your LLM endpoint
curl http://localhost:11434/api/tags

Health Checks Failing

Verify health check URLs are correct for your endpoint type:

  • Ollama: Use / or /api/version
  • LM Studio: Use / or /v1/models
  • OpenAI-compatible: Use /v1/models
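
To verify a health check path manually, hit it directly on the backend rather than through the proxy (ports follow the multi-endpoint example above):

# Ollama
curl http://localhost:11434/api/version

# LM Studio (OpenAI-compatible)
curl http://localhost:1234/v1/models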

High Latency

Consider switching to the high-performance Olla engine:

proxy:
  engine: "olla"
  load_balancer: "least-connections"

For more detailed troubleshooting, check the logs and open an issue if needed.

