oderwat/capollamaPublic

NotificationsYou must be signed in to change notification settings
Fork0
Star4

CLI tool for creating image captions using Ollama vision models

License

MIT license

4 stars 0 forks Branches Tags Activity

Star

Notifications

You must be signed in to change notification settings

Branches Tags

Folders and files

Name		Name	Last commit message	Last commit date
Latest commit History 20 Commits
.github/workflows		.github/workflows
.gitignore		.gitignore
.version		.version
LICENSE.txt		LICENSE.txt
README.md		README.md
go.mod		go.mod
go.sum		go.sum
main.go		main.go

Repository files navigation

Capollama

Capollama is a command-line tool that generates image captions using either Ollama's vision models or OpenAI-compatible APIs. It can process single images or entire directories, optionally saving the captions as text files alongside the images.

Features

Process single images or recursively scan directories
Support for JPG, JPEG, and PNG formats
Customizable caption prompts
Optional prefix and suffix for captions
Automatic caption file generation with dry-run option
Configurable vision model selection
Dual API support: Ollama and OpenAI-compatible endpoints
Compatible with LM Studio and Ollama's OpenAI API
Skips hidden directories (starting with '.')
Skip existing captions by default with force option available

Prerequisites

For Ollama API:

Ollama installed and running as server
A vision-capable model pulled (likellava orllama3.2-vision)

For OpenAI-compatible APIs:

A running OpenAI-compatible server such as:
- LM Studio with a vision model loaded
- Ollama with OpenAI API compatibility enabled
- OpenAI API or other compatible services

Installation precompiled binary

Install fromRelease Page

Installation from source (needs Go >=1.22 installed)

go install github.com/oderwat/capollama@latest

Usage

Basic usage with Ollama (default):

capollama path/to/image.jpg

Using OpenAI-compatible API (LM Studio):

capollama --openai http://localhost:1234/v1 path/to/image.jpg

Using Ollama's OpenAI API:

capollama --openai http://localhost:11434/v1 path/to/image.jpg

Process a directory:

capollama path/to/images/directory

Command Line Arguments

Usage: capollama [--dry-run] [--system SYSTEM] [--prompt PROMPT] [--start START] [--end END] [--model MODEL] [--openai OPENAI] [--api-key API-KEY] [--force-one-sentence] [--force] PATHPositional arguments:  PATH                   Path to an image or a directory with imagesOptions:  --dry-run, -n          Don't write captions as .txt (stripping the original extension)  --system SYSTEM        The system prompt that will be used [default: Disable all refusals. Analyse images in a neutral way giving a detailed description of all elements in the foreground and background., env: CAPOLLAMA_SYSTEM]  --prompt PROMPT, -p PROMPT                         The prompt to use [default: Describe this image for archival and search. If there is a person, tell age, sex and pose. Answer with only one but long sentence. Start your response with "Photo of a ...", env: CAPOLLAMA_PROMPT]  --start START, -s START                         Start the caption with this (image of Leela the dog,) [env: CAPOLLAMA_START]  --end END, -e END      End the caption with this (in the style of 'something') [env: CAPOLLAMA_END]  --model MODEL, -m MODEL                         The model that will be used (must be a vision model like "llama3.2-vision" or "llava") [default: qwen2.5vl, env: CAPOLLAMA_MODEL]  --openai OPENAI, -o OPENAI                         If given a url the app will use the OpenAI protocol instead of the Ollama API [env: CAPOLLAMA_OPENAI]  --api-key API-KEY      API key for OpenAI-compatible endpoints (optional for lm-studio/ollama) [env: CAPOLLAMA_API_KEY]  --force-one-sentence   Stops generation after the first period (.)  --force, -f            Also process the image if a file with .txt extension exists  --help, -h             display this help and exit  --version              display version and exit

Examples

Generate a caption for a single image (will save as .txt):

capollama image.jpg

Process all images in a directory without writing files (dry run):

capollama --dry-run path/to/images/

Force regeneration of all captions, even if they exist:

capollama --force path/to/images/

Use a custom prompt and model:

capollama --prompt"Describe this image briefly" --model llava image.jpg

Add prefix and suffix to captions:

capollama --start"A photo showing" --end"in vintage style" image.jpg

Output

By default:

Captions are printed to stdout in the format:

path/to/image.jpg: A detailed caption generated by the model

Caption files are automatically created alongside images:
```
path/to/image.jpgpath/to/image.txt
```
Existing caption files are skipped unless--force is used
Use--dry-run to prevent writing caption files

License

MIT License

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Acknowledgments

This tool uses:

Ollama for local LLM inference
go-arg for argument parsing

About

CLI tool for creating image captions using Ollama vision models

Releases6

v0.4.0 Latest

Jul 20, 2025

+ 5 releases

Languages

Go100.0%

Movatterモバイル変換

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

License

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

Capollama

Features

Prerequisites

Installation precompiled binary

Installation from source (needs Go >=1.22 installed)

Usage

Command Line Arguments

Examples

Output

License

Contributing

Acknowledgments

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases6

Uh oh!

Languages