Burro is a command-line interface (CLI) tool built with Deno for evaluating Large Language Model (LLM) outputs. It provides a straightforward way to run different types of evaluations with secure API key management.

thisguymartin/burro

Burro is a command-line interface (CLI) tool for evaluating Large Language Model (LLM) outputs. It provides a straightforward way to run different types of evaluations with secure API key management.

🚀 Features

  • Three specialized evaluation types:
    • Answer correctness evaluation with context
    • Close-ended QA matching
    • Simple output-expected comparison
  • Secure OpenAI API key management
  • JSON-based evaluation configurations

📋 Prerequisites

  • OpenAI API key

🛠️ Installation

MacOS - Apple Silicon (M1/M2/M3)

sudo curl -L "https://github.com/thisguymartin/burro/releases/download/latest/build-mac-silicon" -o /usr/local/bin/burro && sudo chmod +x /usr/local/bin/burro

MacOS - Intel

sudo curl -L "https://github.com/thisguymartin/burro/releases/download/latest/build-mac-intel" -o /usr/local/bin/burro && sudo chmod +x /usr/local/bin/burro

Linux - ARM

sudo curl -L "https://github.com/thisguymartin/burro/releases/download/latest/build-linux-arm" -o /usr/local/bin/burro && sudo chmod +x /usr/local/bin/burro

Linux - Intel

sudo curl -L "https://github.com/thisguymartin/burro/releases/download/latest/build-linux-intel" -o /usr/local/bin/burro && sudo chmod +x /usr/local/bin/burro

Windows

  1. Download build-windows.exe from the releases page
  2. Rename it to burro.exe
  3. Move it to your desired location (e.g., C:\Program Files\burro\burro.exe)

🔧 Usage

Setting up API Keys

burro set-openai-key

Running Evaluations

burro run-eval <evaluation-file>

📊 Evaluation Types

✅ Current Evaluation Types

  1. Close QA (closeqa.json)

    • Exact matching for close-ended questions
    • Strict format validation
    • Support for multiple correct answers
  2. Simple Evals (evals.json)

    • Basic output vs expected comparisons
    • Quick and efficient validation
    • Flexible matching options

🔜 Coming Soon

LLM-as-a-Judge Evaluations

Advanced evaluation methods using LLMs as judges:

  • 🔜 Battle: Compare outputs from different models head-to-head
  • 🔜 Humor: Evaluate the humor and wit in model responses
  • 🔜 Moderation: Check content for safety and appropriateness
  • 🔜 Security: Assess responses for potential security vulnerabilities
  • 🔜 Summarization: Evaluate the quality and accuracy of text summaries
  • 🔜 SQL: Verify the correctness of generated SQL queries
  • 🔜 Translation: Assess translation quality across languages
  • 🔜 Fine-tuned binary classifiers: Specialized evaluations using custom-trained models

Heuristic Evaluations

Mathematical and algorithmic comparison methods:

  • 🔜 Levenshtein distance: Measure string similarity using edit distance
  • 🔜 Exact match: Check for perfect matches between outputs
  • 🔜 Numeric difference: Compare numerical values within tolerances
  • 🔜 JSON diff: Analyze structural differences in JSON outputs
  • 🔜 Jaccard distance: Calculate similarity between sets of tokens

Current Evaluation Types

1. Close QA (closeqa.json)

Evaluates exact matching responses for close-ended questions.

Example format:

{
  "input": "List the first three prime numbers in ascending order, separated by commas.",
  "output": "2,3,5",
  "criteria": "Numbers must be in correct order, separated by commas with no spaces"
}
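The criteria in this example describe a strict output format. Burro's own matching logic is not documented in this README, so the following is only a minimal sketch of what such a format check could look like; the helper name is hypothetical and is not part of burro.

```shell
# Hypothetical helper (not part of burro): succeed only if the answer is
# made of digit groups separated by single commas, with no spaces.
# This checks the format only; it does not check the order of the numbers.
is_strict_csv_numbers() {
  printf '%s' "$1" | grep -Eq '^[0-9]+(,[0-9]+)*$'
}

is_strict_csv_numbers "2,3,5" && echo "2,3,5: format ok"
is_strict_csv_numbers "2, 3, 5" || echo "2, 3, 5: format rejected"
```

A check like this can be handy for validating the output field of an evaluation file before running it.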

2. Simple Evals (evals.json)

Compares model outputs against expected answers.

Example format:

{
  "input": "What is the capital of France?",
  "output": "The capital city of France is Paris",
  "expected": "Paris"
}
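To try this format end to end, the sketch below writes the example above to a file and passes it to burro run-eval. It assumes the file holds a single JSON object with the same fields as the example; the README does not say whether several test cases go in an array, so check the sample files in the repository for the authoritative layout. It also assumes the API key has already been stored with burro set-openai-key.

```shell
# Write the Simple Evals example from this README to a file.
# Assumption: one JSON object per file, same fields as the example.
eval_file="evals.json"
cat > "$eval_file" <<'EOF'
{"input":"What is the capital of France?","output":"The capital city of France is Paris","expected":"Paris"}
EOF

# Run the evaluation only if burro is installed and on PATH.
if command -v burro >/dev/null 2>&1; then
  burro run-eval "$eval_file"
else
  echo "burro not found; install it first" >&2
fi
```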

🔒 Security Features

  • AES encryption for API key storage
  • Secure key generation
  • Encrypted SQLite storage

System Architecture Check

To determine which version you should download, you can check your system's architecture:

MacOS

uname -m

This will return:

  • arm64: Use Apple Silicon version (M1/M2/M3 Macs)
  • x86_64: Use Intel version

Linux

uname -m

This will return:

  • aarch64 or arm64: Use Linux ARM version
  • x86_64: Use Linux Intel version
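The mapping above can be scripted. The following is a minimal sketch that turns the output of uname into one of the release asset names used in the installation commands; the function name burro_asset is an assumption made for illustration and is not shipped with burro.

```shell
# Map an OS name (uname -s) and architecture (uname -m) to a release
# asset name. Asset names come from the installation commands in this
# README; the function itself is a hypothetical helper.
burro_asset() {
  os="$1"
  arch="$2"
  case "$os-$arch" in
    Darwin-arm64)              echo "build-mac-silicon" ;;
    Darwin-x86_64)             echo "build-mac-intel" ;;
    Linux-aarch64|Linux-arm64) echo "build-linux-arm" ;;
    Linux-x86_64)              echo "build-linux-intel" ;;
    *) echo "unsupported platform: $os $arch" >&2; return 1 ;;
  esac
}

# Print the asset name for the current machine.
burro_asset "$(uname -s)" "$(uname -m)" || echo "pick a build manually from the releases page"
```

The printed name can be substituted into the download URL from the installation section for your platform.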

Troubleshooting

Permission Denied

If you encounter permission issues during installation:

# Check current permissions
ls -l /usr/local/bin/burro

# Fix permissions if needed
sudo chmod +x /usr/local/bin/burro

Command Not Found

If the burro command is not found after installation:

  1. Verify the installation location is in your PATH
  2. Try restarting your terminal
  3. Verify the executable exists and has proper permissions
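On macOS and Linux, the first and third checks can be run from the shell. This is a minimal sketch that assumes the default install path /usr/local/bin/burro from the installation section; dir_in_path is a hypothetical helper name.

```shell
# Return success if the given directory is listed in PATH.
dir_in_path() {
  case ":$PATH:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

bin=/usr/local/bin/burro   # default install path from this README
bin_dir=$(dirname "$bin")

# Check 1: is the install directory in PATH?
if dir_in_path "$bin_dir"; then
  echo "$bin_dir is in PATH"
else
  echo "$bin_dir is NOT in PATH"
fi

# Check 3: does the file exist, and is it executable?
if [ -x "$bin" ]; then
  echo "$bin exists and is executable"
elif [ -e "$bin" ]; then
  echo "$bin exists but is not executable; run: sudo chmod +x $bin"
else
  echo "$bin does not exist; reinstall burro"
fi
```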

Uninstallation Guide

MacOS & Linux

sudo rm /usr/local/bin/burro

# Verify removal
which burro
# Should return nothing if successfully removed

Windows

  1. Delete burro.exe from your installation location
  2. If added to PATH:
    • Open System Properties (Win + Pause/Break)
    • Click "Advanced system settings"
    • Click "Environment Variables"
    • Under "System variables" or "User variables", find "Path"
    • Click "Edit"
    • Remove the directory containing burro.exe
    • Click "OK" to save changes

Verify removal:

where.exe burro
# Should report that no matching files were found if successfully removed
