prompt-testing
Here are 23 public repositories matching this topic...
Test your prompts, agents, and RAG pipelines. AI red teaming, pentesting, and vulnerability scanning for LLMs. Compare the performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command-line and CI/CD integration.
- Updated Dec 18, 2025 - TypeScript
Agentic LLM Vulnerability Scanner / AI red teaming kit 🧪
- Updated Nov 30, 2025 - Python
LLM Reasoning and Generation Benchmark. Evaluate LLMs in complex scenarios systematically.
- Updated May 25, 2025 - TypeScript
Prompture is an API-first library for requesting structured JSON (or any other structured output) from LLMs, validating it against a schema, and running comparative tests between models. A generic sketch of this request-then-validate pattern follows this entry.
- Updated Nov 22, 2025 - Python
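The core pattern behind a library like Prompture, ask a model for JSON and then validate the reply against a schema, can be sketched in a few lines of plain Python. This is an illustration of the general pattern, not Prompture's actual API: the `call_llm` helper and the example schema are assumptions you would replace with your own provider call and schema.

```python
import json

import jsonschema  # pip install jsonschema

# Example schema the model's output must satisfy (illustrative only).
PRODUCT_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "price": {"type": "number"},
    },
    "required": ["name", "price"],
}


def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for any LLM client call that returns raw text."""
    raise NotImplementedError("plug in your provider SDK here")


def get_validated_json(prompt: str, schema: dict) -> dict:
    """Ask the model for JSON and fail loudly if the reply does not match the schema."""
    raw = call_llm(prompt + "\nRespond with JSON only.")
    data = json.loads(raw)  # raises ValueError on malformed JSON
    jsonschema.validate(instance=data, schema=schema)  # raises ValidationError on mismatch
    return data
```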
Test, compare, and optimize your AI prompts in minutes
- Updated Aug 13, 2025 - JavaScript
The prompt engineering, prompt management, and prompt evaluation tool for TypeScript, JavaScript, and Node.js.
- Updated Nov 15, 2025 - TypeScript
EvalWise is a developer-friendly platform for LLM evaluation and red teaming that helps test AI models for safety, compliance, and performance issues.
- Updated Nov 20, 2025 - Python
LLM Prompt Test helps you test Large Language Model (LLM) prompts to ensure they consistently meet your expectations.
- Updated May 22, 2024 - TypeScript
Community plugin for using Promptfoo with Genkit.
- Updated Jan 3, 2025 - TypeScript
prompt-evaluator is an open-source toolkit for evaluating, testing, and comparing LLM prompts. It provides a GUI-driven workflow for running prompt tests, tracking token usage, visualizing results, and ensuring reliability across providers such as OpenAI, Claude, and Gemini.
- Updated Dec 4, 2025 - TypeScript
An open-source AI prompt engineering playground with live code execution. Test OpenAI & Claude prompts, execute JavaScript, and iterate in real time.
- Updated Nov 8, 2025 - TypeScript
Sample project demonstrating how to use Promptfoo, a test framework for evaluating the output of generative AI models.
- Updated Sep 10, 2024
A pytest-based framework for testing multi-agent AI systems. It provides a flexible and extensible platform for complex multi-agent simulations and supports integrations such as LiteLLM, CrewAI, and LangChain. A generic pytest sketch of this style of test follows this entry.
- Updated Sep 24, 2025 - TypeScript
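Testing agents with pytest usually means running the agent on a task and asserting properties of the result rather than exact strings. The sketch below illustrates that idea only; it is not the framework's actual API, and `run_agent` with its canned answers is an assumption standing in for a real agent stack.

```python
import pytest


def run_agent(task: str) -> dict:
    """Hypothetical agent runner; a real setup would call your agent stack (LiteLLM, CrewAI, ...)."""
    canned = {"France": "Paris", "Japan": "Tokyo"}  # stub so the test is self-contained
    answer = next((city for country, city in canned.items() if country in task), "unknown")
    return {"status": "done", "answer": answer}


@pytest.mark.parametrize(
    "task,expected",
    [
        ("What is the capital of France?", "Paris"),
        ("What is the capital of Japan?", "Tokyo"),
    ],
)
def test_agent_answers_factual_questions(task, expected):
    result = run_agent(task)
    assert result["status"] == "done"
    # Behavioural assertion rather than an exact string match.
    assert expected.lower() in result["answer"].lower()
```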
A collection of prompts that I use on a day-to-day basis for work and leisure.
- Updated Sep 9, 2024
Run 1,000 LLM evaluations in 10 minutes. Test prompts across Claude, GPT-4, and Gemini with parallel execution, real-time cost tracking, and beautiful visualizations. Open source. A minimal sketch of the parallel-execution idea follows this entry.
- Updated Dec 12, 2025 - Python
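The throughput claim above comes from running many prompt/model combinations concurrently instead of one at a time. The asyncio sketch below illustrates that idea only; `evaluate_one`, the model names, and the token counts are assumptions, not that project's code.

```python
import asyncio

MODELS = ["claude", "gpt-4", "gemini"]  # assumed model identifiers
PROMPTS = [f"Summarize document {i}" for i in range(10)]


async def evaluate_one(model: str, prompt: str) -> dict:
    """Hypothetical single evaluation; a real version would call the provider's async SDK."""
    await asyncio.sleep(0.1)  # stand-in for network latency
    return {"model": model, "prompt": prompt, "tokens": 42}


async def main() -> None:
    # Launch every (model, prompt) pair concurrently and gather the results.
    tasks = [evaluate_one(m, p) for m in MODELS for p in PROMPTS]
    results = await asyncio.gather(*tasks)
    total_tokens = sum(r["tokens"] for r in results)
    print(f"{len(results)} evaluations, {total_tokens} tokens used")


if __name__ == "__main__":
    asyncio.run(main())
```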
Visual prompt engineering platform for creating, testing, and versioning LLM prompts across multiple providers (OpenAI, Anthropic, Mistral, Gemini).
- Updated Nov 5, 2025 - TypeScript
Quickstart guide for using Promptfoo to evaluate LLM prompts via the CLI or Colab.
- Updated Nov 23, 2025
An AI RAG evaluation project using Ragas. Includes RAG metrics (precision, recall, faithfulness), retrieval diagnostics, and prompt-testing examples for fintech/banking LLM systems. Designed as an AI QA specialist portfolio project. A minimal Ragas evaluation sketch follows this entry.
- Updated Nov 17, 2025 - Python
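Ragas scores a RAG pipeline per sample from the question, the retrieved contexts, the generated answer, and a reference answer. The sketch below assumes a ragas release around 0.1.x and an OpenAI API key for the LLM-backed metrics; exact metric names and dataset column names have shifted between versions, so treat it as a starting point rather than a definitive recipe.

```python
from datasets import Dataset  # pip install datasets ragas

from ragas import evaluate
from ragas.metrics import context_precision, context_recall, faithfulness

# One toy banking-style sample; real evaluations use many rows.
data = {
    "question": ["What is the overdraft fee?"],
    "answer": ["The overdraft fee is $35 per transaction."],
    "contexts": [["Overdraft fees are $35 per transaction, capped at three per day."]],
    "ground_truth": ["$35 per transaction."],
}

dataset = Dataset.from_dict(data)
result = evaluate(dataset, metrics=[context_precision, context_recall, faithfulness])
print(result)  # per-metric scores between 0 and 1
```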
Sample implementation demonstrating how to use Firebase Genkit with Promptfoo.
- Updated Sep 11, 2024 - TypeScript