Promptfoo is a developer-friendly command-line tool for testing, evaluating, and red teaming large language model outputs, with 10.3k GitHub stars and 900 forks.
GitHub: promptfoo/promptfoo | Latest Release: v0.120.22 (February 2026)
It enables teams to catch regressions in AI behavior, compare prompts across different models, and run automated security evaluations before deploying LLM-powered applications.
What is Promptfoo?
Promptfoo provides a structured approach to LLM testing that treats prompts as testable code artifacts.
Rather than manually checking model outputs, teams can define test cases with expected behaviors and run them automatically against any LLM provider.
The tool supports OpenAI, Anthropic, Azure, Google, local models, and custom API endpoints.
The framework emphasizes deterministic testing where possible, using assertions to verify outputs meet specific criteria.
This makes it practical to integrate LLM testing into existing CI/CD workflows and catch prompt regressions before they reach production.
Key Features
Evaluation Framework
Promptfoo supports multiple assertion types for validating LLM outputs.
You can check for exact matches, substring presence, JSON schema compliance, semantic similarity, and custom JavaScript assertions.
The evaluation engine runs prompts against test cases and generates detailed reports showing pass/fail status for each assertion.
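As a rough sketch, a single test case can stack several assertion types. The example below uses the documented contains, is-json, similar, and javascript assertions; the reference answer, threshold, and length check are illustrative values, not recommendations:
# Illustrative test case combining several assertion types
tests:
  - vars:
      question: "Return the capital of France as JSON"
    assert:
      - type: is-json            # output must parse as valid JSON
      - type: contains           # substring check
        value: "Paris"
      - type: similar            # semantic similarity against a reference answer
        value: "The capital of France is Paris."
        threshold: 0.8
      - type: javascript         # custom JavaScript expression; `output` holds the model response
        value: "output.length < 500"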
Red Teaming Automation
The built-in red team module generates adversarial prompts to test for jailbreaks, prompt injections, and harmful content generation.
It includes attack strategies like role-playing, encoding tricks, and multi-turn manipulation attempts.
Results help identify weaknesses before malicious users discover them.
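A hedged sketch of how this is typically configured: a redteam block in promptfooconfig.yaml declares the application's purpose plus the plugins and strategies used to generate attacks. The plugin and strategy names below are examples; check the current documentation for the full list supported by your version:
# Illustrative red team configuration (plugin/strategy names are examples)
redteam:
  purpose: "Customer support chatbot for a retail bank"
  plugins:
    - harmful            # probes for harmful content generation
    - pii                # attempts to extract personal data
  strategies:
    - jailbreak          # iterative jailbreak attempts
    - prompt-injection   # instructions injected inside user content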
Prompt Comparison
Compare how different prompts or models perform on the same test cases.
The comparison view displays results side-by-side with metrics like response time, token usage, and assertion pass rates.
This helps optimize prompts for cost, speed, and accuracy.
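One way to keep cost and speed visible in every comparison, assuming the built-in cost and latency assertions are available in your version, is to attach thresholds to all test cases via defaultTest; the budgets below are illustrative:
# Apply cost and latency budgets to every test case (thresholds are illustrative)
defaultTest:
  assert:
    - type: cost         # maximum spend per test case, in USD
      threshold: 0.002
    - type: latency      # maximum response time, in milliseconds
      threshold: 3000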
Provider Abstraction
Write tests once and run them against any supported LLM provider.
Promptfoo handles API differences behind a unified interface, making it straightforward to benchmark models or migrate between providers without rewriting test suites.
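For example, swapping or adding a provider is a one-line change in the config. The provider IDs below are illustrative; exact IDs depend on your promptfoo version and should be checked against the provider documentation:
# The same prompts and tests run against every provider listed here
providers:
  - openai:gpt-4o-mini
  - anthropic:messages:claude-3-5-sonnet-20240620
  - ollama:chat:llama3   # local model served by Ollama (assumes Ollama is running)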
Installation
Install Promptfoo globally using npm:
npm install -g promptfoo
Or run directly with npx:
npx promptfoo@latest init
Or install with Homebrew:
brew install promptfoo
Initialize a new project:
promptfoo init
This creates a promptfooconfig.yaml file with example configuration.
How to Use Promptfoo
Basic Configuration
Create a configuration file defining your prompts and test cases:
# promptfooconfig.yaml
prompts:
  - "You are a helpful assistant. Answer this question: {{question}}"
  - "As an expert, provide a detailed answer to: {{question}}"

providers:
  - openai:gpt-4
  - anthropic:messages:claude-3-sonnet-20240229

tests:
  - vars:
      question: "What is the capital of France?"
    assert:
      - type: contains
        value: "Paris"
  - vars:
      question: "Explain quantum computing in simple terms"
    assert:
      - type: llm-rubric
        value: "The response should be understandable by a non-technical person"
Run the evaluation:
promptfoo eval
View results in the web interface:
promptfoo view
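Results can also be exported for sharing or as CI artifacts; the --output flag infers the format from the file extension (JSON, CSV, or HTML), for example:
promptfoo eval --output results.html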
Red Teaming
Generate adversarial test cases:
promptfoo redteam generate --purpose "Customer support chatbot"
Run the red team evaluation:
promptfoo redteam run
Integration
GitHub Actions
Add LLM testing to your CI pipeline:
name: LLM Tests
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - run: npm install -g promptfoo
      - run: promptfoo eval --no-cache --output results.json
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
      - uses: actions/upload-artifact@v4
        with:
          name: promptfoo-results
          path: results.json
GitLab CI
llm-tests:
  image: node:20
  script:
    - npm install -g promptfoo
    - promptfoo eval --no-cache
  # OPENAI_API_KEY is read from the project's CI/CD variables
Pre-commit Hook
# .husky/pre-commit
npx promptfoo eval --no-cache || exit 1
When to Use Promptfoo
Promptfoo fits well when you need to:
- Test prompt changes before deploying to production
- Compare performance across different LLM providers
- Run automated security evaluations on AI applications
- Maintain prompt quality as models and requirements evolve
- Build regression test suites for LLM-powered features
Teams building customer-facing AI products benefit most from Promptfoo’s structured testing approach.
The tool helps catch issues early when prompt changes accidentally degrade output quality or introduce security vulnerabilities.
For ad-hoc prompt experimentation, simpler tools may suffice.
Promptfoo shines when you need repeatable, automated testing as part of a mature development workflow.
