Promptfoo is a developer-friendly command-line tool for testing, evaluating, and red teaming large language model outputs, with 10.3k GitHub stars and 900 forks.
GitHub: promptfoo/promptfoo | Latest Release: v0.120.22 (February 2026)
It enables teams to catch regressions in AI behavior, compare prompts across different models, and run automated security evaluations before deploying LLM-powered applications.
What is Promptfoo?
Promptfoo provides a structured approach to LLM testing that treats prompts as testable code artifacts.
Rather than manually checking model outputs, teams can define test cases with expected behaviors and run them automatically against any LLM provider.
The tool supports OpenAI, Anthropic, Azure, Google, local models, and custom API endpoints.
The framework emphasizes deterministic testing where possible, using assertions to verify outputs meet specific criteria.
This makes it practical to integrate LLM testing into existing CI/CD workflows and catch prompt regressions before they reach production.
Key Features
Evaluation Framework
Promptfoo supports multiple assertion types for validating LLM outputs.
You can check for exact matches, substring presence, JSON schema compliance, semantic similarity, and custom JavaScript assertions.
The evaluation engine runs prompts against test cases and generates detailed reports showing pass/fail status for each assertion.
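As a rough sketch, a single test case can stack several assertion types. The example below uses the documented contains, is-json, similar, and javascript assertions; the reference answer, threshold, and length check are illustrative values, not recommendations:
# Illustrative test case combining several assertion types
tests:
  - vars:
      question: "Return the capital of France as JSON"
    assert:
      - type: is-json            # output must parse as valid JSON
      - type: contains           # substring check
        value: "Paris"
      - type: similar            # semantic similarity against a reference answer
        value: "The capital of France is Paris."
        threshold: 0.8
      - type: javascript         # custom JavaScript expression; `output` holds the model response
        value: "output.length < 500"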
Red Teaming Automation
The built-in red team module generates adversarial prompts to test for jailbreaks, prompt injections, and harmful content generation.
It includes attack strategies like role-playing, encoding tricks, and multi-turn manipulation attempts.
Results help identify weaknesses before malicious users discover them.
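A hedged sketch of how this is typically configured: a redteam block in promptfooconfig.yaml declares the application's purpose plus the plugins and strategies used to generate attacks. The plugin and strategy names below are examples; check the current documentation for the full list supported by your version:
# Illustrative red team configuration (plugin/strategy names are examples)
redteam:
  purpose: "Customer support chatbot for a retail bank"
  plugins:
    - harmful            # probes for harmful content generation
    - pii                # attempts to extract personal data
  strategies:
    - jailbreak          # iterative jailbreak attempts
    - prompt-injection   # instructions injected inside user content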
Prompt Comparison
Compare how different prompts or models perform on the same test cases.
The comparison view displays results side-by-side with metrics like response time, token usage, and assertion pass rates.
This helps optimize prompts for cost, speed, and accuracy.
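One way to keep cost and speed visible in every comparison, assuming the built-in cost and latency assertions are available in your version, is to attach thresholds to all test cases via defaultTest; the budgets below are illustrative:
# Apply cost and latency budgets to every test case (thresholds are illustrative)
defaultTest:
  assert:
    - type: cost         # maximum spend per test case, in USD
      threshold: 0.002
    - type: latency      # maximum response time, in milliseconds
      threshold: 3000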
Provider Abstraction
Write tests once and run them against any supported LLM provider.
Promptfoo handles API differences behind a unified interface, making it straightforward to benchmark models or migrate between providers without rewriting test suites.
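For example, swapping or adding a provider is a one-line change in the config. The provider IDs below are illustrative; exact IDs depend on your promptfoo version and should be checked against the provider documentation:
# The same prompts and tests run against every provider listed here
providers:
  - openai:gpt-4o-mini
  - anthropic:messages:claude-3-5-sonnet-20240620
  - ollama:chat:llama3   # local model served by Ollama (assumes Ollama is running)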
Installation
Install Promptfoo globally using npm:
npm install -g promptfoo
Or run directly with npx:
npx promptfoo@latest init
Or install with Homebrew:
brew install promptfoo
Initialize a new project:
promptfoo init
This creates a promptfooconfig.yaml file with example configuration.
How to Use Promptfoo
Basic Configuration
Create a configuration file defining your prompts and test cases:
# promptfooconfig.yaml
prompts:
  - "You are a helpful assistant. Answer this question: {{question}}"
  - "As an expert, provide a detailed answer to: {{question}}"

providers:
  - openai:gpt-4
  - anthropic:messages:claude-3-sonnet-20240229

tests:
  - vars:
      question: "What is the capital of France?"
    assert:
      - type: contains
        value: "Paris"
  - vars:
      question: "Explain quantum computing in simple terms"
    assert:
      - type: llm-rubric
        value: "The response should be understandable by a non-technical person"
Run the evaluation:
promptfoo eval
View results in the web interface:
promptfoo view
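Results can also be exported for sharing or as CI artifacts; the --output flag infers the format from the file extension (JSON, CSV, or HTML), for example:
promptfoo eval --output results.html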
Red Teaming
Generate adversarial test cases:
promptfoo redteam generate --purpose "Customer support chatbot"
Run the red team evaluation:
promptfoo redteam run
Integration
GitHub Actions
Add LLM testing to your CI pipeline:
name: LLM Tests
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - run: npm install -g promptfoo
      - run: promptfoo eval --no-cache --output results.json
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
      - uses: actions/upload-artifact@v4
        with:
          name: promptfoo-results
          path: results.json
GitLab CI
llm-tests:
  image: node:20
  script:
    - npm install -g promptfoo
    - promptfoo eval --no-cache
  # OPENAI_API_KEY is read from the project's CI/CD variables
Pre-commit Hook
# .husky/pre-commit
npx promptfoo eval --no-cache || exit 1
When to Use Promptfoo
Promptfoo fits well when you need to:
- Test prompt changes before deploying to production
- Compare performance across different LLM providers
- Run automated security evaluations on AI applications
- Maintain prompt quality as models and requirements evolve
- Build regression test suites for LLM-powered features
Teams building customer-facing AI products benefit most from Promptfoo’s structured testing approach.
The tool helps catch issues early when prompt changes accidentally degrade output quality or introduce security vulnerabilities.
For ad-hoc prompt experimentation, simpler tools may suffice.
Promptfoo shines when you need repeatable, automated testing as part of a mature development workflow.
