LLM Guard

Category: AI Security
License: Free (Open-Source)
Suphi Cankurt
+8 Years in AppSec
Updated May 22, 2026
Key Takeaways
  • Open-source LLM security toolkit with 15 input scanners and 20 output scanners
  • Detects prompt injection, PII leaks, toxic outputs, and data leakage
  • MIT licensed, 2.5k GitHub stars, deployable as a standalone API server
  • Works with any LLM provider, not locked to a specific vendor

LLM Guard is an open-source AI security toolkit by Protect AI that scans LLM inputs and outputs for security and compliance risks. It has 2.5k stars and 342 forks on GitHub.

Figure: LLM Guard architecture. The application sends prompts through Input Controls (anonymization, prompt injection, PII, toxicity) to the LLM, and responses pass through Output Controls before returning to the app.

The library is MIT licensed and requires Python 3.10+. Protect AI, the company behind LLM Guard, also develops Guardian and ModelScan for ML supply chain security. The latest release is v0.3.16.

I use LLM Guard when I need to put a policy layer in front of a chatbot before it talks to a user. The input scanners catch prompt injection, PII, and banned topics. The output scanners catch refusals, toxic output, and sensitive-data leakage. It is the most complete free guardrail library I have tried, and it runs offline without calling back to a vendor API.

Quick Pick

  • Self-hosted, offline, free? → LLM Guard (this page)
  • Managed cloud API with SLA? → Lakera Guard
  • Dialog flow control with Colang DSL? → NeMo Guardrails
  • Red-team / adversarial testing (not runtime)? → Garak or PyRIT

What is LLM Guard?

LLM Guard sits between your application and its language model. It runs 15 input scanners on user prompts before they reach the model, and 20 output scanners on the model’s responses before they reach the user.

Each scanner handles a specific risk: prompt injection, PII exposure, toxic language, secrets in code, and more. Scanners are modular, so you pick which ones you need and configure them independently.

The library works with any language model since it processes text, not model internals. It also ships with an API server mode for language-agnostic deployments.
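To make the modular design concrete, here is a minimal standard-library sketch of a scanner pipeline. It is not LLM Guard's implementation: the `Scanner` type alias, `ban_substrings`, `max_length`, and `run_scanners` are hypothetical names chosen for illustration, and the only thing borrowed from the library is the general shape of a scanner result (sanitized text, a validity flag, and a risk score).

```python
from typing import Callable

# Hypothetical scanner signature for this sketch: takes text and returns
# (possibly rewritten text, is_valid, risk_score between 0 and 1).
Scanner = Callable[[str], tuple[str, bool, float]]


def ban_substrings(banned: list[str]) -> Scanner:
    """Build a scanner that rejects text containing any banned substring."""
    def scan(text: str) -> tuple[str, bool, float]:
        hit = any(b.lower() in text.lower() for b in banned)
        return text, not hit, 1.0 if hit else 0.0
    return scan


def max_length(limit: int) -> Scanner:
    """Build a scanner that rejects text longer than `limit` characters."""
    def scan(text: str) -> tuple[str, bool, float]:
        too_long = len(text) > limit
        return text, not too_long, 1.0 if too_long else 0.0
    return scan


def run_scanners(scanners: dict[str, Scanner], text: str):
    """Run each scanner in order, threading the sanitized text through."""
    valid: dict[str, bool] = {}
    scores: dict[str, float] = {}
    for name, scan in scanners.items():
        text, valid[name], scores[name] = scan(text)
    return text, valid, scores


# Only the scanners listed here run; leaving one out costs nothing.
scanners = {
    "ban_substrings": ban_substrings(["ignore previous instructions"]),
    "max_length": max_length(200),
}
text, valid, scores = run_scanners(scanners, "Please IGNORE previous instructions.")
print(valid)
```

Because every scanner has the same signature, adding or removing a check is a one-line change to the `scanners` dictionary, which is the property the paragraph above describes.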

What does LLM Guard protect against?

LLM Guard covers six runtime security risks: prompt injection attacks that hijack model behavior, PII leakage in outputs, toxic and harmful content generation, hardcoded secrets in responses, malicious URLs in model output, and factual inconsistency when responses contradict provided context.

The 15 input scanners run on each prompt before it reaches the LLM, and the 20 output scanners run on each response before it returns to the user. Each scanner is independent: activate only the checks your use case needs, with no performance cost from unused scanners.

The PromptInjection scanner uses ML models trained on real injection patterns rather than regex matching, so it catches indirect injection attempts hidden in documents or tool outputs. All processing runs locally: no prompt data, no response data, and no credentials leave your infrastructure. LLM Guard is MIT licensed and works with any LLM provider.

Input Scanning
15 scanners that filter user prompts before they reach your LLM. Covers prompt injection, PII anonymization, secrets detection, toxicity, banned topics, invisible text, and more.
Output Scanning
20 scanners that validate model responses. Checks for bias, malicious URLs, factual consistency, sensitive data leaks, toxicity, and relevance to the original query.
API Server
Deploy LLM Guard as a standalone HTTP API. Integrates with any language or framework, not just Python. Available via Docker for production deployments.

LLM Guard Key Features

Feature | Details
Input Scanners | 15 scanners: Anonymize, BanCode, BanCompetitors, BanSubstrings, BanTopics, Code, Gibberish, InvisibleText, Language, PromptInjection, Regex, Secrets, Sentiment, TokenLimit, Toxicity
Output Scanners | 20 scanners: BanCompetitors, BanSubstrings, BanTopics, Bias, Code, Deanonymize, JSON, Language, LanguageSame, MaliciousURLs, NoRefusal, ReadingTime, FactualConsistency, Gibberish, Regex, Relevance, Sensitive, Sentiment, Toxicity, URLReachability
PII Handling | Anonymize scanner replaces PII in prompts; Deanonymize restores it in outputs when appropriate
Prompt Injection | Dedicated scanner using ML models to detect direct and indirect injection attempts
Secrets Detection | Identifies API keys, passwords, and credentials in both inputs and outputs
Factual Consistency | Output scanner that checks whether responses are consistent with provided context
License | MIT, fully open-source
Python Support | Requires Python >=3.10, <3.13

LLM Guard Input Scanners

LLM Guard ships 15 input scanners that process user prompts before they reach the LLM:

  • Anonymize: replaces PII (names, emails, phone numbers, credit card numbers) with placeholders (addresses OWASP LLM02: Sensitive Information Disclosure)
  • BanCode: blocks prompts containing code snippets
  • BanCompetitors: filters mentions of specified competitor names
  • BanSubstrings: blocks prompts containing specific text patterns
  • BanTopics: prevents prompts about restricted subjects
  • Code: detects code content in prompts
  • Gibberish: identifies nonsensical or garbled input
  • InvisibleText: detects hidden Unicode characters used in prompt injection
  • Language: enforces language restrictions on input
  • PromptInjection: detects direct and indirect injection attacks using ML models (addresses OWASP LLM01: Prompt Injection)
  • Regex: pattern-based filtering with custom regular expressions
  • Secrets: identifies API keys, passwords, and credentials
  • Sentiment: analyzes emotional tone of input
  • TokenLimit: enforces maximum token count
  • Toxicity: filters harmful, offensive, or abusive language
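As an illustration of what a hidden-character check like InvisibleText involves, the sketch below flags and strips characters by Unicode general category using only the standard library. The choice of categories (Cf, Co, Cn) and the `strip_invisible` helper are assumptions made for this example, not a description of how the library's scanner is written.

```python
import unicodedata

# Unicode general categories treated as "invisible" in this sketch:
# Cf (format), Co (private use), Cn (unassigned). This choice is an assumption.
INVISIBLE_CATEGORIES = {"Cf", "Co", "Cn"}


def strip_invisible(text: str) -> tuple[str, bool]:
    """Return (cleaned_text, is_valid); is_valid is False if anything was removed."""
    kept = [ch for ch in text if unicodedata.category(ch) not in INVISIBLE_CATEGORIES]
    cleaned = "".join(kept)
    return cleaned, cleaned == text


# U+200B (zero-width space) and U+E0041 (a tag character) do not render visibly.
suspicious = "Summarize this\u200b page\U000E0041"
cleaned, is_valid = strip_invisible(suspicious)
print(repr(cleaned), is_valid)
```

A rule-based check like this is cheap, so it is a reasonable first scanner to enable before heavier model-backed ones.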

LLM Guard Output Scanners

LLM Guard’s 20 output scanners validate and filter model responses. The output-specific ones include:

  • FactualConsistency: checks whether the response is consistent with the provided context
  • Bias: detects biased or discriminatory content in responses
  • Deanonymize: restores PII that was anonymized in the input stage
  • JSON: validates JSON structure and schema compliance
  • MaliciousURLs: blocks links to known malicious sites
  • NoRefusal: flags when the model refuses to answer without good reason
  • Relevance: checks whether the response matches the original query
  • URLReachability: verifies that URLs in responses actually resolve
  • ReadingTime: estimates the reading time of responses
  • LanguageSame: verifies the response is in the same language as the input
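Output checks are often simple structural validations. As a rough sketch of the idea behind a JSON output check, the snippet below verifies that a response parses as a JSON object and contains the keys the application expects. The `check_json` function and its `required_keys` parameter are hypothetical and written with the standard library only; the library's own JSON scanner may behave differently.

```python
import json


def check_json(response: str, required_keys: set[str]) -> tuple[bool, str]:
    """Return (is_valid, reason) for a model response expected to be a JSON object."""
    try:
        data = json.loads(response)
    except json.JSONDecodeError as exc:
        return False, f"not valid JSON: {exc.msg}"
    if not isinstance(data, dict):
        return False, "top-level value is not an object"
    missing = required_keys - set(data)
    if missing:
        return False, f"missing keys: {sorted(missing)}"
    return True, "ok"


print(check_json('{"answer": "42", "sources": []}', {"answer", "sources"}))
print(check_json('{"answer": "42"}', {"answer", "sources"}))
print(check_json("Sure! Here is the JSON you asked for.", {"answer"}))
```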
Playground available
Protect AI hosts an interactive playground on Hugging Face Spaces where you can test LLM Guard scanners without installing anything. Visit the LLM Guard Playground to try it out.

Getting Started with LLM Guard

1. Install the library: Run pip install llm-guard. Requires Python 3.10 or higher. For GPU-accelerated inference, install with pip install llm-guard[onnxruntime-gpu].
2. Choose your scanners: Import the input and output scanners you need. Each scanner is independent, so you only load what you use.
3. Scan inputs and outputs: Use scan_prompt() to process user input through your chosen input scanners, send the sanitized prompt to your LLM, then use scan_output() to validate the response.
4. Deploy as API (optional): For non-Python environments, deploy LLM Guard as a standalone API server. The API server wraps all scanner functionality behind HTTP endpoints.

A minimal end-to-end scan looks like this:

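from llm_guard import scan_prompt, scan_output
from llm_guard.input_scanners import Anonymize, PromptInjection, Toxicity
from llm_guard.output_scanners import Deanonymize, Sensitive
from llm_guard.vault import Vault


def your_llm_call(prompt: str) -> str:
    """Placeholder: swap in your provider's client (OpenAI, Anthropic, local, anything)."""
    raise NotImplementedError


# The vault is shared so Deanonymize can reverse what Anonymize replaced.
vault = Vault()
input_scanners = [Anonymize(vault), Toxicity(), PromptInjection()]
output_scanners = [Deanonymize(vault), Sensitive()]

prompt = "Hi, my name is John Doe and my credit card is 4242-4242-4242-4242."
sanitized_prompt, results_valid, results_score = scan_prompt(input_scanners, prompt)

# results_valid maps each scanner name to True (passed) or False (failed).
if not all(results_valid.values()):
    raise ValueError(f"Prompt blocked: {results_valid}")

response = your_llm_call(sanitized_prompt)

sanitized_response, results_valid, results_score = scan_output(
    output_scanners, sanitized_prompt, response
)

# Apply the same check to the response before returning it to the user.
if not all(results_valid.values()):
    raise ValueError(f"Response blocked: {results_valid}")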

Anonymize replaces the PII with placeholders before the prompt hits the LLM. Deanonymize restores the original values in the response when it is safe to do so. The Vault keeps the mapping between placeholders and real values.
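The placeholder round trip described above can be sketched in a few lines of standard-library Python. This is an illustration of the concept only: `PlaceholderVault`, the email-only regex, and the `[REDACTED_EMAIL_n]` placeholder format are all hypothetical, and the real Anonymize scanner detects many more entity types with its own placeholder scheme.

```python
import re

# Simplified email pattern, good enough for the demo below.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


class PlaceholderVault:
    """Hypothetical stand-in for a vault: maps placeholders to original values."""

    def __init__(self) -> None:
        self._by_placeholder: dict[str, str] = {}

    def anonymize(self, text: str) -> str:
        def replace(match: re.Match) -> str:
            value = match.group(0)
            # Reuse the placeholder if this value was seen before.
            for placeholder, original in self._by_placeholder.items():
                if original == value:
                    return placeholder
            placeholder = f"[REDACTED_EMAIL_{len(self._by_placeholder) + 1}]"
            self._by_placeholder[placeholder] = value
            return placeholder

        return EMAIL_RE.sub(replace, text)

    def deanonymize(self, text: str) -> str:
        # Swap every known placeholder back to its original value.
        for placeholder, original in self._by_placeholder.items():
            text = text.replace(placeholder, original)
        return text


vault = PlaceholderVault()
safe_prompt = vault.anonymize(
    "Email jane@example.com and bob@example.org, then cc jane@example.com."
)
print(safe_prompt)

# Pretend the model answered using the placeholders it was given.
model_reply = "I will write to [REDACTED_EMAIL_2] first."
print(vault.deanonymize(model_reply))
```

The model only ever sees placeholders, and the mapping stays in process memory, which is why the same vault instance has to be passed to both sides of the scan.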

LLM Guard vs other LLM security tools

Criterion | LLM Guard | NeMo Guardrails | Lakera Guard | Garak
Purpose | Runtime input/output scanning | Runtime dialog + content rails | Runtime scanning (cloud API) | Offline red-team / adversarial testing
Deployment | Self-hosted (pip, Docker) | Self-hosted (pip, Docker) | Managed cloud API | CLI, offline
License | MIT (free) | Apache-2.0 (free) | Commercial (free tier) | Apache-2.0 (free)
Prompt injection scanner | Yes (ML-based) | Yes (via rail config) | Yes (proprietary model) | N/A (red-team, not runtime)
PII / anonymization | Yes (15 input scanners) | Via Colang custom rails | Yes | No
Dialog flow control | No | Yes (Colang DSL) | No | No
Offline / air-gapped | Yes | Yes | No (cloud API) | Yes
Any LLM provider | Yes (text-only) | Yes | Yes | Yes
Best for | Self-hosted policy layer | Complex dialog rules | Managed SaaS guardrails | Security testing before launch

Release note (v0.3.16): The current LLM Guard release ships 15 input scanners and 20 output scanners, with updated PromptInjection models and improved performance on Python 3.12. Check the GitHub releases page for the latest version. Protect AI ships multiple minor releases per quarter.

Figure: LLM Guard Python API. The example imports the PromptInjection, Toxicity, and Anonymize scanners, runs scan_prompt(), and shows scanner results with an INVALID score of 0.94 blocking the input.

LLM Guard vs Lakera Guard

LLM Guard and Lakera Guard solve the same problem from opposite deployment models. LLM Guard is a self-hosted Python library you run locally with no per-call cost. Lakera Guard is a managed cloud API.

Lakera leads on operational simplicity. You hit an HTTPS endpoint and get a guardrail verdict back. The trade-off is data egress: every prompt leaves your network for evaluation.

Pick LLM Guard when data residency, offline operation, or zero per-request cost matters. Pick Lakera when you want guardrails as a service and prefer paying for managed inference over running your own.

LLM Guard vs NeMo Guardrails

LLM Guard and NeMo Guardrails operate at different layers of the LLM stack. LLM Guard runs scanner functions on prompts and responses: it answers “is this input safe?” or “does this output leak PII?”. NeMo Guardrails runs a dialog flow engine using Colang DSL.

NeMo is structurally heavier. You define conversation rails in a custom language, and the runtime decides whether the model should answer, refuse, or hand off to a tool. LLM Guard is structurally lighter: you import scanners and call them in your existing application code.

Pick LLM Guard when you want surgical input/output filtering you can drop into a Python app. Pick NeMo Guardrails when you need conversational policy enforcement across multi-turn dialogs and have the budget to maintain Colang flows.

LLM Guard vs Garak

LLM Guard and Garak are not direct replacements: they sit at opposite ends of the LLM security workflow. Garak is a red-teaming probe that fires adversarial prompts at a model offline and reports which jailbreaks succeed. LLM Guard is a runtime guardrail that scans live prompts and responses in production.

Garak’s output is a report card on a model’s weaknesses. You run it once per model release, fix what you can, and accept the rest. LLM Guard’s output is a per-request verdict that blocks or sanitizes traffic as it flows.

Most teams run both. Use Garak in CI to vet new models or prompts before deployment; use LLM Guard at runtime to enforce the guardrails Garak identified as necessary. If you must pick only one, know the trade-off: Garak alone leaves production unprotected, and LLM Guard alone leaves model selection blind.

When to use LLM Guard

LLM Guard fits teams that want open-source, self-hosted guardrails for LLM applications. Since it runs locally, your data never leaves your infrastructure.

The modular scanner design means you can start with just prompt injection detection and add PII anonymization or toxicity filtering later. Each scanner works independently, so adding one doesn’t affect the others.

The library works with any LLM provider because it scans the text, not the model. Whether you use OpenAI, Anthropic, local models, or a mix, the same scanners apply.

Best for
Teams that need self-hosted, open-source input/output scanning for LLM applications with fine-grained control over which security checks to apply.

For a broader look at AI and LLM security, read the AI security guide. For a different approach to LLM safety that includes dialog flow control, look at NeMo Guardrails.

For red teaming and adversarial testing rather than runtime protection, consider Garak or PyRIT. Lakera Guard offers similar scanner functionality as a managed cloud API.

Frequently Asked Questions

What is LLM Guard?
LLM Guard is an open-source security toolkit by Protect AI that provides input and output scanners for LLM applications. It has 2.5k GitHub stars, offers 15 input scanners and 20 output scanners, and is MIT licensed.
Is LLM Guard free to use?
Yes, LLM Guard is free and open-source under the MIT license. Install it via pip (requires Python 3.10+) and deploy it as a standalone API server or integrate it directly into your Python application.
Does LLM Guard protect against prompt injection?
Yes, LLM Guard includes a dedicated PromptInjection scanner that detects direct and indirect injection attacks. Additional input scanners handle invisible text detection and content filtering.
What LLM vulnerabilities does LLM Guard address?
LLM Guard covers prompt injection, PII anonymization, toxicity filtering, secrets detection, malicious URL blocking, bias detection, factual consistency checking, and data leakage prevention through its modular scanner architecture.
What is the difference between LLM Guard and Lakera Guard?
LLM Guard is a self-hosted MIT-licensed Python library; Lakera Guard is a managed cloud API. LLM Guard has no per-request cost and keeps prompts on your infrastructure. Lakera Guard removes operational overhead but sends prompts to a vendor cloud. Choose LLM Guard for data residency and zero cost, Lakera for managed-service simplicity.
Is LLM Guard production-ready for high-traffic LLM applications?
LLM Guard runs scanners synchronously in Python, so latency scales with the number of scanners you enable. For high-traffic APIs, deploy LLM Guard as the standalone API server, run multiple workers behind a load balancer, and select only the scanners you need rather than all 35. Benchmark with your production scanner config before launch.
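One way to follow the benchmarking advice is a small timing harness that reports per-scanner latency on representative prompts. The sketch below uses only the standard library; `time_scanners`, `cheap_scanner`, and `slow_scanner` are hypothetical stand-ins, and the 5 ms sleep is an arbitrary placeholder for a model-backed scanner. In practice you would pass callables that wrap your real scanner configuration.

```python
import time
from statistics import median
from typing import Callable


def time_scanners(
    scanners: dict[str, Callable[[str], object]],
    prompts: list[str],
    repeats: int = 5,
) -> dict[str, float]:
    """Return the median wall-clock milliseconds per call for each scanner."""
    timings: dict[str, float] = {}
    for name, scan in scanners.items():
        samples = []
        for _ in range(repeats):
            for prompt in prompts:
                start = time.perf_counter()
                scan(prompt)
                samples.append((time.perf_counter() - start) * 1000)
        timings[name] = median(samples)
    return timings


def cheap_scanner(prompt: str) -> bool:
    # Stand-in for a rule-based check such as a length limit.
    return len(prompt) <= 4096


def slow_scanner(prompt: str) -> bool:
    # Stand-in for a model-backed check; the sleep is an arbitrary placeholder.
    time.sleep(0.005)
    return True


timings = time_scanners(
    {"cheap": cheap_scanner, "slow": slow_scanner},
    prompts=["hello", "summarize this document " * 20],
)
# Print the slowest scanner first so the main latency cost is obvious.
for name, ms in sorted(timings.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {ms:.2f} ms")
```

Sorting by cost makes it easy to see which scanners dominate request latency and are worth dropping or moving behind a cheaper pre-filter.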