Betterleaks

Category: SAST
License: Free (Open-Source, MIT)
Suphi Cankurt
AppSec Enthusiast
Updated March 19, 2026
Key Takeaways
  • Built by Zachary Rice (zricethezav), the original creator of Gitleaks (25k+ stars), now Head of Secrets Scanning at Aikido Security.
  • Token Efficiency Filter uses BPE tokenization (cl100k_base) instead of Shannon entropy. Hits 98.6% recall on CredData vs 70.4% with entropy.
  • CEL-based secrets validation fires HTTP requests against detected credentials to check whether they are still live.
  • Drop-in replacement for Gitleaks with backwards-compatible configuration files and CLI flags.
  • Supports parallelized git scanning, recursive decoding (base64, hex, percent-encoding, unicode), and archive scanning (zip, tar).

Betterleaks is an open-source secrets scanner built by Zachary Rice, the original creator of Gitleaks (25,000+ GitHub stars). It detects and validates hardcoded credentials in git repositories, directories, and archives using BPE tokenization instead of entropy-based filtering, achieving 98.6% recall on the CredData benchmark compared to 70.4% with traditional entropy detection. Rice is currently Head of Secrets Scanning at Aikido Security.

Betterleaks CLI scan output showing detected Slack webhook and Stripe secret key findings with redacted values

The project was created on February 3, 2026 and reached v1.1.1 by March 17, 2026. It is written in Go, licensed under MIT, and designed as a drop-in replacement for Gitleaks with backwards-compatible configuration files and CLI flags.

What is Betterleaks?

Betterleaks is a free, open-source secrets detection tool that scans git repositories, directories, stdin, and compressed archives for hardcoded credentials such as API keys, tokens, and passwords. It replaces Shannon entropy with BPE (Byte Pair Encoding) tokenization using the cl100k_base model to determine whether a string is likely a real secret. On the CredData benchmark, this approach achieves 98.6% recall versus 70.4% for entropy-based scanning. Betterleaks is licensed under MIT and can be installed via Homebrew, Docker, DNF, or built from source.

How does Betterleaks improve on Gitleaks?

Betterleaks targets the same problem as Gitleaks – finding hardcoded secrets in git repositories – but improves detection accuracy, validation capability, and scanning speed.

The core difference is the detection engine. Gitleaks relies on Shannon entropy to decide whether a matched string is random enough to be a real secret. Betterleaks uses BPE tokenization with the cl100k_base model (the same tokenizer GPT-4 uses). On the CredData benchmark, Betterleaks hits 98.6% recall compared to Gitleaks’ 70.4% with entropy-based filtering. On large codebases, that gap adds up to a significant number of secrets that entropy-based filtering misses.

Betterleaks also adds CEL-based secrets validation. When it finds a potential credential, it can fire an HTTP request to the target service and check whether the credential is still live. A finding goes from “possible leak” to “confirmed active secret,” which changes how you prioritize remediation.

Since it is backwards-compatible with Gitleaks configuration files and CLI flags, migrating takes minimal effort. Existing .gitleaks.toml files work without modification.

Betterleaks scan time comparison benchmark showing 4.2-5.4x faster scanning than Gitleaks on Rails, Ruby, and GitLab repositories

The benchmark above (from the Betterleaks repository) compares scan times on three real-world repositories. With RE2 and 8 git workers enabled, Betterleaks scans the Rails repo in 5.8s vs Gitleaks’ 24.5s (4.2x faster), the Ruby repo in 10.3s vs 55.2s (5.4x faster), and the GitLab repo in 2m13s vs 11m28s (5.2x faster).
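The speedup factors follow directly from the quoted scan times. A small sketch that redoes the arithmetic, using only the numbers given above (the dictionary and function names are illustrative):

```python
# Scan times quoted above, in seconds: (Betterleaks, Gitleaks).
TIMES = {
    "Rails": (5.8, 24.5),
    "Ruby": (10.3, 55.2),
    "GitLab": (2 * 60 + 13, 11 * 60 + 28),  # 2m13s vs 11m28s
}


def speedup(betterleaks_s: float, gitleaks_s: float) -> float:
    """Return how many times faster the Betterleaks run finished."""
    return gitleaks_s / betterleaks_s


for repo, (fast, slow) in TIMES.items():
    print(f"{repo}: {speedup(fast, slow):.1f}x")
```

Rounded to one decimal place, the three ratios come out at 4.2x, 5.4x, and 5.2x, matching the figures in the benchmark.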

Token Efficiency Filter
Uses BPE tokenization (cl100k_base) instead of Shannon entropy for secret detection. Achieves 98.6% recall on CredData, compared to 70.4% with entropy-based filtering.
CEL Secrets Validation
Fires HTTP requests against detected credentials using CEL expressions to verify whether leaked secrets are still active and exploitable.
Parallelized Git Scanning
Distributes git history scanning across multiple workers via the --git-workers flag, reducing scan times on large repositories.
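To make the idea of worker-based scanning concrete, here is a minimal Python sketch of splitting patches across a worker pool. It is not how Betterleaks is implemented (the tool is written in Go); the rule, the function names, and the use of a thread pool are all assumptions for illustration, and Python threads will not give the same CPU parallelism.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical rule: a Stripe-style prefix followed by 24 alphanumerics.
RULE = re.compile(r"sk_live_[0-9A-Za-z]{24}")


def scan_patch(patch: str) -> list[str]:
    """Scan a single commit patch and return every rule match."""
    return RULE.findall(patch)


def scan_history(patches: list[str], workers: int = 4) -> list[str]:
    """Hand patches to a pool of workers and flatten their results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(scan_patch, patches)  # map() keeps input order
        return [hit for hits in results for hit in hits]
```

The point of the sketch is the shape of the work: each patch can be scanned independently, so adding workers shortens wall-clock time until some other resource becomes the bottleneck.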

Key Features

Feature | Details
CLI commands | git (scan repos), dir (scan directories), stdin (pipe input)
Configuration | TOML format (.betterleaks.toml or .gitleaks.toml), backwards-compatible with Gitleaks
Detection engine | BPE tokenization (cl100k_base) + regex rules; 98.6% recall on CredData
Secrets validation | CEL expressions fire HTTP requests to verify if leaked credentials are still active
Output formats | JSON, CSV, JUnit, SARIF, custom Go templates
Installation | Homebrew, Docker, DNF (Fedora), from source
Regex engines | Go stdlib or RE2 (switchable); RE2 guarantees linear-time matching
Recursive decoding | base64, hex, percent-encoding, unicode escapes; configurable depth (default 5)
Archive support | zip, tar, and nested archives via --max-archive-depth
Git scanning | Parallelized via --git-workers; scans GitLab repo 5.2x faster than Gitleaks
Composite rules | Multi-part patterns with proximity matching to reduce false positives
Redaction | --redact flag with configurable percentage (0-100%) for logs and stdout
Baseline support | --baseline-path to ignore known findings and track only new secrets
Language | Pure Go (no CGO); deploys anywhere without native library dependencies
License | MIT (no commercial restrictions)
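Baseline support is the easiest of these features to picture in code: findings already present in an earlier report are dropped so only new ones remain. The sketch below is a generic illustration of that idea. The dictionary keys and the definition of "same finding" are hypothetical and are not taken from the Betterleaks report format.

```python
def fingerprint(finding: dict) -> tuple:
    # Hypothetical identity: same rule, file, and secret means "already known".
    return (finding["rule"], finding["file"], finding["secret"])


def only_new(current: list[dict], baseline: list[dict]) -> list[dict]:
    """Return findings from the current scan that the baseline does not contain."""
    known = {fingerprint(f) for f in baseline}
    return [f for f in current if fingerprint(f) not in known]
```

In practice you would generate the baseline once, commit or archive it, and let later scans fail the build only when this filtered list is non-empty.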

What is the Token Efficiency Filter?

The Token Efficiency Filter is Betterleaks’ core detection innovation that replaces Shannon entropy with BPE (Byte Pair Encoding) tokenization for identifying secrets. Entropy-based detection measures the randomness of a string to decide whether it might be a secret, but many real secrets don’t have high enough entropy to pass the threshold, and many non-secrets (like UUIDs or hashes) score high entropy but aren’t credentials.
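For reference, Shannon entropy over a string's characters can be computed in a few lines. This is the textbook formula, not code taken from Gitleaks or Betterleaks, and the function name is illustrative.

```python
import math
from collections import Counter


def shannon_entropy(s: str) -> float:
    """Bits per character, from the empirical character distribution of s."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())
```

A repetitive password such as "hunter2hunter2" uses only seven distinct characters and scores lower than a UUID like "123e4567-e89b-12d3-a456-426614174000", even though the password is the credential and the UUID is not. That mismatch is the weakness described above.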

Betterleaks uses the cl100k_base tokenizer (the same tokenizer GPT-4 uses) to evaluate how efficiently a string compresses into tokens. Real secrets tokenize inefficiently because they are random, while structured strings (variable names, UUIDs, file paths) compress well.
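The intuition can be shown with a toy stand-in. The sketch below uses a greedy longest-match tokenizer over a tiny made-up vocabulary. It is not real BPE, it is not cl100k_base, and it is not the Betterleaks filter; the vocabulary and the characters-per-token metric are assumptions chosen only to show why familiar strings need fewer tokens than random ones.

```python
# Toy vocabulary standing in for a real BPE vocabulary (assumption).
VOCAB = {"pass", "word", "reset", "token", "user", "name", "config", "api", "key", "_"}
MAX_PIECE = max(len(p) for p in VOCAB)


def tokenize(s: str) -> list[str]:
    """Greedy longest-match split; unknown characters become one-char tokens."""
    tokens, i = [], 0
    while i < len(s):
        for size in range(min(MAX_PIECE, len(s) - i), 0, -1):
            piece = s[i:i + size]
            if size == 1 or piece in VOCAB:
                tokens.append(piece)
                i += size
                break
    return tokens


def chars_per_token(s: str) -> float:
    """Higher means the string splits into fewer, longer tokens."""
    tokens = tokenize(s)
    return len(s) / len(tokens) if tokens else 0.0
```

With this toy vocabulary, "password_reset_token" splits into six tokens, while a random-looking string such as "x9Kq2ZpL7vRt" falls back to one token per character. A real filter would use an actual BPE vocabulary and a tuned threshold.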

On the CredData benchmark, the Token Efficiency Filter produces 98.6% recall versus 70.4% with Shannon entropy. In my testing, this translated to fewer missed secrets without a noticeable jump in false positives.

How does CEL-based secrets validation work?

CEL-based secrets validation is Betterleaks’ mechanism for determining whether a detected credential is still active and exploitable. Finding a secret is useful, but knowing whether it still works is what decides how fast you need to act.

Betterleaks uses CEL (Common Expression Language) expressions to define validation logic per rule. When a rule matches, the CEL expression can fire an HTTP request to the target API and check the response. If the credential returns a valid response, the finding is marked as confirmed-active rather than just a potential leak.
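The decision a validation expression makes can be sketched in Python. This does not show Betterleaks' CEL syntax, which is not reproduced in this article. The `send` callable, the status-code mapping, and the "inactive" and "unknown" labels are assumptions; only "confirmed-active" comes from the description above.

```python
from typing import Callable

# Hypothetical transport: takes the candidate secret, returns an HTTP status code.
Send = Callable[[str], int]


def classify(secret: str, send: Send) -> str:
    """Map the target API's response to a finding status."""
    status = send(secret)
    if 200 <= status < 300:
        return "confirmed-active"  # the API accepted the credential
    if status == 401:
        return "inactive"          # the API rejected the credential
    return "unknown"               # rate limit, outage, or ambiguous response
```

Passing the HTTP call in as a function keeps the logic testable without touching a real service, for example `classify("candidate", lambda s: 401)`.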

This is similar to what TruffleHog does with its built-in verifiers. The key difference: Betterleaks makes the validation logic user-configurable via CEL expressions, so security teams can write custom verification for internal APIs and services. TruffleHog’s verifiers are hardcoded per detector.

What are composite and multi-part rules?

Composite rules in Betterleaks combine a primary regex pattern with auxiliary patterns that must appear within a specified proximity in the source code. This approach reduces false positives for patterns that only matter near related identifiers – for example, a random-looking string is only flagged as an API key if a service name like STRIPE_KEY or aws_secret appears nearby. Betterleaks inherited this capability from Gitleaks and extended it with proximity matching configuration.
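A rough Python sketch of proximity matching follows. The two patterns, the line-based window, and the function name are assumptions for illustration; actual rules are defined in the TOML configuration, whose syntax is not shown here.

```python
import re

# Primary pattern: a random-looking 32-character alphanumeric string.
PRIMARY = re.compile(r"\b[0-9A-Za-z]{32}\b")
# Auxiliary pattern: identifiers that give the string context.
AUXILIARY = re.compile(r"stripe_key|aws_secret", re.IGNORECASE)


def composite_matches(text: str, within_lines: int = 2) -> list[str]:
    """Return primary matches that have an auxiliary match within N lines."""
    lines = text.splitlines()
    hits = []
    for i, line in enumerate(lines):
        for match in PRIMARY.finditer(line):
            window = lines[max(0, i - within_lines): i + within_lines + 1]
            if any(AUXILIARY.search(w) for w in window):
                hits.append(match.group())
    return hits
```

The same 32-character string is reported when STRIPE_KEY sits on a neighbouring line and ignored when nothing nearby suggests it is a credential.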

Does Betterleaks support recursive decoding?

Yes. Betterleaks recursively decodes base64, hex, percent-encoding, and unicode escape sequences before applying detection rules. The decoding depth is configurable (default 5 levels). This catches secrets that developers have obfuscated or that build tools have encoded during packaging – a common pattern in older codebases where credentials end up base64-encoded in configuration files or environment variable exports.
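The approach can be sketched as a depth-limited recursion: search the text, decode whatever looks encoded, then search the decoded output again. The rule, the candidate patterns, and the function names below are assumptions, and unicode-escape decoding is left out to keep the example short.

```python
import base64
import binascii
import re
from urllib.parse import unquote

SECRET = re.compile(r"sk_live_[0-9A-Za-z]{24}")  # hypothetical rule
B64 = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")    # base64-looking runs
HEX = re.compile(r"(?:[0-9a-fA-F]{2}){8,}")      # hex-looking runs


def decode_once(text: str) -> set[str]:
    """Return every distinct string obtained by peeling one encoding layer."""
    out = set()
    if "%" in text:
        out.add(unquote(text))
    for m in B64.finditer(text):
        try:
            out.add(base64.b64decode(m.group(), validate=True).decode("utf-8"))
        except (binascii.Error, UnicodeDecodeError):
            pass  # not valid base64 or not text
    for m in HEX.finditer(text):
        try:
            out.add(bytes.fromhex(m.group()).decode("utf-8"))
        except ValueError:
            pass  # not valid hex or not text
    out.discard(text)
    return out


def find_secrets(text: str, max_depth: int = 5) -> set[str]:
    """Search text, then recurse into decoded layers up to max_depth."""
    found = set(SECRET.findall(text))
    if max_depth > 0:
        for decoded in decode_once(text):
            found |= find_secrets(decoded, max_depth - 1)
    return found
```

A secret that has been base64-encoded twice is found with the default depth of 5 but missed with `max_depth=1`, which is why the depth limit matters for both coverage and scan time.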

Does Betterleaks scan inside archives?

Betterleaks scans inside compressed archives including zip, tar, and nested archive formats via the --max-archive-depth flag. This ensures secrets hiding in vendored dependencies, bundled artifacts, or release packages don’t get missed during audits.
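Nested archive scanning follows the same depth-limited pattern. The sketch below handles zip files only and works on in-memory bytes. The rule, the helper names, the "!" path separator, and the way depth is counted are assumptions and may not match how --max-archive-depth counts levels.

```python
import io
import re
import zipfile

SECRET = re.compile(rb"sk_live_[0-9A-Za-z]{24}")  # hypothetical rule


def make_zip(files: dict[str, bytes]) -> bytes:
    """Build an in-memory zip, used here only to create sample input."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        for name, content in files.items():
            archive.writestr(name, content)
    return buf.getvalue()


def scan_zip(data: bytes, max_depth: int = 2, prefix: str = "") -> list[tuple[str, bytes]]:
    """Return (path, secret) pairs, opening nested zips up to max_depth levels."""
    hits = []
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # directory entry
            content = archive.read(name)
            path = prefix + name
            if zipfile.is_zipfile(io.BytesIO(content)):
                if max_depth > 0:
                    hits += scan_zip(content, max_depth - 1, path + "!")
                continue
            hits += [(path, m) for m in SECRET.findall(content)]
    return hits
```

Scanning an outer zip that contains `inner.zip`, which in turn contains a file with a matching key, reports the finding under a path like `inner.zip!cfg.env`. With `max_depth=0` the inner archive is skipped.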

Can you switch regex engines in Betterleaks?

Betterleaks supports two regex engines: Go’s standard library regex engine and RE2. RE2 provides guaranteed linear-time matching, which matters when scanning large files with complex patterns. You can switch between them based on your performance and compatibility needs.

Who created Betterleaks?

Betterleaks was created by Zachary Rice (GitHub handle: zricethezav), the original author of Gitleaks, one of the most popular open-source secrets scanners with over 25,000 GitHub stars. Rice is currently Head of Secrets Scanning at Aikido Security. He started the Betterleaks project on February 3, 2026, building on the lessons learned from years of maintaining Gitleaks. The project is hosted on GitHub under the MIT license and accepts community contributions.

Use Cases

Best for
Teams already using Gitleaks that want better detection accuracy and live secrets validation without changing their workflow. Also works well for new secret scanning setups where you want verified findings from day one.

CI/CD pipeline scanning. Run Betterleaks in your CI pipeline to block pull requests that introduce secrets. The --git-workers flag keeps scan times reasonable even on large repositories. SARIF output feeds directly into GitHub Advanced Security.

Pre-commit hook. Install Betterleaks as a pre-commit hook to catch secrets before they reach version control. Same workflow as Gitleaks; existing pre-commit configurations work with minimal changes.

Incident response. When you discover a leaked credential, use CEL-based validation to check whether the secret is still active. That tells you whether rotation is urgent or can wait.

Legacy codebase audits. Recursive decoding and archive scanning help find secrets that are base64-encoded, hex-encoded, or tucked inside zip files, which is common in older codebases.

Getting Started

Betterleaks CLI help output showing available commands and key flags

1. Install. Run brew install betterleaks on macOS, or pull the Docker image with docker pull ghcr.io/betterleaks/betterleaks:latest. On Fedora, use dnf install betterleaks. You can also build from source with Go.
2. Scan a repository. Run betterleaks git /path/to/repo to scan git history for secrets. Use betterleaks dir /path/to/dir for non-git directories. Add --git-workers 4 for parallelized scanning and -v for verbose output.
3. Migrate from Gitleaks. Drop your existing .gitleaks.toml into the repository root. Betterleaks reads it natively. CLI flags are backwards-compatible, so just swap gitleaks for betterleaks in your scripts.
4. Review findings. Use --report-path results.json --report-format json to save findings. Validated secrets are marked as confirmed-active. Upload SARIF output to GitHub Advanced Security with --report-format sarif.
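Once results.json exists, a few lines of Python are enough to triage it. The sketch below assumes the report is a JSON array of objects with a Gitleaks-style "RuleID" key. That field name is an assumption based on the stated Gitleaks compatibility, so check it against an actual report before relying on it.

```python
import json
from collections import Counter


def summarize(report_json: str) -> Counter:
    """Count findings per rule from a JSON report string."""
    findings = json.loads(report_json)
    # "RuleID" is an assumed key; findings without it are grouped as "unknown".
    return Counter(f.get("RuleID", "unknown") for f in findings)
```

For example, `summarize(open("results.json").read()).most_common(5)` lists the five rules that fired most often, which is a quick way to spot a noisy rule that needs tuning.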

Strengths & Limitations

Strengths:

  • BPE tokenization measurably outperforms Shannon entropy for secret detection (98.6% vs 70.4% recall on CredData).
  • CEL-based validation is user-configurable, unlike hardcoded verification in other tools.
  • Drop-in Gitleaks replacement. No migration pain.
  • Parallelized git scanning cuts wall-clock time on large repos.
  • Recursive decoding catches encoded and obfuscated secrets.
  • MIT license, no commercial restrictions.

Limitations:

  • Very new project (created February 2026). The rule library is smaller than mature tools like Gitleaks or TruffleHog.
  • 473 GitHub stars. Small community compared to Gitleaks (25k+) or TruffleHog (25k+). Ecosystem integrations (GitHub Actions, pre-commit hooks) are still catching up.
  • No managed cloud platform. This is a CLI tool. Teams that want dashboards, team management, or hosted scanning should look at GitGuardian or TruffleHog’s commercial offering.
  • CEL validation requires writing expressions per rule. Out-of-the-box coverage for common services is still limited.

How does Betterleaks compare to other secrets scanners?

GitHub star history comparing secrets scanners — Gitleaks, TruffleHog, Kingfisher, Betterleaks, Nosey Parker, and detect-secrets over time

Feature | Betterleaks | Gitleaks | TruffleHog | GitGuardian
Detection method | BPE tokenization + regex | Entropy + regex | 800+ detectors | Pattern matching + ML
Secrets validation | CEL expressions (configurable) | No | Built-in verifiers (hardcoded) | Yes (commercial)
License | MIT | MIT | AGPL-3.0 | Freemium
Scan targets | Git, directories, stdin, archives | Git, directories, stdin | Git, Slack, S3, Docker, etc. | Git, CI/CD (commercial)
Parallelized scanning | Yes (--git-workers) | No | Yes | Yes
Recursive decoding | Yes (base64, hex, etc.) | Yes (v8.26+) | Limited | Yes
GitHub Stars | 473 | 25,500 | 25,100 | N/A

Betterleaks fits best if you care about detection accuracy and configurable validation, especially if you’re already on Gitleaks and want a painless upgrade. TruffleHog is a better pick for teams that need scanning beyond git repos (Slack, S3, Docker images). GitGuardian is the way to go for enterprises that need dashboards, team management, and hosted scanning.

Frequently Asked Questions

What is Betterleaks?
Betterleaks is a free, open-source secrets scanner created by Zachary Rice (zricethezav), the original author of Gitleaks (25k+ GitHub stars). It detects hardcoded credentials in git repositories, directories, and archives using BPE tokenization instead of Shannon entropy, achieving 98.6% recall on the CredData benchmark. Betterleaks is a drop-in replacement for Gitleaks with backwards-compatible configuration files and CLI flags, and is licensed under MIT.
How does Betterleaks compare to Gitleaks?
Betterleaks is backwards-compatible with Gitleaks configurations and CLI flags, so migration takes minimal effort. The main improvements over Gitleaks are: token efficiency filtering using BPE tokenization (98.6% recall vs 70.4% with entropy), live secrets validation via CEL expressions, parallelized git scanning with the --git-workers flag (4-5x faster on large repos), and recursive decoding for base64, hex, percent-encoding, and unicode-encoded secrets.
What is the Token Efficiency Filter in Betterleaks?
The Token Efficiency Filter is Betterleaks’ core detection innovation that replaces Shannon entropy with BPE (Byte Pair Encoding) tokenization using the cl100k_base model (the same tokenizer GPT-4 uses). Real secrets tokenize inefficiently because they are random, while structured strings like variable names and UUIDs compress well. On the CredData benchmark, this approach achieves 98.6% recall compared to 70.4% with entropy-based filtering, meaning far fewer missed secrets.
Can Betterleaks validate if leaked secrets are still active?
Yes. Betterleaks uses CEL (Common Expression Language) expressions to fire HTTP requests against detected credentials and verify whether they are still active. If a credential returns a valid response, the finding is marked as confirmed-active rather than just a potential leak. Unlike TruffleHog’s hardcoded verifiers, Betterleaks’ validation logic is fully user-configurable, so security teams can write custom verification for internal APIs and services.
Is Betterleaks free?
Yes. Betterleaks is completely free and open-source under the MIT license with no commercial restrictions. You can use it in personal projects, commercial codebases, and CI/CD pipelines without licensing fees. It can be installed via Homebrew, Docker, DNF (Fedora), or built from source.
How fast is Betterleaks compared to Gitleaks?
Betterleaks is significantly faster than Gitleaks on large repositories. With RE2 and 8 git workers enabled, benchmarks from the Betterleaks repository show 4.2x faster scanning on Rails (5.8s vs 24.5s), 5.4x faster on Ruby (10.3s vs 55.2s), and 5.2x faster on GitLab (2m13s vs 11m28s). The speed improvement comes from parallelized git scanning via the --git-workers flag and the optional RE2 regex engine.