Open-Source Secrets Scanner: Betterleaks for Secure Data Protection

New Open-Source Tool Enhances Secrets Scanning for Git Repositories

Secrets scanning has become a standard practice in engineering organizations, with tools like Gitleaks widely used for detecting leaked credentials, API keys, tokens, and passwords. The creator of Gitleaks, Zach Rice, has now released a new tool called Betterleaks, designed to scan git repositories, directories, and standard input for sensitive information. As the Head of Secrets Scanning at Aikido Security, Rice led the development of Betterleaks, which is built to function as a drop-in replacement for Gitleaks.

Rice Explains the Decision to Create Betterleaks

Rice explained that he no longer has full control over the Gitleaks repository and name, prompting him to start a new project.

Features and Capabilities of Betterleaks

Betterleaks introduces a new technique called Token Efficiency, based on byte pair encoding (BPE) tokenization, to filter candidate secrets. The approach measures how well a BPE tokenizer compresses a given string: natural language compresses well into a few long tokens, while secrets compress poorly into many short tokens. According to Rice, Token Efficiency achieved 98.6 percent recall on the CredData dataset, compared to 70.4 percent for entropy-based methods.
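The compression idea can be illustrated with a small Go sketch. This is not Betterleaks' implementation: the merge table below is a tiny hand-written stand-in for a trained BPE vocabulary, and the names bpeTokenize and tokenEfficiency, as well as the "characters per token" score, are assumptions made for illustration only.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// merges is a tiny hand-written merge table (hypothetical, for illustration).
// A real BPE vocabulary holds many thousands of learned merges.
// A lower rank means the merge was learned earlier and is applied first.
var merges = map[[2]string]int{
	{"p", "a"}:       0,
	{"s", "s"}:       1,
	{"pa", "ss"}:     2,
	{"w", "o"}:       3,
	{"r", "d"}:       4,
	{"wo", "rd"}:     5,
	{"pass", "word"}: 6,
}

// bpeTokenize splits s into single characters, then repeatedly merges the
// adjacent pair with the lowest rank until no known pair remains.
func bpeTokenize(s string) []string {
	tokens := make([]string, 0, len(s))
	for _, r := range s {
		tokens = append(tokens, string(r))
	}
	for {
		best, bestRank := -1, 0
		for i := 0; i+1 < len(tokens); i++ {
			rank, ok := merges[[2]string{tokens[i], tokens[i+1]}]
			if ok && (best < 0 || rank < bestRank) {
				best, bestRank = i, rank
			}
		}
		if best < 0 {
			return tokens // nothing left to merge
		}
		merged := tokens[best] + tokens[best+1]
		tokens = append(tokens[:best+1], tokens[best+2:]...)
		tokens[best] = merged
	}
}

// tokenEfficiency returns the average number of characters per token.
// Higher values mean the string compressed well (word-like text);
// values near 1 mean it barely compressed (random-looking text).
func tokenEfficiency(s string) float64 {
	if s == "" {
		return 0
	}
	return float64(utf8.RuneCountInString(s)) / float64(len(bpeTokenize(s)))
}

func main() {
	for _, s := range []string{"password", "x9Qz7Lk2"} {
		fmt.Printf("%q: %d tokens, %.2f chars/token\n",
			s, len(bpeTokenize(s)), tokenEfficiency(s))
	}
}
```

With this toy table, "password" collapses into a single token (8 characters per token), while the random-looking "x9Qz7Lk2" stays at eight single-character tokens (1 character per token). A filter built on this idea would keep candidates whose score falls below some threshold; the threshold Betterleaks uses is not stated in the article.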

Betterleaks also features validation logic written in the Common Expression Language (CEL), allowing rule authors to programmatically control what counts as a confirmed secret. The tool handles doubly and triply encoded secrets by default and supports parallelized git scanning to reduce scan times.
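To make the idea of multiply encoded secrets concrete, here is a minimal Go sketch that peels base64 layers off a candidate string and checks a rule at each layer. It is an assumption-laden illustration, not Betterleaks' decoder: the demo_key_ pattern, the findNested function, and the depth limit are all hypothetical, and a real scanner would locate encoded segments inside larger text and handle more encodings than base64.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"regexp"
)

// secretPattern is a hypothetical rule: a "demo_key_" prefix followed by
// at least 16 alphanumeric characters.
var secretPattern = regexp.MustCompile(`demo_key_[A-Za-z0-9]{16,}`)

// findNested checks input against secretPattern, and if nothing matches,
// base64-decodes it and tries again, up to maxDepth decoding steps.
// It returns the match and the number of layers that were removed.
func findNested(input string, maxDepth int) (match string, depth int, ok bool) {
	current := input
	for depth = 0; depth <= maxDepth; depth++ {
		if m := secretPattern.FindString(current); m != "" {
			return m, depth, true
		}
		decoded, err := base64.StdEncoding.DecodeString(current)
		if err != nil {
			return "", depth, false // not valid base64, stop peeling
		}
		current = string(decoded)
	}
	return "", maxDepth, false
}

// encodeTimes wraps s in n layers of standard base64.
func encodeTimes(s string, n int) string {
	for i := 0; i < n; i++ {
		s = base64.StdEncoding.EncodeToString([]byte(s))
	}
	return s
}

func main() {
	secret := "demo_key_abcdefghij0123456789" // placeholder, not a real credential
	wrapped := encodeTimes(secret, 3)

	if m, depth, ok := findNested(wrapped, 3); ok {
		fmt.Printf("found %q after removing %d layer(s)\n", m, depth)
	}
}
```

The depth limit matters in practice: without one, a crafted input could force a scanner to keep decoding indefinitely.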

Betterleaks is built in pure Go without CGO, removing the dependency on Hyperscan and allowing deployment across environments without native library requirements. The tool supports scanning archives, including nested archives, and outputs results in various formats, including JSON, CSV, JUnit, SARIF, and custom templates.
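Nested archive scanning can be sketched with Go's standard library alone, which is consistent with the pure-Go approach described above. The following is only an illustration under stated assumptions: walkZip, makeZip, the "!" path separator, and the depth limit are hypothetical names and choices, it handles only ZIP files, and it is not taken from the Betterleaks source.

```go
package main

import (
	"archive/zip"
	"bytes"
	"fmt"
	"io"
	"strings"
)

// walkZip calls visit for every regular file in the ZIP data. Entries ending
// in ".zip" are opened recursively until maxDepth is exhausted, after which
// they are passed to visit as opaque files.
// Note: io.ReadAll is unbounded here; a real scanner would cap entry sizes
// to defend against decompression bombs.
func walkZip(data []byte, prefix string, maxDepth int, visit func(path string, content []byte)) error {
	r, err := zip.NewReader(bytes.NewReader(data), int64(len(data)))
	if err != nil {
		return err
	}
	for _, f := range r.File {
		if f.FileInfo().IsDir() {
			continue
		}
		rc, err := f.Open()
		if err != nil {
			return err
		}
		content, err := io.ReadAll(rc)
		rc.Close()
		if err != nil {
			return err
		}
		path := prefix + f.Name
		if maxDepth > 0 && strings.HasSuffix(strings.ToLower(f.Name), ".zip") {
			if err := walkZip(content, path+"!", maxDepth-1, visit); err != nil {
				return err
			}
			continue
		}
		visit(path, content)
	}
	return nil
}

// makeZip builds an in-memory ZIP archive from a name-to-content map.
func makeZip(files map[string][]byte) []byte {
	var buf bytes.Buffer
	w := zip.NewWriter(&buf)
	for name, content := range files {
		fw, err := w.Create(name)
		if err != nil {
			panic(err)
		}
		if _, err := fw.Write(content); err != nil {
			panic(err)
		}
	}
	if err := w.Close(); err != nil {
		panic(err)
	}
	return buf.Bytes()
}

func main() {
	inner := makeZip(map[string][]byte{"config/.env": []byte("API_KEY=placeholder")})
	outer := makeZip(map[string][]byte{
		"readme.txt": []byte("nothing sensitive here"),
		"backup.zip": inner,
	})

	err := walkZip(outer, "", 3, func(path string, content []byte) {
		fmt.Printf("%s (%d bytes)\n", path, len(content))
	})
	if err != nil {
		fmt.Println("error:", err)
	}
}
```

Each file handed to visit would then go through the normal detection rules, with the composite path (for example backup.zip!config/.env) reported as the location.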

Future Plans and Roadmap

The project roadmap includes several planned features, such as LLM-assisted classification, where anonymized candidate secrets are passed to a local or remote language model for additional context. Auto-revocation support is also planned for providers that expose credential revocation APIs. The team intends to add permissions mapping, which would show what access a detected secret actually carries.

Betterleaks is designed with flag-based output control, so AI coding agents can run it as a subprocess and consume its output without excess token overhead. The tool is available for free on GitHub.