DarkMoon: Open-Source AI Penetration Testing Tool for Ethical Hacking

Penetration testing has traditionally relied on human expertise, with specialists dedicating extensive time to manually analyze networks and web applications. Manual assessments often span weeks, require high-cost expert consultations, and produce inconsistent results. Automation offers a solution to these challenges, with emerging projects leveraging AI agents to plan and execute security evaluations autonomously. DarkMoon, an open-source platform, performs end-to-end security assessments and generates evidence-supported reports upon completion.

A reasoning layer separated from execution

DarkMoon distinguishes itself by isolating the decision-making process from the tools that carry out actions. An orchestrator named OpenCode interacts with a large language model (LLM) to strategize each step, delegating actual tasks to a control layer based on the Model Context Protocol (MCP). This MCP layer restricts operations to an allow-list of approved tools, executing them within isolated Docker containers containing over 50 security utilities, including Nuclei, sqlmap, BloodHound, and NetExec.
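The allow-list idea can be illustrated with a small sketch. This is not DarkMoon's actual code: the function name, the `darkmoon-tools` image name, and the shape of the allow-list are assumptions made for illustration. Only the tool names come from the article.

```python
# Hypothetical allow-list; tool names are from the article, the structure is assumed.
ALLOWED_TOOLS = {"nuclei", "sqlmap", "bloodhound", "netexec"}


def build_container_command(tool, args, image="darkmoon-tools"):
    """Return a docker argv for an approved tool, or raise if it is not allow-listed.

    `image` is a placeholder name, not the project's real image.
    """
    name = tool.lower()
    if name not in ALLOWED_TOOLS:
        raise PermissionError(f"tool not on allow-list: {tool}")
    # Building an argv list (no shell) keeps model-supplied arguments from
    # being interpreted as shell syntax.
    return ["docker", "run", "--rm", image, name, *args]
```

A request for `nuclei` yields a container command, while a request for an unlisted binary such as `bash` is rejected before anything runs. The point of the design is that the model only proposes actions and a separate layer decides whether they may execute.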

Orchestrator and MCP Layer

Specialized sub-agents handle specific domains such as web applications, Active Directory, Kubernetes, and network protocols. Assessments follow a structured workflow: the platform discovers open ports and services, fingerprints the technology stack, maps the attack surface, and activates the sub-agents that match the detected components. A reactive loop continuously integrates results, so the plan adjusts as evidence arrives. For example, a WordPress site detected early may trigger a CMS agent, while a later discovery of a GraphQL endpoint could deploy a dedicated GraphQL agent.
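A reactive loop of this kind might look roughly like the sketch below. The component-to-agent mapping, the agent names, and the `run_agent` callback are all hypothetical. The article does not describe DarkMoon's internal data structures.

```python
from collections import deque

# Hypothetical mapping from a detected component to the sub-agent handling it.
AGENT_FOR_COMPONENT = {
    "wordpress": "cms_agent",
    "graphql": "graphql_agent",
    "active_directory": "ad_agent",
    "kubernetes": "k8s_agent",
}


def run_reactive_loop(initial_components, run_agent):
    """Activate sub-agents as components are discovered.

    `run_agent(agent_name)` is an assumed callback that runs one agent and
    returns any newly discovered components. Returns agents in activation order.
    """
    queue = deque(initial_components)
    seen = set()
    activated = []
    while queue:
        component = queue.popleft()
        if component in seen:
            continue  # each component is handled once
        seen.add(component)
        agent = AGENT_FOR_COMPONENT.get(component)
        if agent is None:
            continue  # no specialized agent for this component
        activated.append(agent)
        # New discoveries are fed back into the same loop.
        queue.extend(run_agent(agent))
    return activated
```

With a callback where the CMS agent reports a GraphQL endpoint, starting from `["wordpress"]` activates `cms_agent` first and `graphql_agent` second, mirroring the example in the text.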

Assessment workflow

The process aligns with established references such as ISO 27001, NIST SP 800-115, and the MITRE ATT&CK framework. The LLM never executes arbitrary commands directly; every action is routed through the MCP server, which enforces strict tool and workflow restrictions.

Scope defined by user input

The assessment boundary is determined at the start, based on targets, domains, IP ranges, or applications. The orchestrator constructs its analysis exclusively from assets within this authorized scope, executing only approved methodologies. New tools remain inaccessible until explicitly installed.
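A scope check along these lines could be sketched as follows. The function name and parameters are hypothetical, and how DarkMoon actually represents its scope is not stated in the article. Domains are assumed to be given in lowercase.

```python
import ipaddress


def in_scope(target, allowed_networks, allowed_domains):
    """Return True if `target` (an IP address or hostname) is inside the authorized scope.

    `allowed_networks` holds CIDR strings, `allowed_domains` holds lowercase
    domain names. Both are assumed inputs supplied by the user at the start.
    """
    try:
        addr = ipaddress.ip_address(target)
    except ValueError:
        # Not an IP literal: treat it as a hostname and match the exact domain
        # or a true subdomain (the leading dot avoids matching lookalike names).
        host = target.lower().rstrip(".")
        return any(host == d or host.endswith("." + d) for d in allowed_domains)
    return any(addr in ipaddress.ip_network(net) for net in allowed_networks)
```

Under these assumptions, `in_scope("10.0.0.5", ["10.0.0.0/24"], [])` is true, while a host in a neighboring subnet or a lookalike domain is rejected.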

Cost considerations

Cost is a primary concern for users. A standard web application assessment using the Claude Opus model incurs approximately $10 in API fees, according to the project’s lead maintainer. Larger engagements, such as Active Directory or multi-host infrastructure evaluations, cost more because the model reasons continuously over evolving evidence and attack paths. DarkMoon supports multiple LLM providers, including OpenAI, Anthropic, and OpenRouter, as well as local models via Ollama or llama.cpp.

Claude is currently recommended for its balance of reasoning quality, planning stability, and long-context performance. However, model selection introduces challenges tied to vendor safety systems. Recent Anthropic models include classifiers that may interrupt, refuse, or downgrade offensive-security tasks even during authorized engagements. In the project's testing, Claude Opus 4.8 ran into such limitations during assessments, while Claude Opus 4.6 completed tasks without interruptions. The project advises using Opus 4.6 as the most reliable option and points to Anthropic’s Cyber Verification Program as an alternative for approved organizations.

Lower-parameter models are not suited to autonomous operation. The platform can run at no API cost with local models, or incur modest fees for the stronger reasoning of frontier models. Users choose their own trade-off between cost and capability.

Evidence-driven findings

DarkMoon requires findings to be supported by concrete evidence. Weak signals, such as generic HTTP 200 responses, reflected payloads, or ambiguous indicators, are labeled as Unconfirmed. Confirmed findings include executed commands, raw outputs, HTTP request/response pairs, and execution traces. The LLM is never treated as the definitive source of truth; evidence collected from the target environment remains the authoritative reference. This approach keeps analyst validation in place, reduces manual triage, and makes every conclusion traceable and reproducible.
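The Confirmed/Unconfirmed split can be illustrated with a minimal sketch. The `Finding` fields and the rule itself (a finding counts as Confirmed only when both the executed command and its raw output are recorded) are simplifying assumptions for illustration. The article does not specify the criteria DarkMoon applies, and a real check would also have to judge the strength of the evidence, not just its presence.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    """Hypothetical finding record; field names are assumed, not DarkMoon's schema."""
    title: str
    command: str = ""      # exact command that was executed, if any
    raw_output: str = ""   # unmodified tool output captured from the target


def classify(finding):
    """Label a finding by the evidence attached to it, never by the model's opinion."""
    if finding.command and finding.raw_output:
        return "Confirmed"
    return "Unconfirmed"
```

A finding that carries only a title or a model-written description stays Unconfirmed until a command and its captured output are attached, which is what keeps the report reproducible.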