Praxen: Open-Source AI Agent Behavior Verification


Praxen is an open-source solution designed to validate that AI agents execute tasks as specified, ensuring alignment between declared objectives and observed actions.

Introduction to Praxen

Praxen evaluates an agent’s stated policy against its actual operations and identifies discrepancies between declared objectives and observed actions. It serves as the foundational implementation of Agent Behavior Verification, a framework that assigns each agent a defined role and ensures adherence to established controls. The approach mirrors how organizations manage employee permissions: a software agent, like an employee, is restricted to the operational scope its role requires.

Key Features and Verification Process

The verification process begins with the creation of a Worker Remit, a structured markdown document outlining an agent’s responsibilities, authorized tools, communication channels, and prohibited activities. Praxen analyzes evidence such as source code, deployment configurations, behavioral logs, and governance documents to compare this policy against the agent’s real-world behavior. Results are delivered as a self-contained HTML report, a machine-readable JSON file, and a text summary, all stored locally. Praxen is integrated as a Claude Code plugin, and all data remains on-premises.
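The comparison of a remit against observed behavior can be illustrated with a small Python sketch. It does not reflect Praxen's actual schema or code: the WorkerRemit class, its field names, and the tool names are all hypothetical, chosen only to show the idea of checking observed tool calls against what a policy permits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerRemit:
    """Hypothetical in-memory form of a remit; not Praxen's real schema."""
    role: str
    authorized_tools: frozenset
    prohibited_actions: frozenset


def find_discrepancies(remit, observed_tool_calls):
    """Return one message per observed tool call that the remit does not permit."""
    issues = []
    for tool in observed_tool_calls:
        if tool in remit.prohibited_actions:
            issues.append(f"prohibited: {tool}")
        elif tool not in remit.authorized_tools:
            issues.append(f"unauthorized: {tool}")
    return issues


# Illustrative data only: the role and tool names are invented for this example.
remit = WorkerRemit(
    role="invoice-triage",
    authorized_tools=frozenset({"read_inbox", "create_ticket"}),
    prohibited_actions=frozenset({"send_payment"}),
)
observed = ["read_inbox", "send_payment", "delete_ticket"]
print(find_discrepancies(remit, observed))
# → ['prohibited: send_payment', 'unauthorized: delete_ticket']
```

The sketch separates explicitly prohibited actions from actions that are merely absent from the authorized list, since the two cases would likely warrant different severities in a report.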

Analysis Checks and Classifications

Each analysis executes a series of checks targeting policy-implementation gaps, credential exposure, configuration flaws, capability drift, supply-chain vulnerabilities, incomplete controls, security-relevant empty files, secondary prompt discovery, and compound signal reasoning that links individual findings into broader attack scenarios. Findings are tagged with classifications from the OWASP Top 10 for LLM Applications 2025, OWASP Top 10 for Agentic AI Applications 2026, OWASP Secure MCP Server Development Guide 2026, and the RAISE Framework, which evaluates maturity across six categories.
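One way to picture compound signal reasoning is as a rule that fires when two individual findings touch the same component. The Python sketch below is an assumption-laden illustration, not Praxen's method: the Finding class, the check names, the rule table, the file paths, and the pairing of findings with tags are all hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Finding:
    check: str        # hypothetical check name, e.g. "credential_exposure"
    component: str    # file or service the evidence came from
    tags: tuple = ()  # classification labels; the pairing here is illustrative


# Hypothetical rule table: pairs of checks that together suggest a broader scenario.
COMPOUND_RULES = {
    frozenset({"credential_exposure", "capability_drift"}):
        "exposed credential reachable by an undeclared tool",
}


def compound_signals(findings):
    """Pair findings on the same component and report any matching scenario."""
    scenarios = []
    for a, b in combinations(findings, 2):
        if a.component != b.component:
            continue
        scenario = COMPOUND_RULES.get(frozenset({a.check, b.check}))
        if scenario:
            scenarios.append((a.component, scenario))
    return scenarios


findings = [
    Finding("credential_exposure", "agent/config.yaml", ("LLM02:2025",)),
    Finding("capability_drift", "agent/config.yaml", ("LLM06:2025",)),
    Finding("capability_drift", "agent/tools.py"),
]
print(compound_signals(findings))
# → [('agent/config.yaml', 'exposed credential reachable by an undeclared tool')]
```

Only the two findings that share a component are linked; the third, on a different file, stays an individual finding.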

Deployment and Policy Governance

Praxen is meant to be run before an agent is deployed and again with every release. It requires Python 3.9 or later and a coding agent, and has been tested against Claude Code. A unified policy governs the agent lifecycle, with runtime monitoring handled through Exabeam’s Agent Behavior Analytics (ABA). Steve Wilson, Exabeam’s Chief AI Officer, emphasized the goal of a single policy document that defines an agent’s role, permissions, and constraints. This document serves as the basis for both verification and runtime analysis, enabling ABA to detect deviations from expected behavior.

Integration and Future Plans

While verification and analytics operate as separate layers, Wilson noted plans to integrate them into a broader Behavior Intelligence strategy for AI agents. Consistency in results is maintained through a frozen regression suite that validates major findings and maturity assessments across releases. Smaller variations in severity counts or scoring are deemed acceptable, with all findings traceable to source materials for independent verification.
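The regression idea described above, exact agreement on major findings and maturity with some tolerance on severity counts, might look roughly like the following Python sketch. The dictionary layout, the finding IDs, the maturity labels, and the tolerance value are assumptions made for illustration; the article does not describe how Praxen's suite is actually structured.

```python
def check_against_baseline(baseline, current, severity_tolerance=1):
    """Compare a run with a frozen baseline.

    Major finding IDs and the maturity level must match the baseline;
    per-severity counts may drift by up to `severity_tolerance`.
    """
    problems = []
    missing = set(baseline["major_findings"]) - set(current["major_findings"])
    if missing:
        problems.append(f"missing major findings: {sorted(missing)}")
    if baseline["maturity"] != current["maturity"]:
        problems.append("maturity assessment changed")
    for severity, expected in baseline["severity_counts"].items():
        actual = current["severity_counts"].get(severity, 0)
        if abs(actual - expected) > severity_tolerance:
            problems.append(f"{severity} count drifted: {expected} -> {actual}")
    return problems


# Hypothetical frozen baseline and two later runs.
baseline = {
    "major_findings": ["F-001", "F-002"],
    "maturity": "level-2",
    "severity_counts": {"high": 3, "medium": 5},
}
acceptable_run = {
    "major_findings": ["F-001", "F-002"],
    "maturity": "level-2",
    "severity_counts": {"high": 3, "medium": 6},
}
regressed_run = {
    "major_findings": ["F-001"],
    "maturity": "level-2",
    "severity_counts": {"high": 3, "medium": 5},
}
print(check_against_baseline(baseline, acceptable_run))  # → []
print(check_against_baseline(baseline, regressed_run))
# → ["missing major findings: ['F-002']"]
```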

Data Handling and Scalability

To handle large evidence sets, Praxen prioritizes the most relevant data from source code, configuration files, dependency manifests, tool definitions, memory artifacts, and logs. Large logs are sampled to expand coverage, and long-running analyses checkpoint their progress so that work is not lost when the context window fills. Findings derived from sampled evidence carry markers indicating that some data was not examined.
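A minimal Python sketch of sampling a long log and marking the resulting finding as partial is shown here. The sampling strategy (first line, last line, and evenly spaced lines in between), the function name, and the finding fields are all assumptions; the article does not say how Praxen samples logs or what its markers look like.

```python
def sample_log(lines, max_lines=6):
    """Keep the first, last, and evenly spaced middle lines of a long log.

    Returns (sampled_lines, skipped_count) so callers can mark findings
    derived from partial evidence. Assumes max_lines >= 2.
    """
    if len(lines) <= max_lines:
        return list(lines), 0
    step = (len(lines) - 1) / (max_lines - 1)
    indices = sorted({round(i * step) for i in range(max_lines)})
    sampled = [lines[i] for i in indices]
    return sampled, len(lines) - len(sampled)


log = [f"event {n}" for n in range(100)]
sampled, skipped = sample_log(log)

# Hypothetical finding record: the field names are invented for this example.
finding = {
    "summary": "tool call outside remit",
    "evidence": sampled,
    "sampled": skipped > 0,          # marker: evidence is partial
    "lines_not_examined": skipped,
}
print(finding["sampled"], finding["lines_not_examined"])  # → True 94
```

Recording the number of lines not examined alongside the marker lets a reviewer judge how much weight a sampled finding deserves.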

Conclusion

Praxen is freely available on GitHub, offering a tool for organizations to audit AI agent behavior. The solution addresses critical security and compliance needs, ensuring alignment between intended and actual agent operations.
