Containing AI Autonomy: Open-Source IronCurtain Safeguard Layer for Autonomous AI Assistants
A Novel Approach to Mitigating Autonomous AI Risks
A new open-source solution, dubbed IronCurtain, is being developed to counter the risks associated with autonomous AI agents. The brainchild of veteran security engineer Niels Provos, IronCurtain aims to prevent Large Language Model (LLM)-powered agents from taking unauthorized actions, whether through prompt injection or gradual deviation from the user’s original intent.
How IronCurtain Works
Provos’ solution relies on a safeguard layer that intercepts and analyzes the agent’s intended actions before they are executed. This is achieved by routing the agent’s interactions through a separate trusted process, which acts as a policy engine. The policy engine evaluates each request against a set of guiding principles, known as a “constitution,” which is written in plain English by the user and translated into a security policy by IronCurtain.
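The interception flow described above can be sketched as follows. This is an illustrative model, not IronCurtain's actual API: the names (`PolicyEngine`, `Rule`, `denyWritesOutsideProject`) and the default-escalate posture are assumptions made for the sketch.

```typescript
// Hypothetical sketch: every intended action is routed to a policy engine
// before execution. All identifiers here are illustrative assumptions.

type Decision = "allow" | "deny" | "escalate";

interface ToolCall {
  tool: string;
  args: Record<string, string>;
}

// A compiled rule inspects a tool call and returns a decision,
// or null if it has no opinion on this call.
type Rule = (call: ToolCall) => Decision | null;

class PolicyEngine {
  constructor(private rules: Rule[]) {}

  evaluate(call: ToolCall): Decision {
    for (const rule of this.rules) {
      const verdict = rule(call);
      if (verdict !== null) return verdict;
    }
    // No rule matched: escalate to the human rather than silently allow.
    return "escalate";
  }
}

// Example rule compiled from a plain-English constitution line such as
// "never write outside the project directory" (path is illustrative).
const denyWritesOutsideProject: Rule = (call) =>
  call.tool === "fs_write" && !call.args.path?.startsWith("/project/")
    ? "deny"
    : null;

const engine = new PolicyEngine([denyWritesOutsideProject]);
```

The key property is that the agent never executes anything directly; it can only propose calls, and the engine's verdict gates each one.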
IronCurtain Architecture
The IronCurtain architecture consists of four layers. The user provides an instruction to the agent, which generates TypeScript code that runs inside an isolated V8 virtual machine. From within the isolate, the agent issues typed function calls that map to Model Context Protocol (MCP) tool calls, which are forwarded to the trusted process. That process, an MCP proxy, acts as the policy engine and decides whether each call should be allowed, denied, or escalated to a human for approval.
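The typed-function-call boundary might look roughly like the sketch below. The names (`readFile`, `McpRequest`, `sendToProxy`) are assumptions for illustration; the point is that generated code inside the isolate has no raw I/O, only typed wrappers that serialize requests to the trusted process.

```typescript
// Illustrative sketch of how typed function calls inside the V8 isolate
// map onto MCP tool calls sent to the trusted proxy. All names here are
// hypothetical, not the real MCP SDK surface.

interface McpRequest {
  tool: string;
  args: Record<string, unknown>;
}

// The isolate cannot touch the filesystem directly; the only way out is a
// message to the trusted process, which enforces policy. In a real system
// this would be an IPC call across the isolate boundary; here we just
// echo the serialized request for demonstration.
function sendToProxy(req: McpRequest): string {
  return JSON.stringify(req);
}

// Typed wrapper the agent's generated code calls instead of raw I/O.
function readFile(path: string): string {
  return sendToProxy({ tool: "fs_read", args: { path } });
}
```

Because every capability is funneled through wrappers like `readFile`, the proxy sees a complete, structured record of what the agent is trying to do.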
Policy Engine
The policy engine’s decisions are based on a set of per-interface rules generated from the user’s constitution. These rules are created using a library of verified policy primitives and are refined iteratively to ensure they align with the user’s intent. A test scenario generator creates cases to identify gaps and contradictions, while a verifier checks that the compiled rules match the original intent.
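The verifier step described above can be sketched by replaying generated scenarios against compiled rules and collecting mismatches. This sketch simplifies decisions to allow/deny and invents the primitives (`underDir`, `isTool`) and rule shape; it shows the checking loop, not IronCurtain's actual primitive library.

```typescript
// Sketch of verified policy primitives composed into rules, plus a
// test-scenario check that flags gaps. All names are hypothetical.

type Decision = "allow" | "deny";

interface Scenario { tool: string; path: string; expected: Decision }

// Primitives: small, individually auditable predicates.
const underDir = (dir: string) => (path: string) => path.startsWith(dir + "/");
const isTool = (t: string) => (tool: string) => tool === t;

interface Rule {
  when: (tool: string, path: string) => boolean;
  decision: Decision;
}

function decide(rules: Rule[], tool: string, path: string): Decision {
  // Simplified: any matching deny overrides any allow.
  const matches = rules.filter((r) => r.when(tool, path));
  return matches.some((r) => r.decision === "deny") ? "deny" : "allow";
}

// Verifier step: return every scenario where the compiled rules
// disagree with the user's stated intent.
function verify(rules: Rule[], scenarios: Scenario[]): Scenario[] {
  return scenarios.filter((s) => decide(rules, s.tool, s.path) !== s.expected);
}

const rules: Rule[] = [
  { when: (tool, path) => isTool("fs_write")(tool) && !underDir("/sandbox")(path), decision: "deny" },
  { when: (tool, _path) => isTool("fs_read")(tool), decision: "allow" },
];
```

A non-empty result from `verify` signals a gap or contradiction, feeding the iterative refinement loop the article describes.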
Evaluating Tool-Call Requests
When evaluating tool-call requests, IronCurtain follows a two-phase approach. First, it checks structural invariants, for example that a request does not touch protected paths and that file paths stay within the sandbox. Second, it evaluates each argument against the compiled policy rules, with the most restrictive result taking precedence.
Implementation
Once a request has been approved, it is forwarded to standard MCP servers, which provide filesystem access, git operations, and other capabilities. The results are then returned to the agent through the trusted process, ensuring that the agent never accesses the user’s filesystem, sensitive credentials, or environment variables.
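A minimal sketch of that forwarding step, assuming a hypothetical `McpServer` interface (not the real MCP SDK surface): only the call's result crosses back into the isolate, never a direct handle to the backend.

```typescript
// Minimal sketch of the forwarding step: an approved call goes to a
// backing MCP server, and only the result travels back to the isolate.
// `McpServer` and `TrustedProxy` are assumptions for this sketch.

interface McpServer {
  call(tool: string, args: Record<string, unknown>): string;
}

class TrustedProxy {
  constructor(private backend: McpServer) {}

  forward(approved: boolean, tool: string, args: Record<string, unknown>): string {
    if (!approved) throw new Error("policy denied");
    // The isolate only ever sees this return value, never the backend's
    // environment, credentials, or direct filesystem handles.
    return this.backend.call(tool, args);
  }
}
```

The design choice worth noting is that the trust boundary sits at the proxy: the backing servers can be entirely standard, because confinement is enforced before any request reaches them.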
While IronCurtain is still in the early stages of development, its release as open-source software allows developers and security researchers to test and improve the approach. As autonomous AI agents become increasingly prevalent, solutions like IronCurtain will be crucial in mitigating the risks associated with their use.
