How to Train AI Agents to Attack Large Language Model Applications

How-to-Train-AI-Agents-to-Attack-Large-Language-Model-Applications

Novee’s AI Red Teaming Solution for LLM-Powered Software

In today’s fast-paced world of enterprise software development, most companies employ AI-powered applications at a rate that exceeds traditional penetration testing capabilities. With hundreds of applications in a security team’s portfolio, thorough testing becomes impractical, given the limited time and resources available.

The Gap Between Testing Capabilities and Application Evolution

This gap allows threats to go undetected between reviews, as the underlying models, integrations, and behaviors of these applications can change without corresponding security evaluations. To address this challenge, Novee has developed AI Red Teaming for LLM Applications, an AI-driven pentesting agent designed to proactively identify vulnerabilities in LLM-powered software.

According to Gon Chalamish, co-founder and Chief Product Officer at Novee, “Attackers are already evolving their tactics to counter AI systems; we must develop similar approaches to defend our systems.”

Novee’s AI Red Teaming agent supports applications built upon various LLM providers, including OpenAI, Anthropic, and open-source models. Moreover, it can seamlessly integrate into Continuous Integration/Continuous Deployment (CI/CD) pipelines, enabling regular security tests as part of the standard development workflow.

  • The agent acquires comprehensive knowledge of the application’s operation by reading relevant documentation, querying Application Programming Interfaces (APIs), and constructing an internal model of the application’s inner workings.
  • For instance, the agent might analyze an application’s role-based access control structure and then attempt to breach data restricted to higher-privileged users through lower-privileged ones.
  • The agent tailors its probing methodology according to the unique configuration and behavior of each target application.
  • The agent can analyze and respond to complex multi-step interactions, embedding malicious instructions within an agent’s prompts or planting data in one section of an application before prompting the agent to access it.
  • The agent can also simulate intricate interactions that conventional pen testing tools cannot, such as prompt injection, indirect prompt injection, and tool abuse.

Novee’s research team has been publishing findings on genuine AI vulnerabilities, including a recent discovery in the Cursor coding assistant that allowed attackers to manipulate the tool’s context window and execute arbitrary code on a developer’s machine. The company has additional findings under responsible disclosure with other vendors.

The Need for Continuous Automated Testing

Ido Geffen, CEO and co-founder of Novee, warns that attackers are evolving faster than traditional security cycles can accommodate. The time gap between vulnerability identification and exploitation can shrink to mere minutes, underscoring the need for continuous testing, not periodic assessments.

Chalamish emphasizes that AI pentesting does not necessitate creating a new budget category for organizations. Security teams already allocate funds for pen testing, red teaming, and vulnerability scanning. The shift Novee aims to achieve is from periodic manual work to continuous automated testing, utilizing existing budgets.

Pen testing talent is scarce and expensive, and the current model of annual or biannual engagements leaves gaps that AI can fill. Novee secured $51.5 million in funding within four months of its founding, with investors including YL Ventures, Canaan Partners, and Zeev Ventures. The company was founded by Ido Geffen, Gon Chalamish, and Omer Ninburg, all hailing from backgrounds in national-level offensive security operations.




About Author

en_USEnglish