Frontier AI Models Vulnerable to Multi-Turn Attacks Revealed by Cisco

Post Views: 192

Attackers Refuse to Give Up: Uncovering Multi-Turn AI Attacks on Large Language Models

The notion that large language models can withstand even the most sophisticated cyber threats is being challenged by recent research from Cisco’s AI threat intelligence team.

A Significant Gap Between Publicly Stated Safety Benchmarks and Actual Resilience

An exhaustive analysis of 15 prominent models from OpenAI, Anthropic, Google, Amazon, and xAI reveals a disturbing truth: these models crumble under multi-turn attacks, exposing a significant gap between their publicly stated safety benchmarks and actual resilience.

Single-Turn vs. Multi-Turn: A Tale of Two Worlds

The researchers conducted extensive testing, covering over 30,000 single-turn prompts and nearly 7,000 multi-turn attacks spread across more than 1,400 conversations.

Results are stark: single-turn and multi-turn evaluation produce drastically different rankings, failure maps, and tail-risk profiles.
While some models excel in single-turn scenarios, they falter when faced with iterative attacks.

Amy Chang, Head of AI Threat and Security Research at Cisco, emphasizes the importance of assessing a model’s resilience against real-world attack scenarios.”How secure is this model against real-world attack scenarios?” she asks.”That translates to: How does this model hold up against multi-turn, adaptive attacks?”

Strategy Families Drive Multi-Turn Outcomes

The researchers identified five key strategy families driving most of the multi-turn outcomes:

Role-play and persona adoption
Contextual ambiguity
Refusal reframing
Information decomposition
Crescendo-style escalation

These tactics allow attackers to adapt and refine their approach, exploiting weaknesses in the models’ defenses.

Guardrails Reduce Risk but Fail to Eliminate It

Production deployments of large language models typically incorporate additional safety layers, such as guardrails.

While these measures help mitigate risks, they cannot eliminate them entirely.
Amy Chang emphasizes that the base model sets the floor on what any production system can achieve, making it essential to consider the model’s inherent vulnerabilities when evaluating its overall security.

Operational Steps for Organizations Buying or Deploying AI Models

Organizations buying or deploying AI models should prioritize the following operational steps:

Publish ASR by strategy family on every model release.
Gate deployments on regressions in the top three procedures and content types using a 3-point threshold.
Flag any model with a cross-regime gap above 15 points for manual review.