Frontier AI Models Vulnerable to Multi-Turn Attacks Revealed by Cisco
Attackers Refuse to Give Up: Uncovering Multi-Turn AI Attacks on Large Language Models
The notion that large language models can withstand even the most sophisticated cyber threats is being challenged by recent research from Cisco’s AI threat intelligence team.
A Significant Gap Between Publicly Stated Safety Benchmarks and Actual Resilience
An exhaustive analysis of 15 prominent models from OpenAI, Anthropic, Google, Amazon, and xAI reveals a disturbing truth: these models crumble under multi-turn attacks, exposing a significant gap between their publicly stated safety benchmarks and actual resilience.
Single-Turn vs. Multi-Turn: A Tale of Two Worlds
The researchers conducted extensive testing, covering over 30,000 single-turn prompts and nearly 7,000 multi-turn attacks spread across more than 1,400 conversations.
- Results are stark: single-turn and multi-turn evaluation produce drastically different rankings, failure maps, and tail-risk profiles.
- While some models excel in single-turn scenarios, they falter when faced with iterative attacks.
Strategy Families Drive Multi-Turn Outcomes
The researchers identified five key strategy families driving most of the multi-turn outcomes:
- Role-play and persona adoption
- Contextual ambiguity
- Refusal reframing
- Information decomposition
- Crescendo-style escalation
These tactics allow attackers to adapt and refine their approach, exploiting weaknesses in the models’ defenses.
Guardrails Reduce Risk but Fail to Eliminate It
Production deployments of large language models typically incorporate additional safety layers, such as guardrails.
- While these measures help mitigate risks, they cannot eliminate them entirely.
- Amy Chang emphasizes that the base model sets the floor on what any production system can achieve, making it essential to consider the model’s inherent vulnerabilities when evaluating its overall security.
Operational Steps for Organizations Buying or Deploying AI Models
Organizations buying or deploying AI models should prioritize the following operational steps:
- Publish ASR by strategy family on every model release.
- Gate deployments on regressions in the top three procedures and content types using a 3-point threshold.
- Flag any model with a cross-regime gap above 15 points for manual review.