How Attackers Can Exploit AI Vision Models with Subtle Image Manipulations


Critical Vulnerabilities in AI Vision Models

Researchers have discovered that attackers can manipulate vision-language models (VLMs) using imperceptible image changes, allowing the attacks to evade traditional security measures.

The vulnerability allows attackers to embed malicious instructions in images that AI models will follow
Specially crafted visual inputs can bypass traditional security measures
Researchers found that small fonts, heavy blurring, and rotation can reduce attack success rates
However, machine learning optimization can compensate for these distortions and make the image readable to AI models again

Vulnerability Details

The researchers applied bounded pixel-level perturbations to images whose embedded attacks were already failing, either because the target model could not read them or because it refused on safety grounds.

Readability recovery occurs when an image becomes legible to the AI model despite being unclear to humans
Refusal reduction happens when the AI model complies with an embedded instruction that it previously refused
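
To make "bounded pixel-level perturbations" concrete, here is a minimal sketch of what a bound on pixel changes can look like. It assumes an L-infinity bound (no pixel moves by more than `epsilon`) on 8-bit values; the article does not say which bound the researchers used, and the function name, `epsilon` value, and sample pixels are hypothetical. The sketch shows only the clamping step, not how a perturbation is chosen.

```python
def apply_bounded_perturbation(pixels, delta, epsilon):
    """Add delta to pixels, limiting each change to +/- epsilon.

    Assumption: an L-infinity bound on 8-bit pixel values. The source
    article does not specify the bound the researchers actually used.
    """
    result = []
    for value, change in zip(pixels, delta):
        # Clamp the proposed change so it never exceeds the bound.
        change = max(-epsilon, min(epsilon, change))
        # Keep the perturbed pixel inside the valid 0..255 range.
        result.append(max(0, min(255, value + change)))
    return result


# Hypothetical values chosen only to exercise the clamping logic.
original = [0, 100, 200, 255]
proposed = [-5, 3, 40, 10]
perturbed = apply_bounded_perturbation(original, proposed, epsilon=8)
print(perturbed)  # [0, 103, 208, 255]
```

Because every pixel stays within `epsilon` of its original value, the perturbed image can look nearly identical to a person while still differing in ways a model may respond to.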

Testing and Results

The researchers tested their findings on several popular VLMs, including GPT-4o and Claude.

Claude showed the largest overall gain in attack success after optimization on heavily blurred images, jumping from 0% to 28%
GPT-4o’s safety filter caught most of the newly readable content, limiting the overall attack gains
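
The distinction between readability and refusal matters when tallying results: an image the model can now read but still refuses does not count as a successful attack. The sketch below shows one way such a tally could be computed. The outcome labels and the toy trial lists are invented for illustration and are not data from the research.

```python
from collections import Counter


def attack_success_rate(outcomes):
    """Return the fraction of trials labeled 'complied'."""
    if not outcomes:
        return 0.0
    return Counter(outcomes)["complied"] / len(outcomes)


# Toy labels, invented for illustration; not results from the study.
# 'unreadable': the model could not read the embedded text
# 'refused':    the model read the text but declined to follow it
# 'complied':   the model followed the embedded instruction
before = ["unreadable", "unreadable", "refused", "unreadable"]
after = ["refused", "complied", "refused", "unreadable"]

gain = attack_success_rate(after) - attack_success_rate(before)
print(f"{attack_success_rate(before):.0%} -> {attack_success_rate(after):.0%} (gain {gain:.0%})")
```

In this toy run, two trials move from "unreadable" to "refused", which is readability recovery without any gain in attack success, matching the pattern described for a model whose safety filter catches newly readable content.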

According to Cisco’s AI Threat Intelligence and Security Research team, “This vulnerability highlights the importance of developing more robust defenses against AI-powered attacks.”