Vulnerability Exposes Flaw in 11 AI Systems via Single Line of Code

Post Views: 281

Trend Micro Discovers Novel Jailbreak Technique Called Sockpuppeting

Researchers from Trend Micro have discovered a novel jailbreak technique called “sockpuppeting” that enables attackers to bypass safety measures in 11 prominent large language models (LLMs).

Attack Method

The attack involves exploiting a legitimate API feature called assistant prefill, which is used to force specific response formats. Attackers can inject a fake acceptance message into the assistant’s role, causing the model to respond with sensitive information.

Success Rate Varies Across Models

Gemini 2.5 Flash had a 15.7% success rate, making it the most susceptible to this attack.
GPT-40-mini had a 0.5% success rate, indicating higher resistance to the attack.

Importance of Message Validation

Organizations that use self-hosted inference servers, such as Ollama or vLLM, must manually enforce message validation to prevent this type of attack. Security teams should implement message-ordering validation at the API layer and proactively include assistant prefill attack variants in their standard AI red-teaming exercises.

According to Trend Micro, “This research highlights the importance of implementing robust security measures to protect against AI-related attacks, particularly those targeting LLMs.”

Difference in Handling Between API Providers

OpenAI and AWS Bedrock block assistant prefills entirely.
Google Vertex AI accepts assistant prefills for certain models, highlighting the need for security teams to carefully evaluate the risks associated with each provider.

Conclusion

Trend Micro’s research emphasizes the importance of implementing security measures to protect against this type of attack. By doing so, organizations can reduce the risk of sensitive information being compromised and maintain the integrity of their systems.