How to Make OpenAI Models Misbehave and Earn Rewards

How-to-Make-OpenAI-Models-Misbehave-and-Earn-Rewards

OpenAI’s Public Safety Bug Bounty Program Targets Misuse Risks Across Its Products

In a proactive effort to mitigate potential safety risks associated with its cutting-edge AI technologies, OpenAI has launched a dedicated public bug bounty program focusing on abuse and safety concerns.

This initiative aims to foster safe and secure systems by identifying and addressing vulnerabilities that could lead to harm if exploited.

The Safety Bug Bounty program supplements OpenAI’s existing Security Bug Bounty initiative, which targets traditional security vulnerabilities.

Program Scope

  • Agentic risks: These refer to situations where attackers manipulate agents, such as browser-based agents or ChatGPT agents, to perform unauthorized actions or expose sensitive user information.
  • Exposure of OpenAI proprietary information: This involves cases where model outputs inadvertently reveal internal reasoning or other confidential information.
  • Account and platform integrity risks: These pertain to weaknesses in systems that enforce rules and protect accounts, including bypassing anti-automation measures, manipulating trust signals, or evading restrictions like suspensions or bans.
According to OpenAI, “Researchers participating in the program may receive rewards for identifying issues that pose a clear risk to users and providing actionable steps to mitigate those risks.”

However, reports that merely demonstrate general content policy bypasses without safety or abuse implications are outside the program’s scope, as are issues that are easily discoverable or already well-known.


Blog Image

About Author

en_USEnglish