GitHub Codespaces Copilot Token Leak: RoguePilot Flaw Exploited

Post Views: 1

RoguePilot Vulnerability in GitHub Codespaces

A vulnerability in GitHub Codespaces, dubbed RoguePilot, has been discovered that allows attackers to seize control of repositories by injecting malicious instructions into GitHub issues.

How the Attack Works

The flaw, identified by Orca Security, enables bad actors to manipulate GitHub Copilot, an artificial intelligence-driven tool, to leak sensitive data, including the privileged GITHUB_TOKEN.

The attack begins when a malicious GitHub issue is created, which triggers the prompt injection in Copilot when an unsuspecting user launches a Codespace from that issue.

The malicious instructions are hidden within the issue’s description, using HTML comment tags to evade detection. The AI assistant then executes the instructions, allowing the attacker to steal sensitive data.

Exploiting GitHub Codespaces

The vulnerability exploits the fact that GitHub Codespaces can be launched from various entry points, including templates, repositories, commits, pull requests, or issues.

When a codespace is opened from an issue, the built-in GitHub Copilot is automatically fed the issue’s description as a prompt to generate a response.

This integration can be weaponized to manipulate Copilot into running malicious commands.

Stealthy Attack

Researchers have demonstrated that the attack can be made stealthy by hiding the prompt in the GitHub issue.

The specially crafted prompt instructs the AI assistant to leak the GITHUB_TOKEN to an external server under the attacker’s control.

By manipulating Copilot in a Codespace to check out a crafted pull request, an attacker can cause Copilot to read a file and exfiltrate the GITHUB_TOKEN to a remote server.

Patch and Related Discoveries

The RoguePilot vulnerability has been patched by Microsoft following responsible disclosure.

However, the discovery highlights the risks associated with AI-driven tools and the potential for malicious actors to exploit these systems.

Researchers have discovered that Group Relative Policy Optimization (GRPO), a reinforcement learning technique used to fine-tune large language models, can also be used to remove safety features.
This process, codenamed GRP-Obliteration, allows attackers to manipulate language models to produce malicious outputs.
Researchers have identified various side channels that can be exploited to infer the topic of a user’s conversation and fingerprint user queries with high accuracy.
A new phenomenon, dubbed Agentic ShadowLogic, has been discovered, which allows attackers to intercept requests to fetch content from a URL in real-time, routing them through infrastructure under their control.

Conclusion

The discovery of these vulnerabilities and techniques highlights the need for increased security measures to protect AI-driven systems and prevent malicious actors from exploiting these technologies.