GitHub Copilot CLI Gets Second-Opinion Feature
GitHub has introduced a new approach to detecting errors in AI-generated code: a "second-opinion" feature in the GitHub Copilot CLI built on cross-model review.
The technique has a second AI model review the first model's work, surfacing flaws that either model on its own might miss.
The feature, dubbed Rubber Duck, employs a model from a different AI family than the one driving the primary Copilot session: when a developer selects a Claude model as the orchestrator in the model picker, Rubber Duck runs on GPT-5.4.
“The idea behind Rubber Duck is to provide a second opinion on the code generated by our AI models,” said [Name], lead researcher on the project. “By using a different model family, we can catch errors that might not have been caught by the original model.”
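The review loop can be sketched as follows. This is purely illustrative: the function names, the `Review` type, and the heuristic check are assumptions, since Copilot CLI's internals are not public; the only source-backed detail is that the reviewer comes from a different model family than the orchestrator.

```python
# Minimal sketch of cross-model review. Both model calls are stubbed;
# nothing here reflects the actual Copilot CLI implementation.
from dataclasses import dataclass, field

@dataclass
class Review:
    approved: bool
    notes: list[str] = field(default_factory=list)

def orchestrator_draft(task: str) -> str:
    """Stand-in for the primary model (e.g. Claude) producing code."""
    return f"def solve():\n    # solution for: {task}\n    return 42\n"

def second_opinion(code: str) -> Review:
    """Stand-in for the reviewer from a different model family (e.g. GPT-5.4).
    Using a different family means the reviewer is less likely to share
    the orchestrator's blind spots."""
    notes = []
    if "return" not in code:
        notes.append("function never returns a value")
    return Review(approved=not notes, notes=notes)

draft = orchestrator_draft("sum two numbers")
review = second_opinion(draft)
print("approved" if review.approved else f"flagged: {review.notes}")
```

The key design point mirrored here is the asymmetry: the reviewer only critiques, it does not rewrite, so the orchestrator stays in control of the final code.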
- Improved performance: Evaluations on SWE-Bench Pro show that incorporating Rubber Duck significantly improves results on challenging multi-file and long-running tasks.
- Error detection: Rubber Duck has caught errors in a range of scenarios, including a proposed scheduler that would have exited immediately, code that silently overwrote dictionary keys, and a change that silently broke a confirmation UI and its cleanup paths.
- Triggering Rubber Duck: Developers can trigger Rubber Duck manually or let it run automatically at three checkpoints: after drafting a plan, after a complex implementation, and after writing tests but before running them.
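The triggering behavior above can be sketched as a simple dispatcher. The three checkpoint names come from the article; the dispatch function and event strings are hypothetical, not Copilot CLI's actual API.

```python
# Hypothetical dispatcher for when a second-opinion review fires.
# The checkpoint list follows the article; everything else is illustrative.
AUTO_CHECKPOINTS = {
    "plan_drafted",            # after drafting a plan
    "complex_implementation",  # after a complex implementation
    "tests_written",           # after writing tests, before running them
}

def should_invoke_rubber_duck(event: str, manual: bool = False) -> bool:
    """Fire on an explicit manual request, or at an automatic checkpoint."""
    return manual or event in AUTO_CHECKPOINTS

print(should_invoke_rubber_duck("tests_written"))            # True
print(should_invoke_rubber_duck("file_saved"))               # False
print(should_invoke_rubber_duck("file_saved", manual=True))  # True
```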
Rubber Duck can be requested at any point in a session, and the system will display what changes were made and why.
The second-opinion feature is currently available in experimental mode in GitHub Copilot CLI and requires both a Claude model selected in the model picker and access to GPT-5.4.
The researchers are exploring additional model family pairings for future configurations, aiming to improve the accuracy and effectiveness of the cross-model review process.
