GitHub Copilot's shared AI harness matches rival model-vendor tools on task completion while using fewer tokens, letting developers mix and match models without sacrificing performance.

GitHub Copilot Blog · 4h ago · 5 min read

Key takeaway

GitHub released benchmark data showing its AI agent harness, which powers GitHub Copilot tools, matches the performance of competing model-vendor harnesses on software engineering tasks while using fewer tokens. The harness supports multiple AI models from different providers—GPT, Claude, Gemini, and others—giving developers flexibility to choose the best model for each job without sacrificing quality or efficiency.

3 Key Points

What happened
GitHub published benchmark results showing its Copilot agentic harness—the shared engine powering Copilot CLI, the Copilot app, and code review—achieves task-completion rates on par with Claude Code and Codex CLI across SWE-bench Verified, SWE-bench Pro, SkillsBench, and TerminalBench, while consuming fewer tokens in most configurations.
Why it matters
The harness supports 20+ frontier models across GPT, Claude, Gemini, and MAI families, plus open-source and local models, letting developers pick the right model for each task's cost and capability needs without being locked into a single vendor's tool. Lower token use translates to reduced API costs for equivalent work.
What to watch
GitHub's multi-model architecture enables cross-model critique (e.g., Rubber Duck, where one model reviews another's output), a capability that single-vendor harnesses cannot offer. In the benchmarks, GPT models deliver the best value, pairing strong resolution rates with the lowest cost, while Claude Opus reaches the highest resolution rate at a higher cost.

FAQ

Which models does GitHub Copilot's harness support?
The harness supports 20+ frontier models across the GPT, Claude, Gemini, and MAI families, plus bring-your-own-key support for open-source and local models.
How does GitHub Copilot's performance compare to Claude Code and Codex CLI?
Task resolution rates are on par with model-vendor harnesses when the same model and benchmark task are used, and token consumption is lower in most configurations. On TerminalBench, GitHub Copilot's data points fall within variance ellipses that overlap those of same-model competitors on both task completion and cost per task.
What unique capability does GitHub Copilot's multi-model architecture enable?
The multi-model design unlocks harness-level capabilities that a model-vendor harness cannot offer. For example, Rubber Duck uses cross-model-family critique, in which one model reviews another's work to improve outcomes beyond what any single model produces alone.
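
To make the cross-model critique idea concrete, here is a minimal sketch of the general pattern: one model drafts, a second model reviews, and the first revises until the reviewer approves. It is not GitHub's Rubber Duck implementation, whose internals the article does not describe. The `Model` type, the `cross_model_critique` function, the `LGTM` approval sentinel, and the two stub models are all hypothetical names introduced for illustration; in practice the two callables would wrap models from different families.

```python
from typing import Callable

# Hypothetical stand-in: a "model" is any function from prompt text to reply text.
Model = Callable[[str], str]

APPROVAL = "LGTM"  # assumed sentinel the reviewer returns when it has no objections


def cross_model_critique(task: str, author: Model, reviewer: Model,
                         max_rounds: int = 3) -> str:
    """Draft with one model, then revise until a second model approves."""
    draft = author(f"Task: {task}")
    for _ in range(max_rounds):
        feedback = reviewer(
            f"Task: {task}\nDraft:\n{draft}\n"
            f"Reply {APPROVAL} if correct, else list problems."
        )
        if feedback.strip() == APPROVAL:
            break  # reviewer is satisfied; stop revising
        draft = author(
            f"Task: {task}\nPrevious draft:\n{draft}\n"
            f"Reviewer feedback:\n{feedback}\nRevise."
        )
    return draft


# Toy stubs so the sketch runs without any API access.
def stub_author(prompt: str) -> str:
    # Returns a buggy draft first, and a fixed one once feedback is present.
    if "Reviewer feedback" in prompt:
        return "def add(a, b): return a + b"
    return "def add(a, b): return a - b"


def stub_reviewer(prompt: str) -> str:
    # Approves only when the draft actually adds its arguments.
    draft = prompt.split("Draft:\n", 1)[1]
    return APPROVAL if "a + b" in draft else "add() subtracts instead of adding"


result = cross_model_critique("write add(a, b)", stub_author, stub_reviewer)
print(result)
```

The `max_rounds` cap is a design choice in this sketch: it bounds token spend when the reviewer never approves, which matters given the article's emphasis on cost per task.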