Microsoft Copilot and Google Gemini's fast models apply stereotypes instead of analyzing data, while thinking models catch the error

THE DECODER · May 24, 2026

3 Key Points

Mathematician Adam Kucharski tested Microsoft Copilot in "Auto" mode with identical datasets that differed only in their country label (UK, US, France, Germany, Italy). The tool reported country-specific differences that did not exist in the data: for example, it claimed Italians were three times more likely to show interest in arts careers than Brits, and Americans were 1.5 times more business-oriented than the French.
Fast models (Copilot Auto, Gemini Flash 3.5) relied on stereotypes baked into the underlying language models rather than actually analyzing the data. By contrast, ChatGPT Instant and Claude Opus 4.7 automatically switched to extended reasoning mode, wrote Python code to analyze the dataset, and correctly identified that the data was identical.
The default "Auto" mode in Copilot is the main issue, since most users stick with the default setting. The real-world risk is that, when such analyses are applied to actual datasets, groups with no real differences could appear worlds apart because of the model's built-in assumptions about demographic groups.
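
The kind of check the reasoning models are described as running can be sketched in a few lines of Python. This is only an illustration, not the code those models actually wrote or the data Kucharski used: the function names, the `country` column name, and the three-row toy dataset are all assumptions made up for this example. The idea is to drop the country label and compare what remains.

```python
import csv
import io


def rows_without_label(csv_text, label_column="country"):
    """Parse CSV text and return its rows with the label column dropped.

    Rows are sorted so that row order does not affect the comparison.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for row in reader:
        rows.append(tuple(sorted((k, v) for k, v in row.items() if k != label_column)))
    return sorted(rows)


def datasets_identical(csv_texts, label_column="country"):
    """Return True if every dataset matches the first once the label is ignored."""
    stripped = [rows_without_label(text, label_column) for text in csv_texts]
    return all(s == stripped[0] for s in stripped[1:])


# Hypothetical toy data: the same three rows, relabeled per country.
TEMPLATE = (
    "country,respondent,career_interest\n"
    "{c},1,arts\n"
    "{c},2,business\n"
    "{c},3,arts\n"
)
samples = [TEMPLATE.format(c=c) for c in ["UK", "US", "France", "Germany", "Italy"]]

print(datasets_identical(samples))  # True
```

If this check returns True, any reported difference between the groups cannot come from the data and must come from the label alone.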
