AIToday

Microsoft Copilot and Google Gemini's fast models apply stereotypes instead of analyzing data, while thinking models catch the error

THE DECODER · May 24, 2026 · 2 min read

3 Key Points

  1. Mathematician Adam Kucharski tested Microsoft Copilot in "Auto" mode with identical datasets, each labeled with a different country (UK, US, France, Germany, Italy). The tool reported country-specific differences that did not exist in the data. For example, it claimed that Italians were three times more likely than Brits to show interest in arts careers, and that Americans were 1.5 times more business-oriented than the French.

  2. Fast models (Copilot Auto, Gemini Flash 3.5) relied on stereotypes baked into their language models rather than actually analyzing the data. ChatGPT Instant and Claude Opus 4.7 automatically switched to extended reasoning mode, wrote Python code to analyze the dataset, and correctly identified that the data was identical.

  3. The main issue is Copilot's default "Auto" mode, since most users never change the default setting. The real-world risk: when such analyses are run on actual datasets, groups with no real differences could appear worlds apart because of the model's built-in assumptions about demographic groups.
