
Mathematician Adam Kucharski tested Microsoft Copilot in "Auto" mode by giving it identical datasets, each labeled with a different country (UK, US, France, Germany, Italy). The tool reported country-specific differences that did not exist in the data. For example, it claimed Italians were three times more likely than Brits to show interest in arts careers, and that Americans were 1.5 times more business-oriented than the French.
Fast models (Copilot Auto, Gemini Flash 3.5) relied on stereotypes absorbed into their language models during training rather than actually analyzing the data. ChatGPT Instant and Claude Opus 4.7 automatically switched to an extended reasoning mode, wrote Python code to analyze the dataset, and correctly identified that the data was identical across countries.
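To illustrate the kind of check the reasoning models reportedly ran, here is a minimal Python sketch. The survey records, category names, and helper functions (`interest_share`, `all_identical`, `ratio`) are hypothetical and invented for illustration; Kucharski's actual dataset and the code the models generated are not shown in this summary.

```python
from collections import Counter

# Hypothetical toy survey responses (made up for illustration, not Kucharski's data).
# Each entry is one respondent's stated career interest.
BASE_RESPONSES = ["arts", "business", "science", "arts", "business",
                  "science", "business", "arts", "science", "business"]

# The same responses copied under different country labels, mirroring the
# setup described above: the labels differ, the data does not.
datasets = {country: list(BASE_RESPONSES)
            for country in ["UK", "US", "France", "Germany", "Italy"]}


def interest_share(responses, category):
    """Fraction of respondents whose stated interest equals `category`."""
    return Counter(responses)[category] / len(responses)


def all_identical(groups):
    """True if every labeled dataset holds the same records, ignoring order."""
    counts = [Counter(records) for records in groups.values()]
    return all(c == counts[0] for c in counts[1:])


def ratio(groups, a, b, category):
    """How many times more often group `a` picks `category` than group `b`."""
    return interest_share(groups[a], category) / interest_share(groups[b], category)


if __name__ == "__main__":
    print("Datasets identical:", all_identical(datasets))
    print("Italy vs UK, arts:", ratio(datasets, "Italy", "UK", "arts"))
    print("US vs France, business:", ratio(datasets, "US", "France", "business"))
```

Because every country label points to the same records, each cross-country ratio comes out as 1.0. A claim such as "three times more likely" cannot come from data like this; it can only come from the model's own assumptions.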
The main concern is Copilot's default "Auto" mode, since most users never change the default setting. The real-world risk is that when such tools analyze actual datasets, groups with no genuine differences could be reported as worlds apart because of the model's built-in assumptions about demographic groups.