MATH-PT benchmark dataset introduces 1,729 mathematical problems in European and Brazilian Portuguese to address linguistic bias in LLM reasoning evaluations
arXiv cs.CL · April 30, 2026
AI Summary
• Researchers released MATH-PT, a dataset of 1,729 mathematical problems written in European and Brazilian Portuguese, sourced from mathematical olympiads, competitions, and exams in Portugal and Brazil.
• Frontier reasoning models (advanced LLMs that perform complex logical inference) performed strongly on multiple-choice questions relative to open-weight models, but their performance dropped on questions with figures and on open-ended questions.
• The benchmark addresses a significant gap: most mathematical reasoning evaluations are either written exclusively in English or translated from English, which introduces linguistic bias into how LLMs are assessed on math tasks.