MATH-PT benchmark dataset introduces 1,729 mathematical problems in European and Brazilian Portuguese to address linguistic bias in LLM reasoning evaluations

arXiv cs.CL · April 30, 2026

AI summary

  • Researchers released MATH-PT, a dataset of 1,729 mathematical problems written in European and Brazilian Portuguese, sourced from mathematical Olympiads, competitions, and exams from Portugal and Brazil.
  • Frontier reasoning models (advanced LLMs that perform complex logical inference) showed strong performance on multiple-choice questions compared to open-weight models, but their performance dropped on questions with figures and on open-ended questions.
  • The benchmark addresses a significant gap: most mathematical reasoning evaluations are exclusively in English or translated from English, introducing linguistic bias into how LLMs are assessed on math tasks.
