
Large Language Models · AI Safety & Alignment

MATH-PT benchmark dataset introduces 1,729 mathematical problems in European and Brazilian Portuguese to address linguistic bias in LLM reasoning evaluations

arXiv cs.CL · April 30, 2026

AI Summary

• Researchers released MATH-PT, a dataset of 1,729 mathematical problems written in European and Brazilian Portuguese, sourced from mathematical Olympiads, competitions, and exams in Portugal and Brazil.
• Frontier reasoning models (advanced LLMs that perform complex logical inference) performed strongly on multiple-choice questions relative to open-weight models, but their performance dropped on questions with figures and on open-ended questions.
• The benchmark addresses a significant gap: most mathematical reasoning evaluations are written exclusively in English or translated from English, which introduces linguistic bias into how LLMs are assessed on math tasks.


