記事一覧に戻る

TraceElephant benchmark introduces full execution traces for diagnosing failures in LLM-based multi-agent systems

arXiv cs.MA (Multi-Agent) · 2026年4月27日

AI要約

  • Researchers introduced TraceElephant, a benchmark designed for failure attribution (identifying which agent and step caused a failure) in LLM-based multi-agent systems (AI systems where multiple language models work together and reason in natural language).
  • Full execution traces improve attribution accuracy by up to 76% over partial-observation counterparts; existing benchmarks omit inputs and context that developers use when debugging, whereas TraceElephant captures complete traces and reproducible environments.
  • The benchmark aims to advance failure attribution research and promote evaluation practices aligned with real-world debugging scenarios, supporting development of more transparent multi-agent systems.

関連記事

AIニュースを毎日お届け

200以上のソースから厳選したAIニュースを毎日無料でお届けします。

無料で始める