New framework tackles harder automated theorem proving by requiring AI systems to discover answers independently before formal proof construction

arXiv cs.AI · April 20, 2026

AI Summary

  • Researchers introduce 'Hard Mode' ATP benchmarks (MiniF2F-Hard and FIMO-Hard) that remove answer hints from formal statements, creating more realistic and challenging evaluation conditions
  • The DAP (Discover And Prove) framework uses LLM natural-language reasoning with self-reflection to discover the answer independently, then rewrites the problem back into a form that existing provers can solve
  • DAP achieves state-of-the-art results: solves 10 problems on CombiBench (vs. 7 previously) and is the first system to solve problems on PutnamBench
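
To make the distinction concrete, here is a minimal Lean 4 sketch (assuming Mathlib) of what removing an answer hint from a formal statement can look like. The toy problem and the names `easy_mode`, `hard_mode_answer`, and `hard_mode` are hypothetical and are not taken from MiniF2F-Hard, FIMO-Hard, or the paper. The placeholder style is only one plausible encoding, and the `sorry` lines are deliberate gaps standing in for what a system would have to supply.

```lean
import Mathlib

-- Answer-included statement: the value 2 appears in the goal,
-- so a prover only has to verify it.
theorem easy_mode (x : ℝ) (h : 2 * x + 3 = 7) : x = 2 := by
  linarith

-- Answer-free statement (hypothetical encoding): the answer is a
-- placeholder that must be discovered before any proof can start.
noncomputable abbrev hard_mode_answer : ℝ := sorry

theorem hard_mode (x : ℝ) (h : 2 * x + 3 = 7) : x = hard_mode_answer := by
  sorry
```

Under this reading, a discover-then-prove pipeline would first work out that the answer is 2, substitute it for the placeholder, and hand the resulting answer-included statement to an existing prover.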
