AIToday

Mistral's Leanstral 1.5 hits 100% on formal math benchmark, spots real code bugs

THE DECODER · 23h ago · 4 min read

Key takeaway

Mistral AI released Leanstral 1.5, an open-source model trained to formally verify mathematical proofs and software correctness in the Lean 4 language. It scores 100 percent on the miniF2F math benchmark and leads open-source models on PutnamBench and the FATE-H and FATE-X algebra benchmarks. In a scan of 57 open-source repositories, it found five previously unknown bugs.

3 Key Points

  • What happened

    Mistral AI released Leanstral 1.5, a free open-source model licensed under Apache 2.0 and designed for formal verification in Lean 4 (a programming language for verifying math proofs and software correctness). The model scores 100 percent on miniF2F, solves 587 of 672 problems on PutnamBench, and reaches 87 percent on FATE-H and 34 percent on FATE-X, two algebra benchmarks. In practical testing, it scanned 57 open-source repositories and caught five previously unknown bugs, including an overflow bug in the Rust library varinteger.

  • Why it matters

    The model ranks at the top of the open-source field on PutnamBench, FATE-H, and FATE-X; on PutnamBench, only the closed-source Aleph Prover surpasses it. For software teams and mathematicians, this suggests that open-source formal verification is reaching a level where it can detect real-world defects in production code, potentially catching costly bugs before deployment.

  • What to watch

    Leanstral 1.5 is available now through Hugging Face and via a free API, making it immediately accessible to developers and researchers.

FAQ

Is there a cost to use Leanstral 1.5?
No. The model is free and licensed under Apache 2.0, and is available through Hugging Face and via a free API.
What is Leanstral 1.5 designed to do?
It is built for formal verification in Lean 4, a programming language for checking mathematical proofs and software correctness. Although trained mainly for math, Mistral says it also performs well at code verification.
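
To give a rough sense of what formal verification in Lean 4 looks like, here is a minimal sketch. The function `double` and the theorem `double_eq_two_mul` are hypothetical names chosen for illustration; they do not come from Mistral, Leanstral 1.5, or any of the benchmarks mentioned above. The sketch assumes a recent Lean 4 toolchain in which the `omega` tactic is built in.

```lean
-- Hypothetical example; not taken from Leanstral or its benchmarks.
-- A small function whose behavior we want to guarantee.
def double (n : Nat) : Nat := n + n

-- The specification, stated as a theorem: `double n` always equals `2 * n`.
-- Lean accepts this only if the proof checks for every natural number `n`,
-- which is a stronger guarantee than testing a handful of inputs.
theorem double_eq_two_mul (n : Nat) : double n = 2 * n := by
  unfold double  -- replace `double n` with its definition, `n + n`
  omega          -- decision procedure for linear integer arithmetic
```

A verification model is asked to produce proofs of this kind automatically. When a stated property is false (for example, because of an arithmetic overflow), no proof exists, and the failed attempt can point to a real defect in the code.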
