NARE: A research prototype that routes LLM reasoning queries through a 4-layer cache and skill registry to reduce token costs and latency
Hacker News · April 28, 2026
AI Summary
• NARE pairs an LLM (Gemma-3-27B via Google Generative AI) with episodic memory and a skill registry, dispatching each query to one of four layers: an exact cache, a sandboxed Python skill (reflexive execution), delta-reasoning over a similar past episode, or a full Tree-of-Thoughts pass.
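The four-layer dispatch described above can be sketched roughly as a cheapest-first cascade. This is a hypothetical illustration, not NARE's actual code: the class name `Dispatcher`, the trigger-matching for skills, and the first-token episode similarity test are all placeholder assumptions (a real system would use embeddings for episode retrieval).

```python
class Dispatcher:
    """Hypothetical sketch of a 4-layer routing cascade (not NARE's real API)."""

    def __init__(self, exact_cache=None, skill_triggers=None, episodes=None):
        self.exact_cache = exact_cache or {}      # layer 1: verbatim query -> answer
        self.skill_triggers = skill_triggers or set()  # layer 2: compiled Python skills
        self.episodes = episodes or []            # layer 3: past reasoning episodes

    def route(self, query: str) -> str:
        if query in self.exact_cache:             # layer 1: exact cache hit, near-zero cost
            return "exact_cache"
        if any(t in query for t in self.skill_triggers):  # layer 2: reflexive skill execution
            return "skill"
        if self._similar_episode(query):          # layer 3: delta-reason from a prior episode
            return "delta_reasoning"
        return "tree_of_thoughts"                 # layer 4: full Tree-of-Thoughts pass

    def _similar_episode(self, query: str) -> bool:
        # Placeholder similarity: shared leading token; real retrieval would use embeddings.
        head = query.split()[0] if query.split() else ""
        return any(e.split() and e.split()[0] == head for e in self.episodes)
```

Routing order matters: each layer is strictly cheaper than the next, so a hit higher up skips all LLM calls below it.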
• The system compiles repeated reasoning patterns into executable Python skills during a sleep/REM consolidation loop; candidate skills are validated via AST (Abstract Syntax Tree) parsing, and promotion is gated through confidence scoring and shadow verification.
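The AST validation gate can be illustrated with Python's standard `ast` module. A minimal sketch, assuming a rejection policy: the function name `validate_skill` and the specific banned-node list are illustrative, not taken from NARE.

```python
import ast

def validate_skill(source: str) -> bool:
    """Reject a candidate skill if it fails to parse or uses disallowed constructs.

    Hypothetical policy: sandboxed skills may not contain import statements.
    """
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return False  # malformed LLM output never reaches the sandbox
    banned = (ast.Import, ast.ImportFrom)
    return not any(isinstance(node, banned) for node in ast.walk(tree))
```

Parsing alone only proves the code is syntactically valid Python, which is why a gate like this would be paired with the confidence scoring and shadow verification the summary mentions before a skill is promoted.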
• This is a research/engineering prototype without benchmarked results on standard reasoning tasks (HumanEval+, MATH, GSM8K, BIG-Bench Hard, AlfWorld, WebArena); the conceptual framings (Free-Energy, active-inference, Bayesian model reduction) are inspirations, not formal claims about the code's computation.