AIToday

A developer has demonstrated the first multi-chip pipelined language model inference on ESP32-class microcontrollers, splitting a 15-million-parameter model across two boards to overcome single-chip memory limits.

Hacker News · 1d ago · 3 min read

3 Key Points

  1. What happened: A Llama-architecture language model (an AI that understands and generates text) runs with its layers split across two ESP32-S3 microcontroller boards connected by UART (a serial communication link), producing ~1.4 tokens (the individual words or pieces of text generated) per second. The project is presented as the first published multi-chip pipelined inference on ESP32-class hardware, and it outlines a path to running a 42-million-parameter model at a projected ~0.4–0.7 tokens per second with the same approach.

  2. Why it matters: A single ESP32-S3 with 16 MB of flash can fit only a ~15-million-parameter model; the next size up (~24 MB) does not fit on one board. By distributing layers across two chips and streaming weights directly from flash memory (so the weights occupy 0 bytes of RAM), this approach lets developers run larger models on cheap, low-power hardware without buying bigger boards. The output remains bit-exact with the monolithic (single-chip) model, as verified against NumPy reference tests.

  3. What to watch: The roadmap targets measured on-hardware performance for the 42-million-parameter model, a 2–3× speedup from SIMD optimization of the matrix multiplication step, and an on-device touch UI to remove the dependency on a PC. Code and setup instructions are publicly available under the MIT License; the approach uses Karpathy's llama2.c architecture and Microsoft Research's TinyStories training dataset.
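
The layer split and the bit-exactness check described in the key points can be illustrated with a small host-side sketch. This is not the project's firmware: the toy layers, the function names (`run_layers`, `send_over_link`, `receive_from_link`, `pipelined_forward`), and the use of 64-bit floats on the wire are all assumptions made for illustration. It shows only why a lossless activation handoff between two stages leaves the output identical to the single-stage result; it does not model UART timing or the overlap between boards that gives pipelining its throughput.

```python
import math
import random
import struct

def make_layers(n_layers, dim, seed=0):
    """Build toy layers: each is a dim x dim weight matrix (a list of rows)."""
    rng = random.Random(seed)
    return [[[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(dim)]
            for _ in range(n_layers)]

def run_layers(layers, x):
    """Apply each layer in order: matrix-vector product followed by tanh."""
    for w in layers:
        x = [math.tanh(sum(wij * xj for wij, xj in zip(row, x))) for row in w]
    return x

def send_over_link(x):
    """Hypothetical stand-in for the UART hop: pack activations losslessly."""
    return struct.pack(f"<{len(x)}d", *x)

def receive_from_link(payload):
    """Unpack the activation vector on the second (hypothetical) board."""
    return list(struct.unpack(f"<{len(payload) // 8}d", payload))

def pipelined_forward(layers, x, split):
    """Board A runs layers[:split], ships activations, board B runs the rest."""
    hidden = run_layers(layers[:split], x)
    payload = send_over_link(hidden)
    return run_layers(layers[split:], receive_from_link(payload))

layers = make_layers(n_layers=6, dim=8)
x0 = [0.1 * i for i in range(8)]

reference = run_layers(layers, x0)                  # monolithic, one "chip"
split_out = pipelined_forward(layers, x0, split=3)  # three layers per "chip"

# Exact equality, not approximate: the split must not change a single bit.
print(split_out == reference)
```

Because both paths perform the same arithmetic in the same order and the handoff format loses no precision, the comparison uses exact equality rather than a tolerance. A real two-board setup would need the same property from whatever activation format it sends over the serial link.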
