ext-infer: PHP 8.3+ extension for native LLM inference and embeddings via llama.cpp

Hacker News · Jun 7, 2026

3 Key Points

ext-infer is a PHP 8.3+ extension written in Rust that loads a GGUF model and runs LLM inference inside the PHP process via llama.cpp, enabling semantic search, RAG pipelines, and worker inference without calling Python or remote APIs.
The extension provides a fluent Prompt builder, a Response that separates reasoning from answer, and an Embedding class that handles normalization and cosine similarity. It is designed to feel as native to PHP as the intl or pdo extensions.
In-process inference cuts per-call overhead: latency is bounded only by decode time, whereas a subprocess or HTTP call adds milliseconds to tens of milliseconds on top. It also eliminates the need for a separate Python sidecar, daemon, or inference server to manage alongside PHP-FPM.
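
To make the embedding point concrete, here is a minimal sketch of the normalization and cosine-similarity math behind semantic search. It is written in plain Python for illustration only and is not ext-infer's API: the function names `normalize`, `cosine_similarity`, and `rank` are hypothetical, and the vectors are toy values rather than real model embeddings.

```python
import math


def normalize(vec):
    """Scale a vector to unit L2 length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine of the angle between a and b, computed as a dot product of unit vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    unit_a, unit_b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(unit_a, unit_b))


def rank(query, docs):
    """Return (doc_id, score) pairs sorted from most to least similar to the query."""
    scores = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy 2-dimensional "embeddings" (hypothetical values, not model output).
docs = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
ranking = rank([1.0, 0.1], docs)
print([doc_id for doc_id, _ in ranking])
```

Once vectors are normalized, cosine similarity reduces to a dot product, which is why an embedding class can reasonably bundle both operations.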
