Sakana AI introduces KAME, a tandem architecture that pairs a fast speech-to-speech model with a backend LLM running asynchronously to enable responsive yet knowledgeable conversational AI.
Hacker News · April 29, 2026
AI Summary
• Sakana AI released KAME (Tandem Architecture for Enhancing Knowledge in Real-Time Speech-to-Speech Conversational AI), with inference code, fine-tuning code, and model weights available on GitHub and Hugging Face. The paper was accepted at ICASSP 2026.
• KAME connects a speech-to-speech (S2S) front-end model with a backend LLM that runs in parallel. The S2S model responds immediately, while the backend LLM asynchronously injects reasoning signals as the user's utterance unfolds, shifting from 'think then speak' to 'speak while thinking.'
• In example comparisons with Moshi (a full-duplex S2S model), KAME gave more coherent and factually grounded responses on reasoning and knowledge tasks. The backend LLM can be swapped: claude-opus-4-1, gpt-4.1, and gemini-2.5-flash are cited as examples, with claude-opus-4-1 tending to score higher on reasoning tasks and gpt-4.1 on humanities tasks.
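The 'speak while thinking' pattern described above can be illustrated with a toy asyncio simulation. This is only a sketch under stated assumptions, not KAME's actual code: `backend_llm`, `run_tandem`, the delay values, and the "hint" strings are all hypothetical stand-ins, and no real speech model or LLM API is called. The front end emits something for every incoming chunk using whatever backend hint has already arrived, while backend calls on the growing transcript run concurrently.

```python
import asyncio


async def backend_llm(transcript: str, delay: float) -> str:
    """Hypothetical stand-in for a slow backend LLM call (no real API is used)."""
    await asyncio.sleep(delay)  # simulated backend latency
    return f"hint({transcript})"


async def run_tandem(chunks, backend_delay=0.05, frame_time=0.02):
    """Toy tandem loop: the front end never waits for the backend.

    Returns the per-chunk front-end outputs and the last backend hint.
    """
    state = {"hint": None, "version": 0}  # latest hint shared with the front end
    tasks = []
    outputs = []
    transcript = ""

    async def refresh(version: int, text: str) -> None:
        hint = await backend_llm(text, backend_delay)
        # Only accept hints computed from a newer transcript than the current one.
        if version > state["version"]:
            state["hint"], state["version"] = hint, version

    for version, chunk in enumerate(chunks, start=1):
        transcript = f"{transcript} {chunk}".strip()
        # Start the backend on the transcript so far, without awaiting it.
        tasks.append(asyncio.create_task(refresh(version, transcript)))
        # The front end responds right away with whatever hint is available.
        outputs.append((transcript, state["hint"]))
        await asyncio.sleep(frame_time)  # simulated time until the next chunk

    await asyncio.gather(*tasks)  # let outstanding backend calls finish
    return outputs, state["hint"]


if __name__ == "__main__":
    words = ["what", "is", "the", "capital", "of", "France"]
    outs, final_hint = asyncio.run(run_tandem(words))
    for text, hint in outs:
        print(f"front end heard {text!r}, using hint: {hint}")
    print("final hint:", final_hint)
```

In this sketch the first front-end output always carries no hint, because the backend has not yet returned; later outputs pick up hints as they arrive, which is the trade-off the tandem design makes between latency and knowledge.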