New AI framework Chain of Modality aims to fix a multimodal model paradox in which single-input systems outperform multi-sensory ones
arXiv cs.CV · April 17, 2026
AI Summary
• Omni-MLLMs (omni-modal multimodal large language models), which integrate multiple sensory inputs, underperform their unimodal baselines, revealing a critical flaw in current multimodal AI systems
• Identified cause: static fusion topologies introduce positional bias when modalities are fed in sequence and alignment traps when they are interleaved, and both distort how attention is distributed across inputs
• The Chain of Modality (CoM) framework is proposed as a solution: it dynamically switches between parallel, sequential, and interleaved input pathways to eliminate these structural biases
• CoM uses task-aligned cognitive execution with dual pathways, aiming for more flexible, context-aware multimodal processing
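The summary gives no implementation details. As a rough illustration of what the three input arrangements mean, the Python sketch below orders placeholder tokens from two modalities in each way. The token labels, the function names, and the explicit `arrange` dispatcher are all hypothetical; this is not CoM's method, and how CoM actually chooses a pathway is not covered by the summary.

```python
from typing import Callable, Dict, List, Union

# Hypothetical input: one list of placeholder tokens per modality. Real
# Omni-MLLMs work on embeddings; strings are used here only so the resulting
# order is easy to read.
Streams = Dict[str, List[str]]
Arranged = Union[List[str], Dict[str, List[str]]]


def parallel(streams: Streams) -> Dict[str, List[str]]:
    """Keep each modality in its own stream (copied), to be encoded side by side."""
    return {name: list(tokens) for name, tokens in streams.items()}


def sequential(streams: Streams) -> List[str]:
    """Concatenate whole modalities one after another, in dict order.

    Every token of a later modality gets a larger position index than every
    token of an earlier one, which is the kind of ordering that can introduce
    positional bias.
    """
    out: List[str] = []
    for tokens in streams.values():
        out.extend(tokens)
    return out


def interleaved(streams: Streams) -> List[str]:
    """Alternate tokens across modalities round-robin; shorter streams run out first."""
    out: List[str] = []
    longest = max((len(tokens) for tokens in streams.values()), default=0)
    for i in range(longest):
        for tokens in streams.values():
            if i < len(tokens):
                out.append(tokens[i])
    return out


# Hypothetical dispatcher. The summary does not say how CoM picks a pathway,
# so here the caller names the topology explicitly.
TOPOLOGIES: Dict[str, Callable[[Streams], Arranged]] = {
    "parallel": parallel,
    "sequential": sequential,
    "interleaved": interleaved,
}


def arrange(streams: Streams, topology: str) -> Arranged:
    """Apply the named topology, rejecting unknown names."""
    if topology not in TOPOLOGIES:
        raise ValueError(f"unknown topology: {topology!r}")
    return TOPOLOGIES[topology](streams)


if __name__ == "__main__":
    streams = {"audio": ["a0", "a1"], "video": ["v0", "v1", "v2"]}
    for name in TOPOLOGIES:
        print(name, arrange(streams, name))
```

With `{"audio": ["a0", "a1"], "video": ["v0", "v1", "v2"]}`, the sequential arrangement yields `a0, a1, v0, v1, v2`, so every video token sits at a later position than every audio token, while the interleaved arrangement alternates them as `a0, v0, a1, v1, v2`. The parallel arrangement leaves the two streams separate.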