Researchers argue that fixed value specifications cannot ensure AI safety as systems become more capable and encounter unforeseen situations.
arXiv cs.MA (Multi-Agent) · April 17, 2026
AI Summary
•Static value alignment approaches—including reward functions, utility functions, and constitutional principles—fail to remain robust as AI capabilities scale and systems encounter new contexts
•Three philosophical obstacles make the problem fundamental: Hume's is-ought gap prevents deriving values from behavior alone, Berlin's value pluralism shows human values resist formal encoding, and the extended frame problem means any fixed value encoding will eventually misfit future scenarios
•Popular alignment methods including RLHF, Constitutional AI, inverse reinforcement learning, and cooperative assistance games all exhibit structural vulnerabilities rooted in specification traps, not merely engineering challenges solvable with better data or algorithms