Researchers develop new behavioral profiling method to measure how AI agents balance task execution with safety refusals in real-world deployments
arXiv cs.AI · April 15, 2026
AI Summary
• Study introduces A-R space framework measuring Action Rate and Refusal Signal to assess LLM agent behavior at execution level rather than just task success
• Tests models across four normative regimes (Control, Gray, Dilemma, Malicious) and three autonomy configurations (direct execution, planning, reflection)
• Reveals how execution and refusal patterns shift based on contextual framing and autonomy scaffold depth, moving beyond simple aggregate safety scores
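
To make the idea of an A-R profile concrete, here is a minimal Python sketch of how per-cell Action Rate and Refusal Signal might be tabulated across regimes and autonomy configurations. The summary does not give the paper's formal definitions, so everything here is an assumption: the `Episode` record, the `ar_profile` helper, and the reading of both metrics as simple per-cell fractions of episodes are hypothetical, and the episode records are made-up toy data, not results from the study.

```python
from collections import defaultdict
from dataclasses import dataclass

# Regime and scaffold names are taken from the summary above.
REGIMES = ("Control", "Gray", "Dilemma", "Malicious")
SCAFFOLDS = ("direct execution", "planning", "reflection")


@dataclass(frozen=True)
class Episode:
    """One agent run (hypothetical record layout, not the paper's schema)."""
    regime: str
    scaffold: str
    acted: bool    # assumption: the agent executed at least one action
    refused: bool  # assumption: the agent emitted a refusal


def ar_profile(episodes):
    """Return {(regime, scaffold): (action_rate, refusal_signal)}.

    Both values are assumed to be plain fractions of episodes in the cell.
    """
    counts = defaultdict(lambda: [0, 0, 0])  # total, acted, refused
    for ep in episodes:
        cell = counts[(ep.regime, ep.scaffold)]
        cell[0] += 1
        cell[1] += ep.acted
        cell[2] += ep.refused
    return {
        key: (acted / total, refused / total)
        for key, (total, acted, refused) in counts.items()
    }


# Toy records for illustration only.
episodes = [
    Episode("Control", "direct execution", acted=True, refused=False),
    Episode("Control", "direct execution", acted=True, refused=False),
    Episode("Malicious", "direct execution", acted=True, refused=False),
    Episode("Malicious", "direct execution", acted=False, refused=True),
    Episode("Malicious", "reflection", acted=False, refused=True),
    Episode("Malicious", "reflection", acted=True, refused=True),
]

profile = ar_profile(episodes)
for (regime, scaffold), (action_rate, refusal_signal) in sorted(profile.items()):
    print(f"{regime:<10} {scaffold:<17} A={action_rate:.2f} R={refusal_signal:.2f}")
```

Keeping the two values separate per cell, rather than folding them into one score, is what lets a case such as "acted and also refused" stay visible, which appears to be the point of profiling behavior beyond an aggregate safety number.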