New GeoAgentBench benchmark enables dynamic evaluation of AI agents performing complex spatial analysis tasks with 117 GIS tools
arXiv cs.AI · April 16, 2026
AI Summary
•GeoAgentBench (GABench) addresses limitations of static text-based benchmarks by providing dynamic, interactive evaluation for LLM-based Geographic Information Systems agents
•The benchmark includes 117 atomic GIS tools covering 53 spatial analysis tasks across 6 core GIS domains in a realistic execution sandbox
•Introduces the Parameter Execution Accuracy (PEA) metric, which scores success by how precisely an agent configures tool parameters, treating parameter configuration as the primary determinant of execution success in dynamic GIS environments
•Evaluates multimodal spatial outputs and runtime feedback rather than relying solely on code or text matching
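
The summary does not give the formula for PEA, so the sketch below is only a hypothetical illustration of the general idea: scoring an agent's tool calls by whether each call's parameters match a reference configuration. The function names, the call format (a dict with "tool" and "params" keys), the tool names, and the numeric tolerance are all assumptions made for this example and are not taken from GeoAgentBench.

```python
from math import isclose


def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def params_match(predicted, reference, rel_tol=1e-6):
    """Return True if both parameter dicts have the same keys and values.

    Numeric values are compared with a relative tolerance (an assumption);
    everything else must be exactly equal.
    """
    if set(predicted) != set(reference):
        return False
    for name, expected in reference.items():
        actual = predicted[name]
        if _is_number(expected) and _is_number(actual):
            if not isclose(actual, expected, rel_tol=rel_tol):
                return False
        elif actual != expected:
            return False
    return True


def parameter_accuracy(predicted_calls, reference_calls):
    """Fraction of reference calls matched, position by position.

    Hypothetical scoring rule: a call counts as correct only if the tool
    name and every parameter agree with the reference call at that step.
    """
    if not reference_calls:
        return 0.0
    correct = 0
    for pred, ref in zip(predicted_calls, reference_calls):
        if pred["tool"] == ref["tool"] and params_match(pred["params"], ref["params"]):
            correct += 1
    return correct / len(reference_calls)


# Hypothetical tool names and parameters, for illustration only.
reference = [
    {"tool": "buffer", "params": {"distance": 500.0, "unit": "meters"}},
    {"tool": "clip", "params": {"layer": "roads"}},
]
predicted = [
    {"tool": "buffer", "params": {"distance": 500, "unit": "meters"}},
    {"tool": "clip", "params": {"layer": "rivers"}},
]

score = parameter_accuracy(predicted, reference)
print(score)  # 0.5: the buffer call matches, the clip call uses the wrong layer
```

A strict all-or-nothing match per call is only one possible design; a benchmark could instead award partial credit per parameter or judge correctness from the executed output.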