RecoverFormer: End-to-End Recovery Policy for Humanoid Robots Learns to Switch Among Multiple Stabilization Strategies
arXiv cs.RO (Robotics) · April 28, 2026
AI Summary
•Researchers present RecoverFormer, a fully end-to-end humanoid recovery policy that learns when and how to switch among recovery behaviors—including compensatory stepping, hand-environment contact, and center-of-mass reshaping—while maintaining robust performance under model mismatch.
•The architecture combines a causal transformer over a 50-step observation history with a latent recovery mode (enabling smooth transitions among distinct recovery strategies) and a contact affordance head (predicting which environmental surfaces like walls, railings, and table edges are beneficial for stabilization).
•Trained only on open floor and evaluated on a Unitree G1 humanoid in MuJoCo, RecoverFormer transfers zero-shot to walled environments, achieving 100% recovery success across 100–300 N pushes and wall distances of 0.25–1.4 m. Under zero-shot dynamics mismatch, it reaches 75.5% at +25% mass, 89% under 30 ms latency, 91.5% at low friction, and 99% under a compound friction, latency, and mass perturbation.
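To make the architecture described in the summary more concrete, here is a minimal sketch of the data flow: causal attention over a 50-step observation history, followed by a recovery-mode head and a contact affordance head. Only the history length comes from the summary. The function names, the observation dimension, the single attention head with identity projections, the discrete three-way mode softmax, the sigmoid affordance scores, and the random weights are all illustrative assumptions, not the authors' implementation.

```python
import math
import random

HISTORY_LEN = 50   # observation history length stated in the summary
NUM_MODES = 3      # assumption: stepping, hand contact, CoM reshaping
NUM_SURFACES = 3   # assumption: wall, railing, table edge


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def causal_attention(history):
    """Single-head self-attention with identity projections (illustrative).

    Step t attends only to steps 0..t, which is what "causal" means here.
    """
    dim = len(history[0])
    scale = 1.0 / math.sqrt(dim)
    out = []
    for t, query in enumerate(history):
        scores = [
            scale * sum(q * k for q, k in zip(query, history[s]))
            for s in range(t + 1)  # no access to future steps
        ]
        weights = softmax(scores)
        out.append([
            sum(w * history[s][d] for s, w in enumerate(weights))
            for d in range(dim)
        ])
    return out


def linear(vec, weight):
    """Matrix-vector product; weight is a list of rows."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]


def policy_heads(history, mode_w, surface_w):
    """Return (mode probabilities, per-surface affordance scores)."""
    feature = causal_attention(history)[-1]  # feature at the latest step
    mode_probs = softmax(linear(feature, mode_w))
    affordance = [sigmoid(z) for z in linear(feature, surface_w)]
    return mode_probs, affordance


# Hypothetical inputs: random observations and random head weights.
rng = random.Random(0)
OBS_DIM = 4
history = [[rng.uniform(-1, 1) for _ in range(OBS_DIM)]
           for _ in range(HISTORY_LEN)]
mode_w = [[rng.uniform(-1, 1) for _ in range(OBS_DIM)]
          for _ in range(NUM_MODES)]
surface_w = [[rng.uniform(-1, 1) for _ in range(OBS_DIM)]
             for _ in range(NUM_SURFACES)]

mode_probs, affordance = policy_heads(history, mode_w, surface_w)
```

Because the mode output in this sketch is a probability vector rather than a hard switch, a downstream action head could blend strategies continuously, which is one plausible way to obtain the smooth transitions the summary mentions. The affordance scores are independent per surface, so several surfaces can be rated useful at the same time.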