Recent advances of locomotion controllers utilizing deep reinforcement learning (RL) have yielded impressive results in terms of achieving rapid and robust locomotion across challenging terrain, such as rugged rocks, non-rigid ground, and slippery surfaces. However, while these controllers primarily address challenges underneath the robot, relatively little research has investigated legged mobility through confined 3D spaces, such as narrow tunnels or irregular voids, which impose all-around constraints. The cyclic gait patterns resulted from existing RL-based methods to learn parameterized locomotion skills characterized by motion parameters, such as velocity and body height, may not be adequate to navigate robots through challenging confined 3D spaces, requiring both agile 3D obstacle avoidance and robust legged locomotion. Instead, we propose to learn locomotion skills end-to-end from goal-oriented navigation in confined 3D spaces. To address the inefficiency of tracking distant navigation goals, we introduce a hierarchical locomotion controller that combines a classical planner tasked with planning waypoints to reach a faraway global goal location, and an RL-based policy trained to follow these waypoints by generating low-level motion commands. This approach allows the policy to explore its own locomotion skills within the entire solution space and facilitates smooth transitions between local goals, enabling long-term navigation towards distant goals. In simulation, our hierarchical approach succeeds at navigating through demanding confined 3D environments, outperforming both pure end-to-end learning approaches and parameterized locomotion skills. We further demonstrate the successful real-world deployment of our simulation-trained controller on a real robot.

使用深度強化學習的運動控制器在克服具挑戰性的地形（如崎嶇的岩石、不規則的地面和滑溜的表面）上取得了令人印象深刻的快速和穩健的運動方面的最近突破。但是，相對較少的研究投入到透過狹窄隧道或不規則空洞等局限的3D空間中的腿部移動性，這些地方會強加整體限制。因此，我們提議從目標導向的過程中學習在局限的3D空間中的運動技能。通過將傳統計劃師負責規劃到達遠處全球目標位置的路徑點與透過生成低層運動指令來跟隨這些路徑點的基於RL的策略結合，我們引入一種層次化的運動控制器來解決跟踪遠處導航目標的低效問題。在模擬中，我們的層次化方法成功地在具有挑戰性的局限的3D環境中導航，優於純粹的端到端學習方法和參數化的運動技能。我們還展示了在真實機器人上成功部署我們在模擬中訓練的控制器。

在受限的三维空间中运用强化学习实现灵巧的腿部步行动力学