Reinforcement learning has traditionally focused on learning state-dependent
policies to solve optimal control problems in a closed-loop fashion. In this
work, we introduce the paradigm of open-loop reinforcement learning, in which a
fixed action sequence is learned instead. We present three new algorithms: one
robust model-based method and two sample-efficient model-free methods. Rather
than basing our algorithms on Bellman's equation from dynamic programming, our
work builds on Pontryagin's principle from the theory of open-loop optimal
control. We provide convergence guarantees and evaluate all methods empirically
on a pendulum swing-up task, as well as on two high-dimensional MuJoCo tasks,
demonstrating strong performance relative to existing baselines.
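
The abstract contrasts closed-loop learning based on Bellman's equation with open-loop optimization based on Pontryagin's principle. The Python below is a minimal sketch of the latter idea, assuming a known pendulum model (the model-based setting). It optimizes a fixed action sequence u_0, ..., u_{T-1} by gradient descent and obtains the gradient from the discrete-time costate (adjoint) recursion associated with Pontryagin's principle: lambda_T = dphi/dx_T, lambda_t = dl/dx_t + (df/dx_t)^T lambda_{t+1}, and dJ/du_t = dl/du_t + (df/du_t)^T lambda_{t+1}. This is not one of the paper's three algorithms. The explicit-Euler dynamics, the cost function, all constants, and the helper names (`step`, `rollout`, `cost`, `adjoint_gradient`, `optimize`) are hypothetical choices made for illustration.

```python
import math

# Hypothetical pendulum parameters (not from the paper). theta is measured from
# the upright position, so theta = pi hangs down and theta = 0 is upright.
G_OVER_L = 9.81               # g / l
INV_INERTIA = 1.0             # 1 / (m l^2)
DT = 0.05                     # explicit-Euler step size
R = 0.01                      # control-effort weight
W_THETA, W_OMEGA = 10.0, 1.0  # terminal-cost weights


def step(x, u):
    """One explicit-Euler step of the pendulum dynamics x_{t+1} = f(x_t, u_t)."""
    theta, omega = x
    return (theta + DT * omega,
            omega + DT * (G_OVER_L * math.sin(theta) + INV_INERTIA * u))


def rollout(x0, us):
    """Apply the open-loop action sequence and return every visited state."""
    xs = [x0]
    for u in us:
        xs.append(step(xs[-1], u))
    return xs


def cost(x0, us):
    """J = sum_t 0.5 R u_t^2 + W_THETA (1 - cos theta_T) + 0.5 W_OMEGA omega_T^2."""
    theta_T, omega_T = rollout(x0, us)[-1]
    running = sum(0.5 * R * u * u for u in us)
    return running + W_THETA * (1.0 - math.cos(theta_T)) + 0.5 * W_OMEGA * omega_T ** 2


def adjoint_gradient(x0, us):
    """dJ/du_t for every t via the backward costate (adjoint) recursion."""
    xs = rollout(x0, us)
    theta_T, omega_T = xs[-1]
    # Terminal condition: lambda_T = gradient of the terminal cost w.r.t. x_T.
    lam = (W_THETA * math.sin(theta_T), W_OMEGA * omega_T)
    grad = [0.0] * len(us)
    for t in reversed(range(len(us))):
        theta_t = xs[t][0]
        # Here lam holds lambda_{t+1}; df/du_t = (0, DT * INV_INERTIA).
        grad[t] = R * us[t] + DT * INV_INERTIA * lam[1]
        # lambda_t = (df/dx_t)^T lambda_{t+1}; the running cost has no state term.
        lam = (lam[0] + DT * G_OVER_L * math.cos(theta_t) * lam[1],
               DT * lam[0] + lam[1])
    return grad


def optimize(x0, us, iters=200, lr=1.0):
    """Gradient descent on the action sequence with simple backtracking."""
    us = list(us)
    J = cost(x0, us)
    for _ in range(iters):
        g = adjoint_gradient(x0, us)
        step_size = lr
        # Halve the step until the cost decreases, so J never increases.
        while step_size > 1e-12:
            candidate = [u - step_size * gi for u, gi in zip(us, g)]
            J_new = cost(x0, candidate)
            if J_new < J:
                us, J = candidate, J_new
                break
            step_size *= 0.5
    return us, J


if __name__ == "__main__":
    x0 = (math.pi, 0.0)   # start hanging down, at rest
    us0 = [0.5] * 60      # hypothetical initial guess over a 3 s horizon
    us, J = optimize(x0, us0)
    print(f"initial cost {cost(x0, us0):.3f} -> optimized cost {J:.3f}")
    print("final state (theta, omega):", rollout(x0, us)[-1])
```

The learned "policy" is just the list `us`, so no state feedback is used when it is executed. The backtracking loop only guarantees that the cost never increases; whether the pendulum actually reaches the upright position depends on the horizon, the weights, and the initial guess, since the problem is non-convex.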