We present LARL-RM (Large language model-generated Automaton for
Reinforcement Learning with Reward Machine), an algorithm that encodes
high-level knowledge into reinforcement learning as an automaton in order to
speed up learning. Our method uses a large language model (LLM) to
obtain high-level domain-specific knowledge through prompt engineering, instead
of supplying that knowledge to the reinforcement learning algorithm directly,
which requires an expert to encode the automaton. We use
chain-of-thought and few-shot methods for prompt engineering and demonstrate
that our method works using these approaches. Additionally, LARL-RM allows for
fully closed-loop reinforcement learning without the need for an expert to
guide and supervise the learning since LARL-RM can use the LLM directly to
generate the required high-level knowledge for the task at hand. We also show
that our algorithm is theoretically guaranteed to converge to an optimal
policy. In two case studies, we demonstrate that LARL-RM speeds up convergence
by 30%.

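We propose the LARL-RM algorithm, which uses an automaton to encode high-level
knowledge into reinforcement learning in order to accelerate learning. It uses
a large language model with prompt engineering to obtain high-level
domain-specific knowledge, which removes the need for an expert to encode the
automaton and enables fully closed-loop reinforcement learning without expert
guidance or supervision. We also show a theoretical guarantee that the
algorithm converges to an optimal policy, and we achieve 30% faster convergence
in two case studies.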
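
To make the notion of a reward machine concrete, the following is a minimal illustrative sketch in Python. It is not the LARL-RM implementation: the class name `RewardMachine`, the state names `u0` to `u2`, the event labels `key` and `door`, and the reward values are all assumptions chosen for the example. It only shows how an automaton over high-level events can carry task knowledge and emit rewards.

```python
from dataclasses import dataclass, field


@dataclass
class RewardMachine:
    """Finite-state machine mapping high-level event labels to rewards.

    Hypothetical illustration only; names and rewards are assumptions.
    """
    initial_state: str
    # (state, label) -> (next_state, reward)
    transitions: dict = field(default_factory=dict)
    default_reward: float = 0.0

    def step(self, state, label):
        # Undefined (state, label) pairs self-loop with the default reward.
        return self.transitions.get((state, label), (state, self.default_reward))

    def run(self, labels):
        # Feed a sequence of labels and accumulate the emitted rewards.
        state, total = self.initial_state, 0.0
        for label in labels:
            state, reward = self.step(state, label)
            total += reward
        return state, total


# Assumed toy task: pick up a key, then open a door.
rm = RewardMachine(
    initial_state="u0",
    transitions={
        ("u0", "key"): ("u1", 0.0),
        ("u1", "door"): ("u2", 1.0),
    },
)

final_state, total_reward = rm.run(["door", "key", "door"])
```

In this sketch, reaching the door before the key leaves the machine in `u0`, so reward is only emitted once the events occur in the required order. In a setting like the one described above, the transition table is the part that an expert, or an LLM prompted for domain knowledge, would be expected to supply.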