Reinforcement Learning (RL) plays an important role in the robotic
manipulation domain since it allows self-learning from trial-and-error
interactions with the environment. Still, sample efficiency and reward
specification seriously limit its potential. One possible solution involves
learning from expert guidance. However, obtaining a human expert is impractical
due to the high cost of supervising an RL agent, and developing an automatic
supervisor is a challenging endeavor. Large Language Models (LLMs) demonstrate
remarkable abilities to provide human-like feedback on user inputs in natural
language. Nevertheless, they are not designed to directly control low-level
robotic motions, as their pretraining is based on vast internet data rather
than specific robotics data. In this paper, we introduce the Lafite-RL
(Language agent feedback interactive Reinforcement Learning) framework, which
enables RL agents to learn robotic tasks efficiently by taking advantage of
LLMs' timely feedback. Our experiments conducted on RLBench tasks illustrate
that, with simple prompt design in natural language, the Lafite-RL agent
exhibits improved learning capabilities when guided by an LLM. It outperforms
the baseline in terms of both learning efficiency and success rate,
underscoring the efficacy of the rewards provided by an LLM.
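
The abstract does not say how the LLM's feedback is turned into a learning signal. As a minimal sketch, assuming the feedback is a one-word verdict that is mapped to a scalar and added to the environment reward, the idea might look like the following. The names `shaped_reward`, `stub_agent`, `FEEDBACK_TO_REWARD`, and `feedback_weight` are hypothetical, and the stub stands in for a real LLM call.

```python
from typing import Callable

# Hypothetical mapping from a textual verdict to a scalar reward (an assumption,
# not taken from the paper).
FEEDBACK_TO_REWARD = {"good": 1.0, "neutral": 0.0, "bad": -1.0}


def shaped_reward(
    env_reward: float,
    transition_text: str,
    language_agent: Callable[[str], str],
    feedback_weight: float = 0.1,
) -> float:
    """Add a weighted language-feedback term to the environment reward."""
    verdict = language_agent(transition_text).strip().lower()
    # Unrecognised replies contribute nothing instead of raising an error.
    return env_reward + feedback_weight * FEEDBACK_TO_REWARD.get(verdict, 0.0)


def stub_agent(transition_text: str) -> str:
    """Stand-in for an LLM call: approves transitions that mention 'closer'."""
    return "good" if "closer" in transition_text else "bad"


if __name__ == "__main__":
    print(shaped_reward(0.0, "gripper moved closer to the button", stub_agent))
    print(shaped_reward(1.0, "gripper moved away from the button", stub_agent))
```

Keeping the language agent behind a plain callable means the same shaping code can be exercised with a stub, as here, before any model is wired in.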

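By leveraging timely feedback from large language models, the Lafite-RL (Language agent feedback interactive Reinforcement Learning) framework enables reinforcement learning agents to learn robotic tasks efficiently. Experiments show that, with simple prompt design in natural language, a Lafite-RL agent guided by a large language model outperforms the baseline in both learning efficiency and success rate, underscoring the efficacy of the rewards the model provides.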

Accelerating Reinforcement Learning of Robotic Manipulations via Feedback from Large Language Models

Forecasting influenza-like illnesses (ILI) has rapidly progressed in recent
years from an art to a science with a plethora of data-driven methods. While
these methods have achieved qualified success, their applicability is limited
due to their inability to incorporate expert feedback and guidance
systematically into the forecasting framework. We propose a new approach
leveraging the Seldonian optimization framework from AI safety and demonstrate
how it can be adapted to epidemic forecasting. We study two types of guidance:
smoothness and regional consistency of errors. We show that incorporating this
guidance not only bounds the probability of undesirable behavior but also
reduces RMSE on test data by up to 17%.
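
Seldonian algorithms typically return a candidate model only if a high-confidence upper bound on a constraint measure stays at or below zero. The sketch below shows one such safety test built on Hoeffding's inequality. It illustrates the general pattern and is not the paper's procedure: the function names, the bounded range, and the idea of scoring each held-out sample with a constraint value are assumptions.

```python
import math
from typing import Sequence


def hoeffding_upper_bound(
    samples: Sequence[float], lower: float, upper: float, delta: float
) -> float:
    """(1 - delta)-confidence upper bound on the mean of samples in [lower, upper]."""
    n = len(samples)
    if n == 0:
        raise ValueError("need at least one sample")
    mean = sum(samples) / n
    return mean + (upper - lower) * math.sqrt(math.log(1.0 / delta) / (2.0 * n))


def passes_safety_test(
    constraint_samples: Sequence[float],
    lower: float,
    upper: float,
    delta: float = 0.05,
) -> bool:
    """Accept the candidate only if the bound on the mean constraint value is <= 0.

    Each sample is a hypothetical per-example constraint value g_i, where
    g_i > 0 means undesirable behavior (for example, a forecast that is not
    smooth enough).
    """
    return hoeffding_upper_bound(constraint_samples, lower, upper, delta) <= 0.0


if __name__ == "__main__":
    print(passes_safety_test([-0.5] * 200, lower=-1.0, upper=1.0))
    print(passes_safety_test([-0.05] * 20, lower=-1.0, upper=1.0))
```

With few samples the bound is loose, so a candidate whose sample mean satisfies the constraint can still be rejected. That conservatism is what allows the probability of undesirable behavior to be bounded.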

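Using the Seldonian optimization framework from AI safety, we propose a new method for forecasting influenza-like illnesses that systematically incorporates expert feedback and guidance to improve forecasts, reducing RMSE on test data by up to 17%.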

Incorporating Expert Guidance in Epidemic Forecasting

In this paper, we study Reinforcement Learning from Demonstrations (RLfD),
which improves the exploration efficiency of Reinforcement Learning (RL) by
providing expert demonstrations. Most existing RLfD methods require
demonstrations to be perfect and sufficient, which is unrealistic in practice.
To work with imperfect demonstrations, we first formally define an imperfect
expert setting for RLfD, and then point out that previous methods suffer from
two issues, concerning optimality and convergence, respectively. Building on
these theoretical findings, we tackle the two issues by treating the expert
guidance as a soft constraint that regulates the agent's policy exploration,
which leads to a constrained optimization problem. We further demonstrate that
this problem can be solved efficiently by performing a local linear search on
its dual form. Extensive empirical evaluations on a comprehensive collection of
benchmarks indicate that our method attains consistent improvement over other
RLfD counterparts.
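
To make the idea of solving a constrained problem through its dual concrete, here is a toy sketch that uses plain projected dual ascent. This is a standard textbook technique swapped in for illustration, not the local linear search described in the abstract. The scalar "policy parameter" `x`, the task optimum, the expert location, and the tolerance are all hypothetical: the objective pulls `x` toward the task optimum, while the constraint keeps it within a squared distance `tol` of the expert.

```python
from typing import Tuple


def best_response(lam: float, task_opt: float, expert: float) -> float:
    """Maximiser over x of -(x - task_opt)^2 - lam * ((x - expert)^2 - tol)."""
    # Setting the derivative to zero gives x * (1 + lam) = task_opt + lam * expert.
    return (task_opt + lam * expert) / (1.0 + lam)


def dual_ascent(
    tol: float = 1.0,
    step: float = 0.5,
    iters: int = 200,
    task_opt: float = 3.0,
    expert: float = 1.0,
) -> Tuple[float, float]:
    """Return (x, lam) after projected gradient ascent on the multiplier lam."""
    lam = 0.0
    for _ in range(iters):
        x = best_response(lam, task_opt, expert)
        violation = (x - expert) ** 2 - tol  # positive when x strays too far
        lam = max(0.0, lam + step * violation)  # keep the multiplier non-negative
    return best_response(lam, task_opt, expert), lam


if __name__ == "__main__":
    x, lam = dual_ascent()
    print(f"x = {x:.4f}, lambda = {lam:.4f}")
```

In this toy setting the multiplier grows while the constraint is violated and shrinks once it is satisfied, so the solution settles on the boundary of the region the expert allows.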

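This paper studies the exploration efficiency of reinforcement learning. It proposes a reinforcement learning method based on expert demonstrations that treats expert guidance as a soft constraint on the agent's policy exploration, which leads to a constrained optimization problem solved efficiently by a local linear search. Across a broad set of benchmarks, the method achieves better results than other methods.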