Interactive reinforcement learning has shown promise in learning complex
robotic tasks. However, the process can be human-intensive because it
requires a large amount of interactive feedback. This paper presents a new
method that uses scores provided by humans, instead of pairwise preferences, to
improve the feedback efficiency of interactive reinforcement learning. Our key
insight is that scores can yield significantly more data than pairwise
preferences. Specifically, we require a teacher to interactively score the full
trajectories of an agent to train a behavioral policy in a sparse-reward
environment. To prevent unstable human scores from negatively impacting the
training process, we propose an adaptive learning scheme that makes the
learning paradigm insensitive to imperfect or unreliable scores. We
extensively evaluate our method on robotic locomotion and manipulation tasks.
The results show that the proposed method can efficiently learn near-optimal
policies by adaptive learning from scores, while requiring less feedback
than pairwise preference learning methods. The source code is publicly
available at this https URL
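The abstract does not describe how the adaptive learning scheme works. The following is therefore a minimal sketch under our own assumptions, not the paper's method. It illustrates two ideas. First, n scored trajectories imply up to n(n-1)/2 pairwise preferences, which is one way scores can carry more training signal than individual pairwise queries. Second, a robust reweighting step reduces the influence of inconsistent scores. That step is Cauchy-style iteratively reweighted least squares, a technique we swapped in for illustration. The one-dimensional trajectory feature, the outlier data, and the helper names `pairs_from_scores` and `fit_score_model` are all hypothetical.

```python
from itertools import combinations


def pairs_from_scores(scores):
    """Derive pairwise preference labels (winner, loser) from scalar scores.

    Ties are skipped, so n scored trajectories yield at most n*(n-1)/2 labels,
    whereas pairwise querying needs one human judgment per label.
    """
    prefs = []
    for i, j in combinations(range(len(scores)), 2):
        if scores[i] > scores[j]:
            prefs.append((i, j))
        elif scores[j] > scores[i]:
            prefs.append((j, i))
    return prefs


def fit_score_model(features, scores, iters=20, c=1.0):
    """Fit score ~ a * feature + b with iteratively reweighted least squares.

    After each fit, every score gets weight 1 / (1 + (residual / c) ** 2)
    (a Cauchy-style robust weight), so scores far from the current fit, such
    as inconsistent human ratings, pull on the model less. Assumes the
    features are not all identical.
    """
    weights = [1.0] * len(scores)
    a = b = 0.0
    for _ in range(iters):
        sw = sum(weights)
        mx = sum(w * x for w, x in zip(weights, features)) / sw
        my = sum(w * y for w, y in zip(weights, scores)) / sw
        sxx = sum(w * (x - mx) ** 2 for w, x in zip(weights, features))
        sxy = sum(w * (x - mx) * (y - my)
                  for w, x, y in zip(weights, features, scores))
        a = sxy / sxx
        b = my - a * mx
        weights = [1.0 / (1.0 + ((y - (a * x + b)) / c) ** 2)
                   for x, y in zip(features, scores)]
    return a, b, weights


# Hypothetical data: one scalar feature per trajectory; score 9.0 is an outlier.
features = [0.0, 1.0, 2.0, 3.0, 4.0]
scores = [0.1, 1.0, 2.1, 9.0, 4.0]
print(len(pairs_from_scores(scores)))  # 10 labels from 5 scores
a, b, weights = fit_score_model(features, scores)
```

With this toy data the outlier score ends up with by far the smallest weight, and the fitted slope stays close to the trend of the consistent scores. The scale `c` controls how quickly large residuals are discounted. It is an illustrative hyperparameter, not a value taken from the paper.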

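This paper proposes a new method that uses human-provided scores instead of pairwise preferences to improve feedback efficiency in interactive reinforcement learning. The method is extensively evaluated on robotic locomotion and manipulation tasks, and the results show that it can efficiently learn near-optimal policies through adaptive learning from scores while requiring less feedback than pairwise preference learning methods.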

Boosting Feedback Efficiency of Interactive Reinforcement Learning by Adaptive Learning from Scores

Despite the promising results achieved, state-of-the-art interactive
reinforcement learning schemes rely on passively receiving supervision signals
from advisor experts, in the form of either continuous monitoring or
pre-defined rules, which inevitably result in a cumbersome and expensive
learning process. In this paper, we introduce a novel initiative
advisor-in-the-loop actor-critic framework, termed Ask-AC, that replaces the
unilateral advisor-guidance mechanism with a bidirectional learner-initiative
one, and thereby enables a customized and efficacious message exchange between
learner and advisor. At the heart of Ask-AC are two complementary components,
namely action requester and adaptive state selector, that can be readily
incorporated into various discrete actor-critic architectures. The former
component allows the agent to proactively seek advisor intervention in the
presence of uncertain states, while the latter identifies unstable states
potentially missed by the former, especially when the environment changes, and then
learns to promote the ask action on such states. Experimental results on both
stationary and non-stationary environments and across different actor-critic
backbones demonstrate that the proposed framework significantly improves the
learning efficiency of the agent and achieves performance on par with
that obtained by continuous advisor monitoring.
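The abstract does not detail how the action requester and adaptive state selector are built. One plausible reading of the action requester is an extra "ask" action appended to the discrete action space: when it is sampled, the advisor's action is executed instead. The toy sketch below follows that reading. The state selector is replaced by a simple stand-in, a per-state running mean of |TD error| that adds a fixed bonus to the ask logit in "unstable" states, whereas the paper describes a learned component. All names (`ASK`, `AskAugmentedPolicy`, `ask_bonus`, `threshold`) are hypothetical.

```python
import math
import random

ASK = "ask"  # hypothetical extra action that requests advisor intervention


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class AskAugmentedPolicy:
    """Toy tabular policy whose discrete action set is extended with ASK.

    Instability is approximated by a running mean of |TD error| per state;
    states above a threshold get a fixed bonus on the ASK logit. Both the
    signal and the fixed bonus are illustrative assumptions.
    """

    def __init__(self, actions, ask_bonus=2.0, threshold=0.5, decay=0.9):
        self.actions = list(actions) + [ASK]
        self.ask_bonus = ask_bonus
        self.threshold = threshold
        self.decay = decay
        self.logits = {}  # state -> learned logits (left at zero in this sketch)
        self.td_avg = {}  # state -> exponential moving average of |TD error|

    def observe_td_error(self, state, td_error):
        prev = self.td_avg.get(state, 0.0)
        self.td_avg[state] = self.decay * prev + (1 - self.decay) * abs(td_error)

    def is_unstable(self, state):
        return self.td_avg.get(state, 0.0) > self.threshold

    def action_probs(self, state):
        logits = list(self.logits.get(state, [0.0] * len(self.actions)))
        if self.is_unstable(state):
            logits[-1] += self.ask_bonus  # promote ASK in unstable states
        return softmax(logits)

    def act(self, state, advisor, rng):
        """Sample an action; if ASK is sampled, execute the advisor's action.

        Returns (executed_action, asked) so a learner could also train on
        the advisor's choice.
        """
        probs = self.action_probs(state)
        choice = rng.choices(self.actions, weights=probs, k=1)[0]
        if choice == ASK:
            return advisor(state), True
        return choice, False


policy = AskAugmentedPolicy(["left", "right"])
p_before = policy.action_probs("s0")[-1]  # uniform over 3 actions: 1/3
for _ in range(10):
    policy.observe_td_error("s0", 2.0)  # large, persistent TD errors
p_after = policy.action_probs("s0")[-1]  # ASK now more likely
action, asked = policy.act("s0", advisor=lambda s: "right", rng=random.Random(0))
```

Keeping the request as an ordinary discrete action means any actor-critic backbone with a softmax actor can, in principle, learn when to ask. This is consistent with the abstract's claim that the components plug into various discrete actor-critic architectures, though the paper's actual integration may differ.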

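This study proposes a new framework, Ask-AC, which introduces two components, an action requester and an adaptive state selector, so that in interactive reinforcement learning the learner can actively request advice and exchange information bidirectionally with the advisor; this improves learning efficiency and achieves performance similar to that obtained under continuous monitoring.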

Ask-AC: An Initiative Advisor-in-the-Loop Actor-Critic Framework

Interactive reinforcement learning, where humans actively assist during an
agent's learning process, promises to alleviate the sample complexity
challenges of practical algorithms. However, the inner workings and state of
the robot are typically hidden from the teacher when humans provide feedback.
To create a common ground between the human and the learning robot, in this
paper, we propose an Augmented Reality (AR) system that reveals the hidden
state of the learning process to human users. This paper describes our system's
design and implementation and concludes with a discussion of two directions for
future work that we are pursuing: 1) use of our system in AI education
activities at the K-12 level; and 2) development of a framework for AR-based
human-in-the-loop reinforcement learning, where the human teacher can see
sensory and cognitive representations of the robot overlaid in the real world.

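This paper introduces an augmented reality system that lets humans observe the hidden state of robot learning, establishing common ground between humans and robots. It also discusses two future directions: using the system in K-12 education activities and developing an AR-based human-in-the-loop reinforcement learning framework.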