This paper develops a novel rating-based reinforcement learning approach that
uses human ratings to obtain human guidance in reinforcement learning.
Unlike existing preference-based and ranking-based reinforcement
learning paradigms, which rely on human relative preferences over sample pairs,
the proposed rating-based reinforcement learning approach relies on human
evaluation of individual trajectories, without relative comparisons between
sample pairs. The rating-based reinforcement learning approach builds on a new
prediction model for human ratings and a novel multi-class loss function. We
conduct several experimental studies based on synthetic ratings and real human
ratings to evaluate the effectiveness and benefits of the new rating-based
reinforcement learning approach.

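This paper proposes a new rating-based reinforcement learning method that uses human ratings to obtain human guidance in reinforcement learning. Unlike existing preference-based and ranking-based reinforcement learning paradigms, the method relies on human evaluation of individual sample trajectories rather than relative comparisons between sample pairs, and builds a new prediction model and a new multi-class loss function based on human ratings. We evaluate the effectiveness and benefits of the new rating-based reinforcement learning method through several experimental studies based on synthetic ratings and real human ratings.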