Humans are not Boltzmann Distributions: Challenges and Opportunities for Modelling Human Feedback and Interaction in Reinforcement Learning

June 2022

David Lindner, Mennatallah El-Assady

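TL;DR: This paper calls for research across disciplines to address key questions about how humans provide feedback to AI and how to build more robust human-in-the-loop reinforcement learning systems, and argues that human models must be personalized, contextual, and dynamic.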

Abstract

Reinforcement learning (RL) commonly assumes access to well-specified reward functions, which many practical applications do not provide. Instead, recent work has increasingly explored learning what to do from interacting with humans. So far, most of these approaches model humans as being (