Offline Reinforcement Learning from Human Feedback in Real-World Sequence-to-Sequence Tasks

Nov, 2020

Learning from Human Feedback: Challenges for Real-World Reinforcement Learning in NLP

Julia Kreutzer, Stefan Riezler, Carolin Lawrence

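TL;DR: The paper proposes ways to use the large volumes of interaction logs collected from natural language processing systems for offline reinforcement learning, and discusses the challenges that arise from the nature of NLP tasks and the constraints of production systems, along with possible solutions.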

Abstract

Large volumes of interaction logs can be collected from NLP systems that are deployed in the real world. How can this wealth of information be leveraged? Using such →