Nov, 2020
Offline Reinforcement Learning from Human Feedback in Real-World Sequence-to-Sequence Tasks
Learning from Human Feedback: Challenges for Real-World Reinforcement Learning in NLP
Julia Kreutzer, Stefan Riezler, Carolin Lawrence
TL;DR
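The paper proposes ways to use the large volumes of interaction logs collected from natural language processing systems for offline reinforcement learning, and discusses the challenges posed by the nature of NLP tasks and the constraints of production systems, along with possible solutions.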
Abstract
Large volumes of interaction logs can be collected from NLP systems that are deployed in the real world. How can this wealth of information be leveraged? Using such
→