TL;DR通过人类反馈范式学习的大型语言模型以及人类编辑和模型生成数据结合的新技术 Sequence Alignment (un) Likelihood Training (SALT) 在医学领域自动文摘中展示了有效性。
Abstract
Recent work has shown the promise of learning with human feedback paradigms
to produce human-determined high-quality text. Existing works use human
feedback to train large language models (LLMs) in general domain