Oct, 2024
Reward Modeling with Weak Supervision for Language Models
Ben Hauptvogel, Malte Ostendorff, Georg Rehm, Sebastian Möller
TL;DR
This study addresses the heavy reliance on human-annotated data in reward model training. By introducing weak supervision, which exploits noisy or imprecise data annotations, the researchers were able to extend RLHF datasets and improve reward model performance. The results show that while weak supervision significantly improves reward model performance on smaller datasets, its benefit diminishes on larger ones; in addition, using large language models to generate and weakly annotate responses demonstrates a promising way to scale up preference data.
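To make the idea concrete, below is a minimal sketch (not the authors' implementation) of how a noisy heuristic could weakly label response pairs to extend an RLHF preference dataset. The length-based heuristic, the `min_margin` threshold, and the field names are illustrative assumptions only.

```python
# Sketch: extending a preference dataset with weakly labeled pairs.
# The heuristic below (prefer the longer response) is a stand-in for any
# noisy or imprecise labeling rule; it is not the paper's actual method.

from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # response the (weak) labeler prefers
    rejected: str  # response the (weak) labeler disfavors


def weak_label(prompt: str, response_a: str, response_b: str,
               min_margin: int = 20) -> Optional[PreferencePair]:
    """Assign a noisy preference using a simple length heuristic.

    Pairs whose length difference falls below `min_margin` are discarded,
    since the heuristic is too unreliable to call them.
    """
    margin = len(response_a) - len(response_b)
    if abs(margin) < min_margin:
        return None  # too ambiguous for the weak labeler
    if margin > 0:
        return PreferencePair(prompt, chosen=response_a, rejected=response_b)
    return PreferencePair(prompt, chosen=response_b, rejected=response_a)


def extend_dataset(human_pairs: List[PreferencePair],
                   unlabeled: List[Tuple[str, str, str]]) -> List[PreferencePair]:
    """Combine human-annotated pairs with weakly labeled ones for reward model training."""
    weak_pairs = [p for p in (weak_label(*t) for t in unlabeled) if p is not None]
    return human_pairs + weak_pairs
```

The resulting combined dataset would then be used to train the reward model in the usual pairwise fashion; the paper's finding is that such weak labels help most when the human-annotated portion is small.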
Abstract
Recent advancements in large language models (LLMs) have led to their increased application across various tasks, with reinforcement learning from human feedback (RLHF) being a crucial part of their training to …