BriefGPT.xyz
Apr, 2024
Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback
Qiwei Di, Jiafan He, Quanquan Gu
TL;DR
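Motivated by learning from human feedback for aligning large language models, this study presents a contextual dueling bandit algorithm that is robust to adversarial feedback and proves that its regret bound is nearly optimal both with and without adversarial feedback. Experiments show that the algorithm outperforms existing dueling bandit algorithms under various types of adversarial feedback.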
Abstract
Learning from human feedback plays an important role in aligning generative models, such as large language models (LLMs). However, the effe