Some things are more CRINGE than others: Preference Optimization with the Pairwise Cringe Loss

Dec, 2023

Jing Xu, Andrew Lee, Sainbayar Sukhbaatar, Jason Weston

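TL;DR: Cringe Loss, an existing binary-feedback method, is extended to training on pairwise preferences through a simple soft-margin extension. The resulting method outperforms state-of-the-art preference optimization algorithms such as PPO and DPO on the AlpacaFarm benchmark.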

Abstract

Practitioners commonly align large language models using pairwise preferences, i.e., given labels of the type response A is preferred to response B for a given input. Perhaps less commonly, methods have also been