Adapting large language models (LLMs) for specific tasks usually involves
fine-tuning through reinforcement learning with human feedback (RLHF) on
preference data. While these data often come from diverse groups of labelers
(e.g., different demographics, ethnicities, or company teams), traditional
RLHF methods take a "one-size-fits-all" approach: they indiscriminately
assume and optimize a single preference model, and are therefore not robust
to the unique characteristics and needs of the various groups. To address
this limitation, we propose a novel Group Robust Preference Optimization (GRPO)
method to align LLMs to individual groups' preferences robustly. Our approach
builds upon reward-free direct preference optimization methods, but unlike
previous approaches, it seeks a robust policy which maximizes the worst-case
group performance. To achieve this, GRPO adaptively and sequentially weights
the importance of different groups, prioritizing groups with worse cumulative
loss. We theoretically study the feasibility of GRPO and analyze its
convergence for the log-linear policy class. By fine-tuning LLMs with GRPO
using diverse group-based global opinion data, we significantly improve
performance for the worst-performing groups, reduce loss imbalances across
groups, and improve probability accuracies compared to non-robust baselines.
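
The adaptive group weighting described above can be illustrated with a small sketch. The abstract says only that groups with worse cumulative loss are prioritized; the multiplicative-weights (exponentiated) update below is one common way to do that and is an assumption here, not necessarily the paper's exact rule. The function names, the step size, and the per-group loss values are all hypothetical.

```python
import math

def update_group_weights(weights, group_losses, step_size=0.5):
    """One multiplicative-weights step (assumed update rule):
    groups with a higher loss receive a larger share of the weight."""
    scaled = [w * math.exp(step_size * loss)
              for w, loss in zip(weights, group_losses)]
    total = sum(scaled)
    return [s / total for s in scaled]  # renormalize onto the simplex

def weighted_loss(weights, group_losses):
    """Group-weighted objective that a policy update would then minimize."""
    return sum(w * loss for w, loss in zip(weights, group_losses))

# Hypothetical per-group preference losses for three labeler groups.
# In real training these would change as the policy is updated;
# they are held fixed here only to show how the weights move.
group_losses = [0.2, 0.9, 0.4]
weights = [1 / 3, 1 / 3, 1 / 3]
initial_objective = weighted_loss(weights, group_losses)

for _ in range(5):
    weights = update_group_weights(weights, group_losses)

final_objective = weighted_loss(weights, group_losses)
```

Because the weights shift toward the group with the largest loss, the weighted objective moves from the plain average of the group losses toward the worst-case group loss, which is the quantity a robust policy aims to reduce.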

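Fine-tuning large language models with the new Group Robust Preference Optimization (GRPO) method, which accounts for the characteristics and needs of different groups, significantly improves performance for the worst-performing groups, reduces loss imbalances across groups, and improves probability accuracies.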

Group Robust Preference Optimization in Reward-free RLHF

With ever-increasing available data, predicting individuals' preferences and
helping them locate the most relevant information has become a pressing need.
Understanding and predicting preferences is also important from a fundamental
point of view, as part of what has been called a "new" computational social
science. Here, we propose a novel approach based on stochastic block models,
which have been developed by sociologists as plausible models of complex
networks of social interactions. Our model is in the spirit of predicting
individuals' preferences based on the preferences of others but, rather than
fitting a particular model, we rely on a Bayesian approach that samples over
the ensemble of all possible models. We show that our approach is considerably
more accurate than leading recommender algorithms, with major relative
improvements between 38% and 99% over industry-level algorithms. In addition, our
approach sheds light on decision-making processes by identifying groups of
individuals that have consistently similar preferences, and enabling the
analysis of the characteristics of those groups.
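
The idea of predicting a preference from group memberships, and averaging over many candidate groupings instead of fitting a single one, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy ratings, the two hand-written partitions (standing in for samples that a Bayesian sampler would produce), the add-one smoothing, and all function names are hypothetical.

```python
from collections import defaultdict

def block_rating_probs(ratings, user_group, item_group, rating_values):
    """Estimate p(rating | user group, item group) for one partition,
    using add-one smoothing (an assumed smoothing choice)."""
    counts = defaultdict(lambda: defaultdict(int))
    for (user, item), r in ratings.items():
        counts[(user_group[user], item_group[item])][r] += 1

    def prob(user, item, r):
        block = counts[(user_group[user], item_group[item])]
        total = sum(block.values())
        return (block[r] + 1) / (total + len(rating_values))

    return prob

def predict(ratings, partitions, user, item, rating_values):
    """Average the block-level estimate uniformly over the given partitions."""
    averaged = {r: 0.0 for r in rating_values}
    for user_group, item_group in partitions:
        prob = block_rating_probs(ratings, user_group, item_group, rating_values)
        for r in rating_values:
            averaged[r] += prob(user, item, r) / len(partitions)
    return averaged

# Toy data: 1 = like, 0 = dislike (hypothetical).
rating_values = [0, 1]
ratings = {
    ("ann", "m1"): 1, ("ann", "m2"): 1, ("bob", "m1"): 1,
    ("cat", "m3"): 0, ("dan", "m3"): 0,
}
# Two hand-written partitions of users and items into groups.
users_a = {"ann": 0, "bob": 0, "cat": 1, "dan": 1}
partitions = [
    (users_a, {"m1": 0, "m2": 0, "m3": 1}),
    (users_a, {"m1": 0, "m2": 1, "m3": 1}),
]

# Predict bob's unobserved rating of m2.
prediction = predict(ratings, partitions, "bob", "m2", rating_values)
```

In this toy example the prediction for bob leans toward "like" because ann, who shares a group with bob in both partitions, liked m2. A full Bayesian treatment would weight partitions by how well they explain the observed ratings; uniform averaging is appropriate only if the partitions are already samples from that posterior.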

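This study proposes a new approach based on stochastic block models and Bayesian methods for predicting individual preferences and identifying groups of individuals with similar preferences; it achieves relative improvements of 38% to 99% over existing industry-level algorithms.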