Federated learning has enabled multiple parties to collaboratively train
large language models without directly sharing their data (FedLLM). Building on
this training paradigm, the community has invested substantial effort in
several directions, including frameworks, performance, and privacy. However,
there are currently no realistic datasets or benchmarks for FedLLM: previous
works all rely on artificially constructed datasets, which fail to capture
the properties of real-world scenarios. To address this, we
propose FedLLM-Bench, which involves 8 training methods, 4 training datasets,
and 6 evaluation metrics, to offer a comprehensive testbed for the FedLLM
community. FedLLM-Bench includes three datasets (e.g., a user-annotated
multilingual dataset) for federated instruction tuning and one dataset (a
user-annotated preference dataset) for federated preference alignment, with
client counts ranging from 38 to 747. Our datasets exhibit several
representative kinds of diversity: language, quality, quantity, instruction,
length, embedding, and preference, capturing properties of real-world scenarios. Based
on FedLLM-Bench, we conduct experiments on all datasets to benchmark existing
FL methods and provide empirical insights (e.g., on multilingual collaboration).
We believe that FedLLM-Bench can benefit the FedLLM community by reducing
the effort required, providing a practical testbed, and promoting fair
comparisons. Code and datasets are available at
this https URL
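
As a rough illustration of the aggregation step that many FL methods share, the sketch below averages client parameters weighted by local dataset size, in the style of FedAvg. The abstract does not name the eight training methods, so the choice of FedAvg, the function name `fedavg`, and the toy numbers are all assumptions made for illustration; none of this is FedLLM-Bench's actual API.

```python
from typing import List, Sequence


def fedavg(client_params: Sequence[Sequence[float]],
           client_sizes: Sequence[int]) -> List[float]:
    """Average client parameter vectors, weighting each by its sample count.

    Hypothetical helper: parameters are flat lists of floats here, whereas a
    real FedLLM setup would aggregate model tensors (or adapter weights).
    """
    if len(client_params) != len(client_sizes):
        raise ValueError("need exactly one size per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total number of samples must be positive")
    n_params = len(client_params[0])
    # Weighted mean per coordinate: sum_k (n_k / n) * w_k[i]
    return [
        sum(params[i] * size for params, size in zip(client_params, client_sizes)) / total
        for i in range(n_params)
    ]


# Toy round: two clients with unequal data quantity (1 vs. 3 samples).
global_params = fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3])
print(global_params)  # [2.5, 3.5]
```

The size weighting is what makes quantity diversity across clients matter: a client holding more samples pulls the global parameters further toward its local update.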

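Based on the FedLLM-Bench datasets, we ran experiments across the datasets to benchmark existing FL methods, providing empirical insights on topics such as multilingual collaboration.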

FedLLM-Bench: An Experimental Benchmark for Federated Learning of Large Language Models

FedLLM-Bench: Realistic Benchmarks for Federated Learning of Large Language Models

Offline preference optimization allows fine-tuning large models directly from
offline data, and has proved effective in recent alignment practices. We
propose generalized preference optimization (GPO), a family of offline losses
parameterized by a general class of convex functions. GPO enables a unified
view over preference optimization, encompassing existing algorithms such as
DPO, IPO and SLiC as special cases, while naturally introducing new variants.
The GPO framework also sheds light on how offline algorithms enforce
regularization, through the design of the convex function that defines the
loss. Our analysis and experiments reveal the connections and subtle
differences between the offline regularization and the KL divergence
regularization intended by the canonical RLHF formulation. In all, our results
present new algorithmic toolkits and empirical insights to alignment
practitioners.
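
To make the "family of losses parameterized by a convex function" concrete, here is a minimal sketch in which one loss routine takes the convex function as an argument. The abstract does not spell out the functions, so the specific forms used here (logistic for DPO, squared for IPO, hinge for SLiC), the scaling by `beta`, and all names are assumptions based on how these losses are commonly written, not a statement of the paper's exact definitions.

```python
import math
from typing import Callable, Sequence, Tuple

# One example: (logp_w, logp_l, ref_logp_w, ref_logp_l), i.e. policy and
# reference log-probabilities of the preferred (w) and rejected (l) responses.
Example = Tuple[float, float, float, float]


def log_ratio_margin(logp_w: float, logp_l: float,
                     ref_logp_w: float, ref_logp_l: float,
                     beta: float) -> float:
    """beta-scaled gap between the policy/reference log-ratios of w and l."""
    return beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))


def logistic(x: float) -> float:
    """log(1 + exp(-x)), computed stably; assumed DPO-style choice."""
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))


def squared(x: float) -> float:
    """(x - 1)^2; assumed IPO-style choice."""
    return (x - 1.0) ** 2


def hinge(x: float) -> float:
    """max(0, 1 - x); assumed SLiC-style choice."""
    return max(0.0, 1.0 - x)


def gpo_loss(batch: Sequence[Example],
             f: Callable[[float], float],
             beta: float = 0.1) -> float:
    """Mean of f(margin) over the batch; swapping f changes the variant."""
    margins = [log_ratio_margin(*example, beta) for example in batch]
    return sum(f(m) for m in margins) / len(margins)


# Toy batch: the policy prefers w more strongly than the reference does.
batch = [(-1.0, -3.0, -2.0, -2.0)]
for name, f in [("logistic", logistic), ("squared", squared), ("hinge", hinge)]:
    print(name, gpo_loss(batch, f, beta=0.5))
```

All three functions are convex and push the margin upward, but they differ in how strongly they keep penalizing (or stop penalizing) large margins, which is one way to read the abstract's point that the choice of convex function shapes the regularization.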

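Offline preference optimization, which fine-tunes large models directly from offline data, has proved effective in recent alignment practice. We propose generalized preference optimization (GPO), a family of offline loss functions parameterized by a class of convex functions. GPO provides a unified view of preference optimization that covers existing algorithms such as DPO, IPO, and SLiC as special cases, while naturally introducing new variants. The GPO framework also reveals how offline algorithms enforce regularization through the convex function that defines the loss. Our analysis and experiments reveal the connections and subtle differences between offline regularization and the KL divergence regularization intended by the canonical RLHF formulation. Overall, our results offer alignment practitioners new algorithmic tools and empirical insights.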