Stochastic gradient methods are among the most important algorithms for
training machine learning models. While classical assumptions such as strong
convexity allow a simple analysis, they are rarely satisfied in applications. In
recent years, global and local gradient domination properties have been shown to be
a more realistic replacement for strong convexity. They were proved to hold in
diverse settings such as (simple) policy gradient methods in reinforcement
learning and training of deep neural networks with analytic activation
functions. We prove almost sure convergence rates $f(X_n)-f^*\in o\big(
n^{-\frac{1}{4\beta-1}+\epsilon}\big)$ of the last iterate for stochastic
gradient descent (with and without momentum) under global and local
$\beta$-gradient domination assumptions. The almost sure rates get arbitrarily
close to recent rates in expectation. Finally, we demonstrate how to apply our
results to the training task in both supervised and reinforcement learning.
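
As a worked reading of the exponent in the stated rate, the values below evaluate $\frac{1}{4\beta-1}$ at three sample points. This is only an illustration: it assumes the common convention, not stated above, that $\beta$ ranges over $[\frac{1}{2},1]$, with $\beta=\frac{1}{2}$ corresponding to the Polyak-Łojasiewicz case.

$$
\beta=\tfrac{1}{2}:\ \frac{1}{4\beta-1}=1, \qquad
\beta=\tfrac{3}{4}:\ \frac{1}{4\beta-1}=\frac{1}{2}, \qquad
\beta=1:\ \frac{1}{4\beta-1}=\frac{1}{3}.
$$

Under that assumption, the almost sure rate ranges from $o\big(n^{-1+\epsilon}\big)$ at $\beta=\frac{1}{2}$ to $o\big(n^{-\frac{1}{3}+\epsilon}\big)$ at $\beta=1$, so the guarantee weakens as $\beta$ grows.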

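Proofs of convergence rates for stochastic gradient descent under global and local gradient domination, with applications to supervised and reinforcement learning.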

Almost sure convergence rates of stochastic gradient methods under gradient domination

Large Language Models (LLMs) have demonstrated superior results across a wide
range of tasks, while retrieval has long been established as an effective means
of obtaining task-relevant information for humans. Retrieval-augmented
Generation (RAG) is known for its effectiveness in knowledge-intensive tasks
by locating relevant information and placing it within the context window of
the LLM. However, the relationship between retrievers and LLMs is still
under-investigated. Most existing work treats the retriever and the LLM as
independent components and leaves a gap between retrieving human-friendly
information and assembling an LLM-friendly context. In this work, we examine a
novel bridge model, validate the ranking and selection assumptions in
retrievers in the context of RAG, and propose a training framework that chains
together supervised and reinforcement learning to learn a bridge model.
Empirical results demonstrate the effectiveness of our method in both
question-answering and personalized generation tasks.

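We propose a novel bridge model, validate the ranking and selection assumptions of retrievers in RAG, and propose a training framework that chains supervised and reinforcement learning, demonstrating the effectiveness of our method on question-answering and personalized generation tasks.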

Bridging the Preference Gap between Retrievers and LLMs

A zoo of deep nets is available these days for almost any given task, and it
is increasingly unclear which net to start with when addressing a new task, or
which net to use as an initialization for fine-tuning a new model. To address
this issue, we develop knowledge flow, which moves 'knowledge'
from multiple deep nets, referred to as teachers, to a new deep net model,
called the student. The structure of the teachers and the student can differ
arbitrarily, and they can be trained on entirely different tasks with different
output spaces. After training with knowledge flow, the student is independent
of the teachers. We demonstrate our approach on a variety of supervised and
reinforcement learning tasks, outperforming fine-tuning and other 'knowledge
exchange' methods.

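Knowledge flow transfers knowledge from multiple deep nets (the teachers) to a new deep net model (the student), addressing the question of which net to start from, or to use as an initialization for fine-tuning, on a new task. It outperforms fine-tuning and other knowledge exchange methods on supervised and reinforcement learning tasks.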