We describe an approach for aligning an LLM-based dialogue agent based on
global (i.e., dialogue-level) rewards, while also taking into account
naturally-occurring multimodal signals. At a high level, our approach (dubbed
GELI) learns a local, turn-level reward model by decomposing the human-provided
Global Explicit (GE) session-level reward, using Local Implicit (LI) multimodal
reward signals to crossmodally shape the reward decomposition step. This
decomposed reward model is then used as part of the standard RLHF pipeline to
improve an LLM-based dialogue agent. We run quantitative and qualitative human
studies to evaluate the performance of our GELI approach, and find that it
shows consistent improvements across various conversational metrics compared to
baseline methods.
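
To make the decomposition step concrete, the following is a minimal sketch, not the implementation used in this work. It assumes a linear turn-level reward over hand-made turn features, a squared-error GE term that asks the turn rewards of a session to sum to its global score, and a squared-error LI term that pulls each turn reward toward a per-turn implicit signal. The names `decompose_reward`, `turn_rewards`, and `li_weight`, the loss form, and the toy data are all hypothetical.

```python
# Illustrative sketch only: the linear model, the loss, and all names here
# are assumptions made for this example, not the method's actual components.

def turn_rewards(w, turns):
    """Score each turn with a linear reward r(x) = w . x."""
    return [sum(wi * xi for wi, xi in zip(w, x)) for x in turns]


def decompose_reward(sessions, li_weight=0.5, lr=0.05, epochs=2000):
    """Fit turn-level reward weights from session-level (global) rewards.

    sessions: list of (turns, li_signals, global_reward) where
      turns         - list of feature vectors, one per dialogue turn
      li_signals    - one local implicit score per turn (assumed given)
      global_reward - the single explicit score for the whole session
    """
    dim = len(sessions[0][0][0])
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for turns, li_signals, global_reward in sessions:
            preds = turn_rewards(w, turns)
            # GE term: predicted turn rewards should sum to the global reward.
            ge_err = sum(preds) - global_reward
            for x, p, s in zip(turns, preds, li_signals):
                # LI term: each turn reward is shaped toward its implicit signal.
                li_err = p - s
                for k in range(dim):
                    grad[k] += 2 * ge_err * x[k] + 2 * li_weight * li_err * x[k]
        # Plain gradient descent on the mean per-session loss.
        w = [wi - lr * gk / len(sessions) for wi, gk in zip(w, grad)]
    return w


# Toy data: feature 0 marks a "good" turn, feature 1 a neutral one.
sessions = [
    ([[1, 0], [0, 1], [1, 0]], [1.0, 0.0, 1.0], 2.0),
    ([[0, 1], [0, 1], [1, 0]], [0.0, 0.0, 1.0], 1.0),
]
weights = decompose_reward(sessions)
per_turn = turn_rewards(weights, sessions[0][0])
```

In this toy setting the learned per-turn rewards can then stand in for the reward model in an RLHF loop. The fixed learning rate is tuned to this example and would need adjusting for other data.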