We introduce ALaRM, the first framework to model hierarchical rewards in
reinforcement learning from human feedback (RLHF), designed to enhance the
alignment of large language models (LLMs) with human preferences. Current
alignment approaches often struggle with the inconsistency and sparsity of
human supervision signals; the framework addresses this limitation by
integrating holistic rewards with aspect-specific rewards. This integration
guides language models toward desired outcomes more precisely and
consistently, particularly in complex, open-ended text generation
tasks. By employing a methodology that filters and combines multiple rewards
based on their consistency, the framework provides a reliable mechanism for
improving model alignment. We validate our approach on long-form question
answering and machine translation, using gpt-3.5-turbo for pairwise
comparisons, and demonstrate improvements over existing baselines. Our work
underscores the effectiveness of hierarchical reward modeling in refining LLM
training for better alignment with human preferences. We release our code at
this https URL.
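
The abstract does not specify how rewards are filtered or combined, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes consistency is measured as pairwise ranking agreement between each aspect-specific reward and the holistic reward, keeps aspects above a hypothetical `min_agreement` threshold, and lets the selected aspect rewards refine the holistic reward only once it exceeds a hypothetical `threshold`. The function names, aspect names (`factuality`, `verbosity`), and all numeric values are illustrative.

```python
from typing import Dict, List, Sequence


def pairwise_agreement(a: Sequence[float], b: Sequence[float]) -> float:
    """Fraction of sample pairs that two reward signals order the same way.

    Pairs tied under either signal are skipped.
    """
    agree = total = 0
    for i in range(len(a)):
        for j in range(i + 1, len(a)):
            da, db = a[i] - a[j], b[i] - b[j]
            if da == 0 or db == 0:
                continue
            total += 1
            agree += int((da > 0) == (db > 0))
    return agree / total if total else 0.0


def select_consistent_aspects(
    holistic: Sequence[float],
    aspects: Dict[str, Sequence[float]],
    min_agreement: float = 0.6,  # hypothetical filtering threshold
) -> List[str]:
    """Keep aspect rewards whose rankings agree with the holistic reward."""
    return [
        name
        for name, scores in aspects.items()
        if pairwise_agreement(holistic, scores) >= min_agreement
    ]


def hierarchical_reward(
    holistic_score: float,
    aspect_scores: Dict[str, float],
    selected: List[str],
    threshold: float = 0.0,  # hypothetical gate on the holistic reward
    weight: float = 0.5,     # hypothetical weight for aspect rewards
) -> float:
    """Holistic reward first; selected aspect rewards refine it only above the gate."""
    reward = holistic_score
    if holistic_score > threshold:
        reward += weight * sum(aspect_scores[name] for name in selected)
    return reward


holistic = [0.1, 0.5, 0.9, 0.3]
aspects = {
    "factuality": [0.2, 0.4, 0.8, 0.3],  # same ranking as holistic
    "verbosity": [0.9, 0.5, 0.1, 0.7],   # reversed ranking
}
selected = select_consistent_aspects(holistic, aspects)
print(selected)  # ['factuality']
print(hierarchical_reward(0.9, {"factuality": 0.8, "verbosity": 0.1}, selected, threshold=0.5))
```

Gating the aspect rewards on the holistic score keeps the holistic signal primary while adding finer-grained signal among outputs it already rates well; whether ALaRM uses this exact gating or consistency measure is an assumption of this sketch.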

ALaRM: Align Language Models via Hierarchical Rewards Modeling

Anomalies often indicate malfunction or inefficiency in systems across
manufacturing, healthcare, finance, and surveillance, to name a few domains.
Although this practical relevance has produced an abundant literature on
effective detection algorithms, autonomous anomaly detection is rarely used in
real-world scenarios. Especially in high-stakes applications, a human in the
loop is often involved in processes beyond detection, such as verification and
troubleshooting. In this work, we introduce ALARM (Analyst-in-the-Loop Anomaly
Reasoning and Management), an end-to-end framework that comprehensively
supports the anomaly mining cycle, from detection to action. Besides
unsupervised detection of emerging anomalies, it offers anomaly explanations
and an interactive GUI for human-in-the-loop processes (visual exploration,
sense-making, and ultimately action-taking by designing new detection rules).
These processes help close "the loop": the new rules complement rule-based
supervised detection, which is typical of many deployed systems in practice.
We demonstrate ALARM's efficacy through a series of case studies with fraud
analysts from the financial industry.
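
As a rough illustration of how such a loop might close, the sketch below pairs a rule-based detector with a simple unsupervised score. A z-score test on a single feature is a swapped-in stand-in for ALARM's actual unsupervised detector, which the abstract does not describe. Cases flagged by the score but matched by no existing rule are treated as emerging anomalies; once an analyst encodes a new rule, those cases move into rule-based detection. The class, method, rule, and feature names (`AnalystLoop`, `large_amount`, `amount`) are hypothetical.

```python
from dataclasses import dataclass, field
from statistics import mean, pstdev
from typing import Callable, Dict, List

Transaction = Dict[str, float]
Rule = Callable[[Transaction], bool]


@dataclass
class AnalystLoop:
    """Hypothetical loop: rule-based detection plus an unsupervised stand-in."""

    rules: Dict[str, Rule] = field(default_factory=dict)
    z_threshold: float = 3.0  # assumed cut-off for the z-score stand-in

    def rule_hits(self, tx: Transaction) -> List[str]:
        """Names of existing (supervised) rules that fire on a transaction."""
        return [name for name, rule in self.rules.items() if rule(tx)]

    def emerging_anomalies(self, txs: List[Transaction], feature: str) -> List[int]:
        """Indices that look anomalous on `feature` but match no existing rule."""
        values = [tx[feature] for tx in txs]
        mu, sigma = mean(values), pstdev(values)
        if sigma == 0:
            return []
        return [
            i
            for i, tx in enumerate(txs)
            if abs(tx[feature] - mu) / sigma > self.z_threshold and not self.rule_hits(tx)
        ]

    def add_rule(self, name: str, rule: Rule) -> None:
        """Analyst action: encode a new detection rule after reviewing a case."""
        self.rules[name] = rule


txs = [{"amount": 10.0} for _ in range(19)] + [{"amount": 1000.0}]
loop = AnalystLoop()
print(loop.emerging_anomalies(txs, "amount"))  # [19]
loop.add_rule("large_amount", lambda tx: tx["amount"] > 500)
print(loop.emerging_anomalies(txs, "amount"))  # []
print(loop.rule_hits(txs[19]))  # ['large_amount']
```

After the analyst adds a rule, the same case is no longer "emerging" because the supervised rule set now covers it, which is the sense in which new rules complement rule-based detection in this sketch.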