The alignment process changes several properties of a large language model's
(LLM's) output distribution. We analyze two aspects of post-alignment
distributional shift of LLM responses. First, we re-examine previously reported
reductions in response diversity post-alignment. Our analysis suggests that an
apparent drop in the diversity of responses is largely explained by quality
control and information aggregation. Alignment suppresses irrelevant and
unhelpful content while shifting the output distribution toward longer
responses that cover information spanning several responses from the base LLM,
essentially presenting diverse information in a single response. Finding little
evidence that alignment suppresses useful information, it is natural to ask the
opposite question: do aligned models surface information that cannot be
recovered from base models? Our second investigation shows this is not the case
and the behavior of aligned models is recoverable from base models without
fine-tuning. A combination of in-context examples and lower-resolution semantic
hints about response content can elicit responses from base LLMs that are as
similar to alignment-tuned LLM responses as alignment-tuned LLM responses are
to each other. Taken together, these results indicate that current alignment
techniques capture but do not extend the useful subset of assistant-like base
LLM behavior, providing further evidence for the Superficial Alignment
Hypothesis. They also show that in-context alignment can go surprisingly far as
a strategy for imitating aligned LLMs without fine-tuning. Our code and data are
available at this https URL

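Applying an alignment process to a language model changes several properties of its output distribution. The study analyzes two aspects of the post-alignment distributional shift in language model responses and finds that alignment suppresses irrelevant and unhelpful content while shifting the output distribution toward responses that cover information spanning several base-model responses, thereby presenting diverse information in a single response. It further shows that base models can produce responses similar to those of aligned models when given in-context examples and low-resolution semantic hints. This provides further evidence that alignment techniques capture the useful behavior of base language models, and that aligned model responses can be closely imitated without fine-tuning.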

From Distributional to Overton Pluralism: Investigating Large Language Model Alignment

Reinforcement Learning from Human Feedback (RLHF) is the prevailing approach
to ensure Large Language Models (LLMs) align with human values. However,
existing RLHF methods incur a high computational cost, one main reason being
that RLHF assigns both the generation and alignment tasks to the LLM
simultaneously. In this paper, we introduce Proxy-RLHF, which decouples the
generation and alignment processes of LLMs, achieving alignment with human
values at a much lower computational cost. We start with a novel Markov
Decision Process (MDP) designed for the alignment process and employ
Reinforcement Learning (RL) to train a streamlined proxy model that oversees
the token generation of the LLM, without altering the LLM itself. Experiments
show that our method achieves a comparable level of alignment with only 1% of
the training parameters of other methods.
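
The decoupling described above can be illustrated with a toy generation loop. This is only a sketch of one possible reading, not the Proxy-RLHF implementation: it assumes the proxy acts as an accept/reject gate over tokens proposed by a frozen generator, which the abstract does not specify. The names generate_with_proxy, toy_propose, and toy_proxy are hypothetical, and the hand-written rule in toy_proxy stands in for a policy that would be trained with RL.

```python
def generate_with_proxy(propose, proxy_accepts, prompt, max_tokens=10, eos="<eos>"):
    """Generate tokens from a frozen generator while a separate proxy gates each one.

    propose(tokens)        -> candidate next tokens, best first (the frozen LLM).
    proxy_accepts(tokens, candidate) -> bool (the small proxy; assumed interface).
    """
    tokens = list(prompt)
    for _ in range(max_tokens):
        candidates = propose(tokens)
        # Take the generator's highest-ranked candidate that the proxy accepts.
        chosen = next((c for c in candidates if proxy_accepts(tokens, c)), None)
        if chosen is None or chosen == eos:
            break  # nothing acceptable, or the generator chose to stop
        tokens.append(chosen)
    return tokens


def toy_propose(tokens):
    """Hypothetical frozen generator: a fixed lookup keyed on the last token."""
    table = {"hello": ["rude", "friend", "<eos>"], "friend": ["<eos>"]}
    return table.get(tokens[-1], ["<eos>"])


def toy_proxy(tokens, candidate):
    """Hypothetical proxy policy: a hand-written rule in place of a trained model."""
    return candidate != "rude"


if __name__ == "__main__":
    print(generate_with_proxy(toy_propose, toy_proxy, ["hello"]))
    print(generate_with_proxy(toy_propose, lambda t, c: True, ["hello"]))
```

The generator is never modified in this sketch. Only the gate changes between the two calls, which is the sense in which generation and alignment are handled by separate components.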

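We introduce Proxy-RLHF, a proxy-based reinforcement learning method that decouples the generation and alignment processes of large language models, achieving alignment with human values at a lower computational cost.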

Proxy-RLHF: Decoupling Generation and Alignment in Large Language Model with Proxy

Recently, there has been a surge in the popularity of pre-trained large
language models (LLMs) such as GPT-4, sweeping across the entire Natural
Language Processing (NLP) and Computer Vision (CV) communities. These LLMs have
demonstrated advanced multi-modal understanding capabilities and showcased
strong performance across various benchmarks. LLMs have started to embody
traits of artificial general intelligence, which offers valuable guidance for
enhancing brain-like characteristics within visual encoding models. Hence, this
paper proposes a new multi-modal training paradigm, aligned with an LLM, for
encoding fMRI activity in the visual cortex. Based on this paradigm, we trained
an encoding model on fMRI data named the LLM-Visual Encoding Model (LLM-VEM).
Specifically, we use an LLM (miniGPT4) to generate descriptive text for all
stimulus images, forming a high-quality textual description set. Moreover, we
use a pre-trained text encoder (CLIP) to process these detailed descriptions,
obtaining the text embedding features. Next, we use a contrastive loss function
to minimize the distance between the image embedding features and the text
embedding features to complete the alignment of the stimulus image
and text information. With the assistance of the pre-trained LLM, this
alignment process facilitates better learning of the visual encoding model,
resulting in higher precision. The experimental results indicate that our
training paradigm significantly improves the performance of the visual
encoding model.
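
The image-text alignment step can be sketched as a contrastive objective over paired embeddings. The abstract does not give the exact loss, so the version below assumes a CLIP-style symmetric InfoNCE loss with a temperature of 0.07. The function names and the temperature value are assumptions made for illustration, and the sketch uses plain Python lists rather than any particular deep learning library. Embeddings are assumed to be non-zero vectors.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def diagonal_cross_entropy(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        peak = max(row)  # subtract the max for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(x - peak) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE loss: pair i (image i, text i) is the positive pair."""
    images = [l2_normalize(v) for v in image_embs]
    texts = [l2_normalize(v) for v in text_embs]
    # Cosine similarity between every image and every text, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in texts]
        for img in images
    ]
    logits_transposed = [list(col) for col in zip(*logits)]
    image_to_text = diagonal_cross_entropy(logits)
    text_to_image = diagonal_cross_entropy(logits_transposed)
    return 0.5 * (image_to_text + text_to_image)


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    matched = [[1.0, 0.0], [0.0, 1.0]]
    swapped = [[0.0, 1.0], [1.0, 0.0]]
    print(contrastive_loss(images, matched))  # small: pairs agree
    print(contrastive_loss(images, swapped))  # large: pairs disagree
```

Minimizing this quantity pulls each image embedding toward its own description and away from the descriptions of other images in the batch, which is one common way to realize the "minimize the distance" step described above.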

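A new multi-modal training paradigm is proposed for encoding fMRI activity in the visual cortex. A pre-trained LLM and a contrastive loss function are used to align image and text information, improving the performance of the visual encoding model.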

Aligned with LLM: a new multi-modal training paradigm for encoding fMRI activity in visual cortex

The Digital Corpus of Sanskrit records around 650,000 sentences along with
their morphological and lexical tagging. However, inconsistencies in
morphological analysis, and in providing crucial information such as the
segmented word, call for standardization and validation of this corpus.
Automating the
validation process requires efficient analyzers which also provide the missing
information. The Sanskrit Heritage Engine's Reader produces all possible
segmentations with morphological and lexical analyses. Aligning these systems
would help us in recording the linguistic differences, which can be used to
update these systems to produce standardized results and will also provide a
Gold corpus tagged with complete morphological and lexical information along
with the segmented words. Krishna et al. (2017) aligned 115,000 sentences,
considering some of the linguistic differences. As both these systems have
evolved significantly, the alignment is done again considering all the
remaining linguistic differences between these systems. This paper describes
the modified alignment process in detail and records the additional linguistic
differences observed.
Reference: Amrith Krishna, Pavankumar Satuluri, and Pawan Goyal. 2017. A
dataset for Sanskrit word segmentation. In Proceedings of the Joint SIGHUM
Workshop on Computational Linguistics for Cultural Heritage, Social Sciences,
Humanities and Literature, pages 105-114. Association for Computational
Linguistics, August.

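This study describes the revised alignment process and records the additional linguistic differences observed, with the aim of standardizing the Digital Corpus of Sanskrit and supplying it with complete morphological and lexical information along with the segmented words.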