Cloze testing is a common method for measuring the behavior of large language
models on a number of benchmark tasks. Using the MMLU dataset, we show that the
base-rate probability (BRP) differences across answer tokens are significant
and affect task performance, i.e., the model guesses A if uncertain. We find that
counterfactual prompting does sufficiently mitigate the BRP effect. The BRP
effect is found to resemble the test-taking strategies employed by
humans, leading to the conflation of task performance and test-taking ability.
We propose the Nvr-X-MMLU task, a variation of MMLU, which helps to
disambiguate test-taking ability from task performance and reports the latter.

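Using the MMLU dataset and cloze testing, this work examines how base-rate probability affects task performance and how counterfactual prompting can mitigate that effect. It proposes the Nvr-X-MMLU task, a variant of MMLU, to remove the confounding of task performance by test-taking ability.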

The Base-Rate Effect on LLM Benchmark Performance: Distinguishing Test-Taking Strategies from Benchmark Performance

The Base-Rate Effect on LLM Benchmark Performance: Disambiguating Test-Taking Strategies from Benchmark Performance
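
The "guess A if uncertain" effect described in the abstract above can be illustrated with a minimal Python sketch. The abstract measures base-rate probability on answer tokens; as a simplifying assumption, this sketch uses the frequency of predicted labels as a stand-in. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter

LABELS = ("A", "B", "C", "D")  # MMLU questions have four answer labels


def label_rates(labels):
    """Return the fraction of items assigned to each answer label."""
    counts = Counter(labels)
    total = len(labels)
    return {lab: counts.get(lab, 0) / total for lab in LABELS}


def accuracy(predictions, gold):
    """Fraction of positions where the prediction matches the gold label."""
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


def fallback_score(gold, fallback):
    """Accuracy earned by always answering `fallback`, i.e. the score a pure
    'guess this label' strategy gets without any task knowledge."""
    return label_rates(gold)[fallback]


# Hypothetical toy data, not results from the paper.
gold = ["A", "B", "A", "C", "D", "A", "B", "C"]
preds = ["A", "A", "A", "C", "A", "A", "B", "A"]

pred_rates = label_rates(preds)
gold_rates = label_rates(gold)
# Positive skew means the label is predicted more often than it is correct.
skew = {lab: pred_rates[lab] - gold_rates[lab] for lab in LABELS}

print("prediction skew per label:", skew)
print("accuracy:", accuracy(preds, gold))
print("score from always guessing A:", fallback_score(gold, "A"))
```

In this toy data, part of the measured accuracy comes from items where the gold label happens to match the preferred fallback label, which is the kind of conflation between test-taking ability and task performance the abstract describes.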

Advancements in large language models (LLMs) have demonstrated remarkable
capabilities across a diverse range of applications. These models excel in
generating text completions that are contextually coherent and cover an
extensive array of subjects. However, the vast datasets required for their
training make aligning response styles during the pretraining and instruction
tuning phases challenging. Consequently, an additional alignment phase is
typically employed, wherein the model is further trained with human preference
data to better align its outputs with human expectations. While this process
doesn't introduce new capabilities per se, it does accentuate generation styles
innate to the model. This paper explores the utilization of counterfactual
prompting within the framework of Direct Preference Optimization (DPO) to align
the model's style without relying on human intervention. We demonstrate that
this method effectively instils desirable behaviours, mitigates undesirable
ones, and encourages the model to disregard inappropriate instructions. Our
findings suggest that counterfactual prompting with DPO presents a low-resource
way to fine-tune LLMs to meet the demands for responsible and ethically aligned
AI systems.

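This work explores using counterfactual prompting within the Direct Preference Optimization framework to align a model's style. The method effectively instils desirable behaviours, mitigates undesirable ones, and encourages the model to ignore inappropriate instructions, offering a low-cost way for large language models to meet the demand for responsible and ethically aligned AI systems.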

Aligning Large Language Models with Counterfactual Direct Preference Optimization (DPO)

Aligning Large Language Models with Counterfactual DPO
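
As a rough illustration of the approach in the abstract above, the following Python sketch computes the standard DPO loss for a single preference pair and shows one possible way to build such a pair without human labels. The pairing rule (the response generated with a style hint is "chosen", the response to the bare instruction is "rejected", and both are stored against the bare instruction) is an assumption about how counterfactual prompting could be combined with DPO, not a detail confirmed by the abstract. `build_counterfactual_pair`, `generate`, and `style_hint` are hypothetical names.

```python
import math


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair, given sequence log-probabilities
    under the policy being trained and under a frozen reference model."""
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(z)) equals softplus(-z)
    return softplus(-logits)


def build_counterfactual_pair(instruction, generate, style_hint):
    """Assumed pairing rule: the styled prompt yields the preferred response,
    the bare prompt yields the dispreferred one. `generate` is any callable
    mapping a prompt string to a response string."""
    return {
        "prompt": instruction,
        "chosen": generate(f"{style_hint}\n{instruction}"),
        "rejected": generate(instruction),
    }


# Stand-in generator so the sketch runs without a model.
def fake_generate(prompt):
    return f"<response to: {prompt}>"


pair = build_counterfactual_pair(
    "Summarise the report.", fake_generate, "Answer politely and concisely."
)
print(pair)
print(dpo_loss(-4.0, -9.0, -5.0, -7.0))  # policy favours the chosen response
```

When the policy and the reference assign identical log-probabilities, the loss equals log 2; it falls below that value as the policy shifts probability toward the chosen response relative to the reference.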

The past decade has witnessed dramatic gains in natural language processing
and an unprecedented scaling of large language models. These developments have
been accelerated by the advent of few-shot techniques such as chain of thought
(CoT) prompting. Specifically, CoT pushes the performance of large language
models in a few-shot setup by augmenting the prompts with intermediate steps.
Despite impressive results across various tasks, the reasons behind their
success have not been explored. This work uses counterfactual prompting to
develop a deeper understanding of CoT-based few-shot prompting mechanisms in
large language models. We first systematically identify and define the key
components of a prompt: symbols, patterns, and text. Then, we devise and
conduct an exhaustive set of experiments across four different tasks, by
querying the model with counterfactual prompts where only one of these
components is altered. Our experiments across three models (PaLM, GPT-3, and
CODEX) reveal several surprising findings and bring into question the
conventional wisdom around few-shot prompting. First, the presence of factual
patterns in a prompt is practically immaterial to the success of CoT. Second,
our results conclude that the primary role of intermediate steps may not be to
facilitate learning how to solve a task. The intermediate steps are rather a
beacon for the model to realize what symbols to replicate in the output to form
a factual answer. Further, text imbues patterns with commonsense knowledge and
meaning. Our empirical and qualitative analysis reveals that a symbiotic
relationship between text and patterns explains the success of few-shot
prompting: text helps extract commonsense from the question to help patterns,
and patterns enforce task understanding and direct text generation.

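This paper studies the mechanisms of few-shot learning based on chain of thought (CoT) prompting. Using counterfactual prompts across several models, it shows that the success of CoT does not come from the presence of factual patterns; rather, one role of the intermediate steps is to help the model find the right symbols to place in the output to form a correct answer. Text helps extract commonsense knowledge and meaning from the question, while patterns reinforce task understanding and direct text generation.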
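
The idea of altering only one prompt component at a time can be sketched in Python as follows. The exemplar, the helper names, and the specific substitutions are hypothetical illustrations and are not the prompts or transformations used in the paper. `alter_symbols` swaps digits for abstract placeholders while leaving the wording and the equation structure intact; `alter_text` swaps words while leaving the numbers and equations intact.

```python
import re

# Hypothetical CoT-style few-shot exemplar; not taken from the paper.
EXEMPLAR = {
    "question": "Tom has 3 apples and buys 2 more. How many apples does he have?",
    "thought": "Tom starts with 3 apples. 3 + 2 = 5.",
    "answer": "5",
}


def alter_symbols(exemplar, mapping):
    """Replace digits according to `mapping`; text and patterns stay as-is."""
    def swap(text):
        return re.sub(r"\d", lambda m: mapping.get(m.group(), m.group()), text)
    return {key: swap(value) for key, value in exemplar.items()}


def alter_text(exemplar, mapping):
    """Replace whole words according to `mapping`; symbols stay as-is."""
    pattern = r"\b(" + "|".join(re.escape(word) for word in mapping) + r")\b"

    def swap(text):
        return re.sub(pattern, lambda m: mapping[m.group()], text)
    return {key: swap(value) for key, value in exemplar.items()}


def render(exemplar):
    """Format one exemplar as it would appear in a few-shot prompt."""
    return (f"Q: {exemplar['question']}\n"
            f"A: {exemplar['thought']} The answer is {exemplar['answer']}.")


symbol_variant = alter_symbols(EXEMPLAR, {"3": "X", "2": "Y", "5": "Z"})
text_variant = alter_text(EXEMPLAR, {"apples": "pencils"})

print(render(EXEMPLAR))
print(render(symbol_variant))
print(render(text_variant))
```

Comparing model accuracy on the original prompt and on each variant gives a rough indication of how much a single component contributes, which is the style of comparison the abstract describes.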