Large Language Models (LLMs) have demonstrated impressive zero-shot
capabilities and versatility in NLP tasks; however, they sometimes fail to
maintain crucial invariances for specific tasks. One example is permutation
sensitivity, where LLMs' outputs may significantly vary depending on the order
of the input options. While debiasing techniques can mitigate these issues and
yield better performance and reliability, they often come with a high
computational cost at inference. This paper addresses that inference-time
inefficiency: the aim is to distill the capabilities of a computationally
intensive, debiased teacher model into a more compact student model. We
explore two variants of student models: one based on pure distillation, and the
other on an error-correction approach for more complex tasks, where the student
corrects a single biased decision from the teacher to achieve a debiased
output. Our approach is general and can be applied to both black-box and
white-box LLMs. Furthermore, we demonstrate that our compact, encoder-only
student models can outperform their larger, biased teacher counterparts,
achieving better results with significantly fewer parameters.
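
To make the cost argument concrete, the following is a minimal sketch of one common form of permutation debiasing: averaging an option's probability over every ordering of the options. The abstract does not say which debiasing scheme the teacher uses, so this is an assumption for illustration only. The names `permutation_debias` and `toy_biased_scorer` are hypothetical, and the toy scorer stands in for an LLM call. Options are assumed to be unique strings.

```python
from itertools import permutations
from typing import Callable, Dict, Sequence

# Hypothetical scorer type: receives the options in presented order and
# returns one probability per presented position.
Scorer = Callable[[Sequence[str]], Sequence[float]]


def permutation_debias(options: Sequence[str], scorer: Scorer) -> Dict[str, float]:
    """Average each option's probability over all orderings of the options.

    This needs len(options)! scorer calls, which illustrates why this kind
    of debiasing is expensive at inference time.
    """
    totals = {opt: 0.0 for opt in options}
    n_orderings = 0
    for ordering in permutations(options):
        probs = scorer(ordering)
        # Map position-wise probabilities back to the option they belong to.
        for opt, p in zip(ordering, probs):
            totals[opt] += p
        n_orderings += 1
    return {opt: total / n_orderings for opt, total in totals.items()}


def toy_biased_scorer(ordering: Sequence[str]) -> Sequence[float]:
    """Toy stand-in for an LLM (assumption): it prefers "Paris" but also
    adds a positional bonus to whichever option is listed first."""
    raw = [
        (2.0 if opt == "Paris" else 1.0) + (1.5 if i == 0 else 0.0)
        for i, opt in enumerate(ordering)
    ]
    total = sum(raw)
    return [r / total for r in raw]


if __name__ == "__main__":
    opts = ["Lyon", "Paris", "Nice"]
    print("single pass:", dict(zip(opts, toy_biased_scorer(opts))))
    print("debiased:   ", permutation_debias(opts, toy_biased_scorer))
```

In this toy setup, a single pass with "Lyon" listed first favours "Lyon" because of the positional bonus, while the averaged distribution favours "Paris". A student model of the kind the abstract describes would be trained to predict the averaged distribution from one ordering, avoiding the factorial number of calls.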

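This paper uses distillation to transfer the capabilities of a computationally intensive, debiased teacher model into a more compact student model. It explores two student variants: one based on pure distillation, and one based on an error-correction approach for more complex tasks, in which the student corrects a single biased decision from the teacher to reach a debiased result. It also shows that the smaller, encoder-only student models can outperform the larger, biased teacher models while using significantly fewer parameters.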

Teacher-Student Training for Debiasing: General Permutation Debiasing for Large Language Models

Large language and vision-language models are rapidly being deployed in
practice thanks to their impressive capabilities in instruction following,
in-context learning, and so on. This raises an urgent need to carefully analyse
their robustness so that stakeholders can understand if and when such models
are trustworthy enough to be relied upon in any given application. In this
paper, we highlight a specific vulnerability in popular models, namely
permutation sensitivity in multiple-choice question answering (MCQA).
Specifically, we show empirically that popular models are vulnerable to
adversarial permutation in answer sets for multiple-choice prompting, which is
surprising as models should ideally be as invariant to prompt permutation as
humans are. These vulnerabilities persist across various model sizes, and exist
in very recent language and vision-language models. Code is available at
https://github.com/ys-zong/FoolyourVLLMs.
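
As an illustration of what an adversarial permutation means here, the sketch below searches the orderings of an answer set for one that makes a predictor choose a wrong option. It is not the procedure from the linked repository, whose details the abstract does not give. The names `find_adversarial_permutation` and `toy_predict` are hypothetical, and the toy predictor stands in for a real model call.

```python
from itertools import permutations
from typing import Callable, Optional, Sequence, Tuple


def find_adversarial_permutation(
    options: Sequence[str],
    correct: str,
    predict: Callable[[Sequence[str]], str],
) -> Optional[Tuple[str, ...]]:
    """Return the first ordering under which `predict` picks a wrong option.

    Returns None if the prediction is correct for every ordering, i.e. the
    predictor is permutation-robust on this question.
    """
    for ordering in permutations(options):
        if predict(ordering) != correct:
            return ordering
    return None


def toy_predict(ordering: Sequence[str]) -> str:
    """Toy stand-in for a model (assumption): it answers "Paris" unless
    "Paris" is listed last, in which case it picks the first option."""
    return ordering[0] if ordering[-1] == "Paris" else "Paris"


if __name__ == "__main__":
    opts = ["Paris", "Lyon", "Nice"]
    print(find_adversarial_permutation(opts, "Paris", toy_predict))
```

A question counts as vulnerable under this sketch when the function returns an ordering rather than `None`. Exhaustive search is only practical for small answer sets, since the number of orderings grows factorially.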

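Large language and vision-language models are being deployed widely in practice thanks to their impressive capabilities in instruction following, in-context learning, and similar tasks, so there is an urgent need to analyse their robustness carefully, allowing stakeholders to judge whether such models are reliable enough for a given application. This paper highlights a specific vulnerability in popular models: permutation sensitivity in multiple-choice question answering. Specifically, we show empirically that popular models are vulnerable to adversarial permutations of the answer set in multiple-choice prompting, which is surprising because models should be as invariant to prompt permutation as humans are. These vulnerabilities persist across model sizes and are present in recent language and vision-language models. Code is available at https://github.com/ys-zong/FoolyourVLLMs.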