Large language models (LLMs) have demonstrated impressive capabilities in
natural language processing. However, their internal mechanisms are still
unclear, and this lack of transparency poses unwanted risks for downstream
applications. Therefore, understanding and explaining these models is crucial
for elucidating their behaviors, limitations, and social impacts. In this
paper, we introduce a taxonomy of explainability techniques and provide a
structured overview of methods for explaining Transformer-based language
models. We categorize techniques based on the training paradigms of LLMs: the
traditional fine-tuning-based paradigm and the prompting-based paradigm. For each
paradigm, we summarize the goals and dominant approaches for generating local
explanations of individual predictions and global explanations of overall model
knowledge. We also discuss metrics for evaluating generated explanations and
how explanations can be leveraged to debug models and improve
performance. Lastly, we examine key challenges and emerging opportunities for
explanation techniques in the era of LLMs in comparison to conventional machine
learning models.

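This survey categorizes and summarizes explainability techniques for large language models, organized by training paradigm, and discusses their use for generating local and global explanations, along with evaluation metrics, model debugging and performance improvement, and the open challenges and opportunities.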

An Overview of Explainability for Large Language Models

Explainability for Large Language Models: A Survey

Language modeling studies the probability distributions over strings of
text. It is one of the most fundamental tasks in natural language processing
(NLP). It has been widely used in text generation, speech recognition, machine
translation, etc. Conventional language models (CLMs) aim to predict the
probability of linguistic sequences in a causal manner. In contrast,
pre-trained language models (PLMs) cover broader concepts and can be used in
both causal sequential modeling and fine-tuning for downstream applications.
PLMs have their own training paradigms (usually self-supervised) and serve as
foundation models in modern NLP systems. This overview paper provides an
introduction to both CLMs and PLMs from five aspects, i.e., linguistic units,
structures, training methods, evaluation methods, and applications.
Furthermore, we discuss the relationship between CLMs and PLMs and shed light
on the future directions of language modeling in the pre-trained era.
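
To make "predicting the probability of linguistic sequences in a causal manner" concrete, the following is a minimal sketch of the chain-rule factorization P(w_1..w_n) = prod_i P(w_i | w_<i), truncated to a one-token history (a bigram model). The helper names `train_bigram` and `sequence_log_prob`, the add-alpha smoothing, and the toy corpus are illustrative assumptions and are not taken from the paper.

```python
import math
from collections import Counter

def train_bigram(corpus, alpha=1.0):
    """Estimate add-alpha smoothed bigram probabilities from tokenized sentences."""
    context_counts, bigram_counts, vocab = Counter(), Counter(), set()
    for sentence in corpus:
        tokens = ["<s>"] + sentence + ["</s>"]
        vocab.update(tokens)
        for prev, cur in zip(tokens, tokens[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    vocab_size = len(vocab)

    def prob(cur, prev):
        # P(cur | prev): each token is conditioned only on what precedes it.
        return (bigram_counts[(prev, cur)] + alpha) / (
            context_counts[prev] + alpha * vocab_size
        )

    return prob

def sequence_log_prob(prob, sentence):
    """Sum of log P(w_i | w_{i-1}) over the sentence, left to right."""
    tokens = ["<s>"] + sentence + ["</s>"]
    return sum(math.log(prob(cur, prev)) for prev, cur in zip(tokens, tokens[1:]))

# Toy corpus (an assumption for illustration only).
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
prob = train_bigram(corpus)
print(sequence_log_prob(prob, ["the", "cat", "sat"]))
print(sequence_log_prob(prob, ["sat", "cat", "the"]))
```

A word order seen in training should score higher than its reversal. A neural causal model keeps the same left-to-right factorization but conditions on the full history instead of one token.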

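This paper introduces conventional language models and pre-trained language models from five aspects (linguistic units, structures, training methods, evaluation methods, and applications), and discusses the relationship between the two and the future directions of language modeling in the pre-trained era.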

An Overview of Language Models: Recent Developments and Outlook

An Overview on Language Models: Recent Developments and Outlook

In learning action recognition, models are typically pre-trained on object
recognition with images, such as ImageNet, and later fine-tuned on target
action recognition with videos. This approach has achieved good empirical
performance, especially with recent transformer-based video architectures. While
many recent works aim to design more advanced transformer architectures for
action recognition, less effort has been devoted to how to train video
transformers. In this work, we explore several training paradigms and present
two findings. First, video transformers benefit from joint training on diverse
video datasets and label spaces (e.g., Kinetics is appearance-focused while
SomethingSomething is motion-focused). Second, by further co-training with
images (as single-frame videos), the video transformers learn even better video
representations. We term this approach Co-training Videos and Images for
Action Recognition (CoVeR). In particular, when pretrained on ImageNet-21K
based on the TimeSFormer architecture, CoVeR improves Kinetics-400 Top-1
Accuracy by 2.4%, Kinetics-600 by 2.3%, and SomethingSomething-v2 by 2.3%. When
pretrained on larger-scale image datasets following previous state-of-the-art,
CoVeR achieves best results on Kinetics-400 (87.2%), Kinetics-600 (87.9%),
Kinetics-700 (79.8%), SomethingSomething-v2 (70.9%), and Moments-in-Time
(46.1%), with a simple spatio-temporal video transformer.
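
The two ideas in this abstract, joint training over several datasets with different label spaces and treating images as single-frame videos, can be sketched as a data-mixing loop. This is only a sketch under stated assumptions, not the authors' implementation: the helpers `image_to_video` and `co_training_batches`, the size-proportional sampling rule, and the toy datasets are all hypothetical.

```python
import random

def image_to_video(image):
    """Wrap a single image as a one-frame video (a list holding one frame)."""
    return [image]

def co_training_batches(datasets, batch_size, steps, seed=0):
    """Yield (dataset_name, batch) pairs for joint training.

    Each batch is drawn from ONE dataset, so its labels can be scored in that
    dataset's own label space. Sampling datasets in proportion to their size
    is an assumption made for this sketch.
    """
    rng = random.Random(seed)
    names = list(datasets)
    sizes = [len(datasets[name]) for name in names]
    for _ in range(steps):
        name = rng.choices(names, weights=sizes, k=1)[0]
        batch = rng.choices(datasets[name], k=batch_size)
        yield name, batch

# Toy stand-ins: frames are strings, labels are integers in each dataset's
# own label space. Real clips would be arrays of pixels.
datasets = {
    "video_appearance": [([f"a{i}_f{t}" for t in range(4)], i % 3) for i in range(6)],
    "video_motion": [([f"m{i}_f{t}" for t in range(4)], i % 2) for i in range(6)],
    "images": [(image_to_video(f"img{i}"), i % 5) for i in range(6)],
}

for name, batch in co_training_batches(datasets, batch_size=2, steps=5):
    frame_counts = [len(clip) for clip, _ in batch]
    print(name, frame_counts)
```

Because an image becomes a video of length one, the same video model can consume both sources without a separate image branch; only the label space used for the loss changes with the dataset name.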

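This paper explores several training paradigms for video transformers and proposes CoVeR, a method that co-trains on videos and images to improve video transformer performance, achieving the best results on action recognition.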

Co-training Transformers on Videos and Images Improves Action Recognition

Co-training Transformer with Videos and Images Improves Action Recognition

In order to obtain reliable accuracy estimates for automatic MOOC dropout
predictors, it is important to train and test them in a manner consistent with
how they will be used in practice. Yet most prior research on MOOC dropout
prediction has measured test accuracy on the same course used for training the
classifier, which can lead to overly optimistic accuracy estimates. In order to
better understand how accuracy is affected by the training+testing regime, we
compared the accuracy of a standard dropout prediction architecture
(clickstream features + logistic regression) across 4 different training
paradigms. Results suggest that (1) training and testing on the same course
("post-hoc") can overestimate accuracy by several percentage points; (2)
dropout classifiers trained on proxy labels based on students' persistence are
surprisingly competitive with post-hoc training (87.33% versus 90.20% AUC
averaged over 8 weeks of 40 HarvardX MOOCs); and (3) classifier performance
does not vary significantly with the academic discipline. Finally, we also
investigate new dropout prediction architectures based on deep, fully-connected,
feed-forward neural networks and find that (4) networks with as many as 5
hidden layers can yield a statistically significant increase in test accuracy
over that of logistic regression.
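
The comparison of training regimes described above can be sketched as follows: fit a logistic regression on one course, then measure AUC either on held-out students of the same course or on a different course. Everything here is an illustrative assumption: the synthetic `make_course` generator, the two features, the `shift` parameter, and the hyperparameters are hypothetical and do not come from the HarvardX data, so the printed numbers carry no empirical weight.

```python
import math
import random

def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def predict(w, b, x):
    """Logistic regression probability, with the logit clamped for stability."""
    z = b + sum(wj * v for wj, v in zip(w, x))
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=200):
    """Plain batch gradient descent; returns (weights, bias)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict(w, b, xi) - yi
            grad_w = [g + err * v for g, v in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / len(X) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b

def make_course(rng, n, shift):
    """Synthetic stand-in for one course: two clickstream-like features."""
    X, y = [], []
    for _ in range(n):
        dropout = rng.random() < 0.5
        activity = rng.gauss(-1.0 if dropout else 1.0, 1.0) + shift
        videos = rng.gauss(-0.5 if dropout else 0.5, 1.0)
        X.append([activity, videos])
        y.append(1 if dropout else 0)
    return X, y

rng = random.Random(0)
X_a, y_a = make_course(rng, 200, shift=0.0)   # target course
X_b, y_b = make_course(rng, 200, shift=1.5)   # a different, earlier course
X_test, y_test = X_a[100:], y_a[100:]

# Regime 1: train and test on the same course (held-out students).
w, b = fit_logistic(X_a[:100], y_a[:100])
auc_same = auc(y_test, [predict(w, b, x) for x in X_test])

# Regime 2: train on another course, test on the target course.
w, b = fit_logistic(X_b, y_b)
auc_cross = auc(y_test, [predict(w, b, x) for x in X_test])

print(f"same-course AUC:  {auc_same:.3f}")
print(f"cross-course AUC: {auc_cross:.3f}")
```

The point of the sketch is the evaluation protocol: the test set stays fixed and only the source of training data changes, which is what makes the two AUC values comparable.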

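This study analyzes accuracy estimates for automatic MOOC dropout predictors and compares the accuracy of a standard dropout prediction architecture under four training paradigms. The results show that "post-hoc" training and testing can overestimate the accuracy achieved in realistic settings, that dropout classifiers trained on proxy labels based on student persistence are competitive with post-hoc training, and that classifier performance does not vary with academic discipline. Finally, new dropout prediction architectures based on deep, fully-connected, feed-forward neural networks are investigated and found to achieve higher test accuracy than logistic regression.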