Natural Language Processing has moved rather quickly from modelling specific tasks, to taking more general pre-trained models and fine-tuning them for specific tasks, to a point where we now have what appear to be inherently generalist models. This paper argues that the resulting loss of clarity about what these models model leads to metaphors like "artificial general intelligences" that are not helpful for evaluating their strengths and weaknesses. The proposal is to see their generality, and their potential value, in their ability to approximate specialist functions on the basis of a natural language specification. This framing brings to the fore questions about the quality of the approximation and, beyond that, questions of discoverability, stability, and protectability of these functions. As the paper will show, this framing brings together in one conceptual framework various aspects of evaluation, from both a practical and a theoretical perspective, as well as questions often relegated to a secondary status (such as "prompt injection" and "jailbreaking").

LLMs as Function Approximators: Terminology, Taxonomy, and Questions for Evaluation