We propose a method for using a large language model, such as GPT-3, to simulate the responses of different humans in a given context. We test the method by attempting to reproduce well-established economic, psycholinguistic, and social experiments. The method requires a prompt template for each experiment. Simulations are run by varying (hypothetical) subject details, such as name, and analyzing the text the language model generates. Using GPT-3, we validate the methodology by showing that it is possible to simulate the responses of different people and that these responses are consistent with prior human studies from the literature. We find that the distributions generated by larger language models align better with prior experimental results, which suggests that future language models may support even more faithful simulations of human responses. We contrast this use of a language model for simulation with anthropomorphic views of a language model as having its own behavior.
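
The template-and-vary procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the template wording, the subject names, the economic scenario, and the helpers `build_prompts`, `simulate`, and `stub_model` are all hypothetical, and `stub_model` is a deterministic stand-in for a real language model call.

```python
from collections import Counter
from typing import Callable, Iterable, List

# Hypothetical prompt template (illustrative wording, not taken from the paper).
TEMPLATE = (
    "{name} is offered ${offer} out of a total of $10. "
    "{name} must decide whether to accept or reject the offer.\n"
    "Answer: {name} decides to"
)


def build_prompts(template: str, names: Iterable[str], **fields) -> List[str]:
    """Fill the template once per simulated subject, varying only the name."""
    return [template.format(name=name, **fields) for name in names]


def simulate(prompts: Iterable[str], complete: Callable[[str], str]) -> Counter:
    """Query the model for each prompt and tally the first word of each reply."""
    counts: Counter = Counter()
    for prompt in prompts:
        words = complete(prompt).strip().lower().split()
        first = words[0].strip(".,!?") if words else ""
        counts[first if first in ("accept", "reject") else "other"] += 1
    return counts


def stub_model(prompt: str) -> str:
    """Assumed stand-in for a language model: accepts any offer of $3 or more."""
    offer = int(prompt.split("$")[1].split()[0])
    return " accept the offer." if offer >= 3 else " reject the offer."


if __name__ == "__main__":
    names = ["Ms. Huang", "Mr. Wagner", "Ms. Olson"]  # hypothetical subjects
    prompts = build_prompts(TEMPLATE, names, offer=4)
    print(simulate(prompts, stub_model))
```

In practice `stub_model` would be replaced by a call to an actual language model, and the resulting tallies would be compared with the response distributions reported in the corresponding human study.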

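Introduces a new type of test, the Turing Experiment (TE), for evaluating how well a language model such as GPT-3 can simulate human behavior; designs and implements TEs for several experiments from economics, linguistics, and social psychology; compares how well different language models reproduce these classic experiments; and reveals a "hyper-accuracy distortion" in some language models.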

Using Large Language Models to Simulate Multiple Humans and Replicate Human Subject Studies