With the expansion of neural networks such as large language models, humanity is heading towards superintelligence at an accelerating pace. As AI systems are increasingly integrated into the fabric of societies (by recommending values, devising creative solutions, and making decisions), it becomes critical to assess how these systems affect humans in the long run. This research aims to contribute towards establishing a benchmark for evaluating the sentiment of Large Language Models (LLMs) on socially important issues. The methodology was a Likert-scale survey. Seven LLMs, including GPT-4 and Bard, were analyzed and compared against sentiment data from three independent human sample populations. Temporal variations in sentiment were also evaluated over three consecutive days. The results showed a range of sentiment scores among the LLMs, from 3.32 to 4.12 out of 5. GPT-4 recorded the most positive sentiment score towards AGI, whereas Bard leaned towards neutral sentiment. By contrast, the human samples showed a lower average sentiment of 2.97. The temporal comparison revealed differences in sentiment evolution between LLMs over the three days, ranging from 1.03% to 8.21%. The analysis points to potential conflicts of interest and biases in LLMs' sentiment formation. The results indicate that LLMs, in a manner akin to human cognitive processes, could develop unique sentiments and subtly influence societies' perceptions towards the opinions formed within the LLMs.

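This study aims to establish a benchmark for evaluating the sentiment of large language models on socially important issues, filling a gap in research on the long-term impact of AI systems on humans. Using a Likert-scale survey, seven large language models, including GPT-4 and Bard, were analyzed and compared against sentiment data from three human samples. The study found significant differences in sentiment scores among the LLMs, with GPT-4 showing the most positive attitude towards AGI, reflecting possible conflicts of interest and bias in LLMs' sentiment formation.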

Towards New Benchmark for AI Alignment & Sentiment Analysis in Socially
  Important Issues: A Comparative Study of Human and LLMs in the Context of AGI

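Towards a New Benchmark for AI Alignment and Sentiment Analysis: A Comparative Study of Humans and Large Language Models in the Context of AGI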