In the field of natural language processing, the rapid development of large language models (LLMs) has attracted increasing attention. LLMs have shown a high level of creativity in various tasks, but the methods for assessing such creativity are inadequate. Assessing LLM creativity requires accounting for differences from humans and measuring along multiple dimensions while balancing accuracy and efficiency. This paper aims to establish an efficient framework for assessing the level of creativity in LLMs. By adapting the modified Torrance Tests of Creative Thinking, the research evaluates the creative performance of various LLMs across 7 tasks, emphasizing 4 criteria: Fluency, Flexibility, Originality, and Elaboration. In this context, we develop a comprehensive dataset of 700 questions for testing and an LLM-based evaluation method. In addition, this study presents a novel analysis of LLMs' responses to diverse prompts and role-play situations. We find that the creativity of LLMs primarily falls short in originality while excelling in elaboration. Moreover, the choice of prompts and the role-play settings of the model significantly influence creativity. The experimental results also indicate that collaboration among multiple LLMs can enhance originality. Notably, our findings reveal a consensus between human evaluations and LLMs regarding the personality traits that influence creativity. The findings underscore the significant impact of LLM design on creativity and bridge artificial intelligence and human creativity, offering insights into LLMs' creativity and potential applications.

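This study aims to establish an efficient framework for assessing the level of creativity in large language models. By adapting the Torrance Tests of Creative Thinking, it evaluates creative performance across a variety of tasks on four criteria: fluency, flexibility, originality, and elaboration. It finds that large language models fall short in originality but excel in elaboration, that a model's creativity is significantly influenced by prompts and role-play settings, and that collaboration among multiple models can enhance originality. In addition, human evaluations and large language models are consistent regarding the factors that influence creativity, which underscores the significant impact of large language model design on creativity.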

Assessing and Understanding Creativity in Large Language Models