Large language models (LLMs) have achieved remarkable progress in linguistic tasks, necessitating robust evaluation frameworks to understand their capabilities and limitations. Inspired by Feynman's principle of understanding through creation, we introduce a self-knowledge evaluation framework that is easy to implement, evaluating models on their ability to comprehend and respond to self-generated questions. Our findings, based on testing multiple models across diverse tasks, reveal significant gaps in the model's self-knowledge ability. Further analysis indicates these gaps may be due to misalignment with human attention mechanisms. Additionally, fine-tuning on self-generated math task may enhance the model's math performance, highlighting the potential of the framework for efficient and insightful model evaluation and may also contribute to the improvement of LLMs.

基于 Feynman 的理解通过创造原则，我们引入了一个易于实施的自我认知评估框架，评估模型对自动生成的问题的理解和回应能力。我们的研究发现，在多个任务上测试多个模型后，模型的自我认知能力存在显著差距。进一步分析表明，这些差距可能是由于与人类注意机制的不匹配所导致的。此外，对自动生成的数学任务进行微调可以提高模型的数学性能，突出了该框架在高效和富有洞察力的模型评估方面的潜力，并可能有助于改善大型语言模型。

自我认知评估大型语言模型