We introduce EQ-Bench, a novel benchmark designed to evaluate aspects of emotional intelligence in Large Language Models (LLMs). We assess the ability of LLMs to understand complex emotions and social interactions by asking them to predict the intensity of the emotional states of characters in a dialogue. The benchmark discriminates effectively between a wide range of models. EQ-Bench correlates strongly with comprehensive multi-domain benchmarks such as MMLU (Hendrycks et al., 2020) (r = 0.97), indicating that it may be capturing similar aspects of broad intelligence. The benchmark produces highly repeatable results using a set of 60 English-language questions. We also provide open-source code for an automated benchmarking pipeline at https://github.com/EQ-bench/EQ-Bench and a leaderboard at https://www.eqbench.com.
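As an illustration of the correlation claim, the following minimal sketch computes a correlation coefficient between two lists of per-model benchmark scores. It assumes that r denotes the Pearson correlation coefficient, which the abstract does not state explicitly. The function name `pearson_r` and the score lists are hypothetical placeholders, not results from this work or code from the EQ-Bench repository.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations from the means (unnormalised covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (unnormalised variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical placeholder scores, one entry per model (NOT real results).
eq_scores = [30.0, 45.0, 50.0, 62.0, 70.0]
mmlu_scores = [40.0, 52.0, 55.0, 68.0, 75.0]

r = pearson_r(eq_scores, mmlu_scores)
print(f"r = {r:.2f}")
```

A value of r close to 1 means that models ranking highly on one benchmark tend to rank highly on the other. It does not by itself show that the two benchmarks measure the same ability.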

EQ-Bench: An Emotional Intelligence Benchmark for Large Language Models