Modeling personality is a challenging problem with applications spanning computer games, virtual assistants, online shopping, and education. Many techniques have been tried, ranging from neural networks to computational cognitive architectures. However, most approaches rely on examples with hand-crafted features and scenarios. Here, we approach learning a personality by training agents with a Deep Q-Network (DQN) model on rewards based on psychoanalysis, against a hand-coded AI in the game of Pong. As a result, we obtain four agents, each with its own personality. We then define the happiness of an agent, which can be seen as a measure of alignment with the agent's objective function, and study it when agents play both against the hand-coded AI and against each other. We find that the agents that achieve higher happiness during testing against the hand-coded AI have lower happiness when competing against each other. This suggests that higher happiness in testing is a sign of overfitting to the hand-coded AI, and that it leads to worse performance against agents with different personalities.
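
To make the notion of happiness concrete, the following is a minimal sketch and not the authors' implementation. It assumes that a personality is a set of reward weights over Pong events and that happiness is the mean per-step reward under the agent's own reward function. The `Personality` class, the event names, and the weight values are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Personality:
    """Hypothetical reward weights for Pong events (illustrative only)."""
    name: str
    score_reward: float    # reward when the agent scores a point
    concede_reward: float  # reward when the opponent scores (usually negative)


def reward(p: Personality, event: str) -> float:
    """Reward this personality assigns to a single game event."""
    if event == "score":
        return p.score_reward
    if event == "concede":
        return p.concede_reward
    return 0.0  # any other step is treated as neutral


def happiness(p: Personality, events: Sequence[str]) -> float:
    """Assumed definition: mean per-step reward under the agent's own rewards."""
    if not events:
        return 0.0
    return sum(reward(p, e) for e in events) / len(events)


# Two made-up personalities evaluated on the same made-up episode.
competitive = Personality("competitive", score_reward=1.0, concede_reward=-1.0)
cautious = Personality("cautious", score_reward=0.5, concede_reward=-2.0)
episode = ["none", "score", "none", "concede"]

print(happiness(competitive, episode))
print(happiness(cautious, episode))
```

Under these assumptions, the same episode yields different happiness values for different personalities, which is what allows happiness to be compared across opponents.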

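This study trains Deep Q-Network models on rewards based on psychoanalysis to obtain four agents, each with its own personality, and examines the interactions among these agents. The results show that agents that achieve higher happiness when tested against the hand-coded AI perform worse when competing against agents with different personalities, which suggests that high happiness during testing may reflect overfitting.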

Happiness Pursuit: Personality Learning in a Society of Agents