Puns play a vital role in academic research due to their distinct structure and clear definition, which aid in the comprehensive analysis of linguistic humor. However, the understanding of puns in large language models (LLMs) has not been thoroughly examined, limiting their use in creative writing and humor creation. In this paper, we leverage three popular tasks (pun recognition, explanation, and generation) to systematically evaluate the capabilities of LLMs in pun understanding. In addition to adopting the automated evaluation metrics from prior research, we introduce new evaluation methods and metrics that are better suited to the in-context learning paradigm of LLMs. These new metrics offer a more rigorous assessment of an LLM's ability to understand puns and align more closely with human cognition than previous metrics. Our findings reveal the "lazy pun generation" pattern and identify the primary challenges LLMs encounter in understanding puns.

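This paper systematically evaluates the pun-understanding capabilities of large language models through three main tasks: pun recognition, explanation, and generation. Its new evaluation methods and metrics align more closely with human cognition, and its findings reveal a "lazy pun generation" pattern as well as the main challenges large language models face in understanding puns.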

"A Good Pun Is Its Own Reword": Can Large Language Models Understand Puns?