Recent advancement in the capabilities of large language models (LLMs) has triggered a new surge in LLMs' evaluation. Most recent evaluation works tends to evaluate the comprehensive ability of LLMs over series of tasks. However, the deep structure understanding of natural language is rarely explored. In this work, we examine the ability of LLMs to deal with structured semantics on the tasks of question answering with the help of the human-constructed formal language. Specifically, we implement the inter-conversion of natural and formal language through in-context learning of LLMs to verify their ability to understand and generate the structured logical forms. Extensive experiments with models of different sizes and in different formal languages show that today's state-of-the-art LLMs' understanding of the logical forms can approach human level overall, but there still are plenty of room in generating correct logical forms, which suggest that it is more effective to use LLMs to generate more natural language training data to reinforce a small model than directly answering questions with LLMs. Moreover, our results also indicate that models exhibit considerable sensitivity to different formal languages. In general, the formal language with the lower the formalization level, i.e. the more similar it is to natural language, is more LLMs-friendly.

最近大规模语言模型能力的进步引发了对其评估的新浪潮，这篇研究工作通过在自然语言和形式语言之间的相互转换来验证大规模语言模型理解和生成结构化逻辑形式的能力，实验证明现今最先进的大规模语言模型在理解逻辑形式方面整体上接近人类水平，但在生成正确逻辑形式方面仍有改进的空间，使用大规模语言模型生成更自然的语言训练数据以增强小型模型的效果更好，同时结果还表明模型对不同形式语言表现出显著的敏感性，总体而言，形式化程度较低、更接近自然语言的形式语言对大规模语言模型更友好。

通过问答探究语言模型对结构化语义理解和生成的能力