Recent work shows that Large Language Models (LLMs) struggle to understand natural language constraints for various text generation tasks in zero- and few-shot settings. In the code domain, by contrast, constraints expressed as code are widely used to maintain the integrity of code written in Domain-Specific Languages (DSLs), yet no work has evaluated LLMs on such constraints. We propose two novel tasks to assess the controllability of LLMs using hard and soft constraints represented as code across five representations. Our findings suggest that LLMs struggle to comprehend constraints in all representations, irrespective of their share of the pre-training data. While models are better at comprehending constraints in JSON, YAML, and natural language representations, they struggle with constraints represented in XML and in Python, a resource-rich language.
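To make "constraints represented as code" concrete, the following is a minimal sketch of a schema-style constraint written in JSON and a small checker that reports how a candidate output violates it. It is an illustration only, not the evaluation procedure of this work: the schema, the field names (`name`, `replicas`), and the helper `violations` are all hypothetical, and the checker supports only a small subset of schema keywords.

```python
import json

# Hypothetical schema-style constraint, written in JSON.
# The field names and bounds are invented for illustration.
SCHEMA_JSON = """
{
  "type": "object",
  "required": ["name", "replicas"],
  "properties": {
    "name": {"type": "string"},
    "replicas": {"type": "integer", "minimum": 1, "maximum": 5}
  }
}
"""

# Only the three types used in this sketch are supported.
TYPE_MAP = {"object": dict, "string": str, "integer": int}


def violations(sample, schema):
    """Return a list of constraint violations (empty if the sample conforms)."""
    kind = schema["type"]
    # bool is a subclass of int in Python, so reject it explicitly for integers.
    wrong_type = not isinstance(sample, TYPE_MAP[kind]) or (
        kind == "integer" and isinstance(sample, bool)
    )
    if wrong_type:
        return [f"expected {kind}, got {type(sample).__name__}"]

    errors = []
    if kind == "object":
        for key in schema.get("required", []):
            if key not in sample:
                errors.append(f"missing required field '{key}'")
        # Recurse into each declared property that is present.
        for key, sub_schema in schema.get("properties", {}).items():
            if key in sample:
                errors.extend(f"{key}: {e}" for e in violations(sample[key], sub_schema))
    elif kind == "integer":
        if "minimum" in schema and sample < schema["minimum"]:
            errors.append(f"value {sample} is below minimum {schema['minimum']}")
        if "maximum" in schema and sample > schema["maximum"]:
            errors.append(f"value {sample} is above maximum {schema['maximum']}")
    return errors


schema = json.loads(SCHEMA_JSON)
# Stand-in for a model-generated sample; no model is called here.
candidate = json.loads('{"name": "web", "replicas": 9}')
print(violations(candidate, schema))
# → ['replicas: value 9 is above maximum 5']
```

The same constraint could be written in YAML, XML, Python, or natural language; a checker of this kind makes it possible to ask whether a model's output respects the constraint no matter which representation the model was shown.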

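This work evaluates the controllability of large language models using hard and soft constraints expressed as code in five different representations. It finds that, regardless of each representation's share of the pre-training data, large language models struggle to understand constraints in all representations, and that they do particularly poorly on constraints represented in XML and in the resource-rich language Python.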

ConCodeEval: Evaluating Large Language Models for Code Constraints in Domain-Specific Languages