Oct 2023

Beyond Accuracy: Evaluating Self-Consistency of Code Large Language Models with IdentityChain

TL;DR: The IdentityChain framework evaluates Code Large Language Models (Code LLMs) for both self-consistency and general accuracy. It exposes weaknesses in current models and also serves as a model debugging tool.