Qiming Bao, Gael Gendron, Alex Yuxuan Peng, Wanjun Zhong, Neset Tan...
TL;DR: A study on evaluating and improving the generalisation and robustness of large language models on logical reasoning tasks.
Abstract
Large language models (LLMs), such as GPT-3.5 and GPT-4, have greatly advanced the performance of artificial systems on various natural language processing tasks to human-like levels. However, their generalisation