The success of ChatGPT has ignited an AI race, with researchers striving to develop new large language models (LLMs) that can match or surpass the language understanding and generation abilities of commercial ones. Recently, a number of models have emerged that claim performance approaching that of GPT-3.5 or GPT-4 through various instruction-tuning methods. As practitioners of Text-to-SQL parsing, we are grateful for their valuable contributions to open-source research. However, these claims warrant scrutiny, and it is important to ascertain how effective these models actually are. We therefore pit six popular large language models against each other, systematically evaluating their Text-to-SQL parsing capability on nine benchmark datasets with five different prompting strategies, covering both zero-shot and few-shot scenarios. Regrettably, the open-source models fell significantly short of the performance achieved by closed-source models like GPT-3.5, highlighting the need for further work to bridge the performance gap between open- and closed-source models.
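In a zero-shot prompt the model receives only the database schema and the question; a few-shot prompt additionally prepends solved question/SQL demonstration pairs. The sketch below illustrates that distinction under stated assumptions: the prompt layout, the `Example` and `build_prompt` names, and the toy `singer` table are hypothetical and do not reproduce any of the five prompting strategies evaluated in the study.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Example:
    """A hypothetical demonstration pair: a question and its gold SQL."""
    question: str
    sql: str


def build_prompt(schema: str, question: str,
                 demos: Sequence[Example] = ()) -> str:
    """Assemble an illustrative Text-to-SQL prompt.

    With no demos this is a zero-shot prompt; with k demos it is k-shot.
    The layout is an assumption for illustration, not the paper's format.
    """
    parts = [f"### SQLite schema:\n{schema}"]
    # Each demonstration shows a solved question/SQL pair.
    for demo in demos:
        parts.append(f"### Question: {demo.question}\n### SQL: {demo.sql}")
    # The target question is left open so the model completes the SQL.
    parts.append(f"### Question: {question}\n### SQL:")
    return "\n\n".join(parts)


# Hypothetical toy schema and questions.
schema = "CREATE TABLE singer (id INTEGER, name TEXT, age INTEGER);"
zero_shot = build_prompt(schema, "How many singers are there?")
few_shot = build_prompt(
    schema,
    "How many singers are there?",
    [Example("List all singer names.", "SELECT name FROM singer;")],
)
print(few_shot)
```

Only the number of demonstrations differs between the two prompts, which keeps a zero-shot versus few-shot comparison controlled.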

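By systematically evaluating the Text-to-SQL parsing capability of six popular large language models on nine benchmark datasets, we find that the open-source models perform markedly worse than closed-source models such as GPT-3.5, underscoring the need for further work to close the performance gap between them.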

Battle of the Large Language Models: Dolly vs LLaMA vs Vicuna vs Guanaco vs Bard vs ChatGPT -- A Text-to-SQL Parsing Comparison