The strategic significance of Large Language Models (LLMs) for economic expansion, innovation, societal development, and national security has been increasingly recognized since the advent of ChatGPT. This study provides a comparative evaluation of American and Chinese LLMs in both English and Chinese contexts. We propose an evaluation framework that encompasses natural language proficiency, disciplinary expertise, and safety and responsibility, and use it to systematically assess 16 prominent models from the US and China across a range of operational tasks and scenarios. Our key findings show that GPT 4-Turbo leads in English contexts, whereas Ernie-Bot 4 stands out in Chinese contexts. The study also highlights disparities in LLM performance across languages and tasks, underscoring the need for linguistically and culturally nuanced model development. The complementary strengths of American and Chinese LLMs point to the value of Sino-US collaboration in advancing LLM technology. The research presents the current LLM competition landscape and offers insights for policymakers and businesses regarding strategic LLM investment and development. Future work will extend this framework to cover emerging multimodal LLM capabilities and business application assessments.

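This study comprehensively compares and evaluates large language models from the US and China in English and Chinese contexts, finding that GPT 4-Turbo leads in English contexts while Ernie-Bot 4 performs strongly in Chinese contexts. It highlights how differences in language and task affect LLM performance, stresses the importance of linguistic and cultural nuance in model development, and points to the complementarity of American and Chinese LLMs and the value of US-China cooperation in advancing LLM technology. The study also offers insights for policymakers and businesses on strategic LLM investment and development, and outlines future research directions, including the evaluation of multimodal capabilities and business applications.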

Unveiling Competitive Dynamics: A Comparative Evaluation of American and Chinese LLMs