Large language models (LLMs) have ushered in a new era for processing complex information in various fields, including science. The growing body of scientific literature allows these models to acquire and understand scientific knowledge effectively, improving their performance across a wide range of tasks. This capability comes at a cost, however: LLMs require extremely expensive computational resources, large amounts of data, and long training times. In recent years, researchers have therefore proposed various methodologies to make scientific LLMs more affordable. The best-known approaches follow one of two directions: focusing on the size of the models or enhancing the quality of the data. To date, no comprehensive review of these two families of methods has been undertaken. In this paper, we (I) summarize current advances in turning the emerging abilities of LLMs into more accessible AI solutions for science, and (II) investigate the challenges and opportunities of developing affordable solutions for scientific domains using LLMs.

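This study addresses the high computational cost and long training time that large language models face when applied in scientific domains. By summarizing and analyzing existing methods, the paper identifies two main research directions: model size and the improvement of data quality. The study indicates that applying these methods in combination can significantly reduce the cost of using large language models in scientific domains and advance the development of more affordable AI solutions.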

Efficient Large Language Models for Scientific Text: A Review