Research into methods for improving the performance of large language models (LLMs) through fine-tuning, retrieval-augmented generation (RAG) and soft-prompting has tended to focus on highly technical or high-cost techniques, making many newly discovered approaches comparatively inaccessible to non-technical users. In this paper we tested an unmodified version of GPT 3.5, a fine-tuned version, and the same unmodified model when given access to a vectorised RAG database, both in isolation and in combination with a basic, non-algorithmic soft prompt. In each case we tested the model's ability to answer a set of 100 questions relating primarily to events that occurred after September 2021 (the point at which GPT 3.5's training data set ends). We found that when commercial platforms are used with default settings and no iteration, in order to establish a baseline set of outputs, the fine-tuned model outperformed the unmodified GPT 3.5 Turbo, while the RAG approach outperformed both. Applying a soft prompt significantly improved the performance of each approach.

Establishing Performance Baselines in Fine-Tuning, Retrieval-Augmented Generation and Soft-Prompting for Non-Specialist LLM Users