The reliance of popular programming languages such as Python and JavaScript on centralized package repositories and open-source software, combined with the emergence of code-generating Large Language Models (LLMs), has created a new type of threat to the software supply chain: package hallucinations. These hallucinations, which arise from fact-conflicting errors when generating code using LLMs, represent a novel form of package confusion attack that poses a critical threat to the integrity of the software supply chain. This paper conducts a rigorous and comprehensive evaluation of package hallucinations across different programming languages, settings, and parameters, exploring how different configurations of LLMs affect the likelihood of generating erroneous package recommendations and identifying the root causes of this phenomenon. Using 16 popular code generation models, across two programming languages and two unique prompt datasets, we collect 576,000 code samples, which we analyze for package hallucinations. Our findings reveal that 19.7% of generated packages across all the tested LLMs are hallucinated, including a staggering 205,474 unique examples of hallucinated package names, further underscoring the severity and pervasiveness of this threat. We also implemented and evaluated mitigation strategies based on Retrieval-Augmented Generation (RAG), self-detected feedback, and supervised fine-tuning. These techniques demonstrably reduced package hallucinations, with hallucination rates for one model dropping below 3%. While the mitigation efforts were effective in reducing hallucination rates, our study reveals that package hallucinations are a systemic and persistent phenomenon that poses a significant challenge for code-generating LLMs.

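The study conducts a rigorous and comprehensive evaluation of LLM configurations across different programming languages, settings, and parameters, exploring how different configurations affect the likelihood of generating erroneous package recommendations and identifying the root causes of this phenomenon. The results show that 19.7% of the packages generated across all tested LLMs are hallucinated, with the number of hallucinated package names reaching 205,474, further underscoring the severity and pervasiveness of this threat. The mitigation strategies that were implemented markedly reduced the frequency of package hallucinations, with the hallucination rate of one model dropping below 3%. Nevertheless, the study shows that package hallucinations are a systemic and persistent phenomenon that poses a significant challenge for code-generating LLMs.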
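
To make the reported metric concrete, the hallucination rate can be read as the share of generated package names that do not exist in the target repository. The following is a minimal sketch under stated assumptions, not the study's actual detection pipeline, which is not described here: the function name hallucination_rate, the small in-memory set of known packages, and the package name fastjsonx are all hypothetical, and names are compared after PyPI-style normalization.

```python
import re


def normalize(name: str) -> str:
    """Normalize a package name PyPI-style: lowercase, runs of '-', '_', '.' become '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def hallucination_rate(generated, known_packages):
    """Return (rate, unique_hallucinated_names) for a list of generated package names.

    Assumption: a name counts as hallucinated if it is absent from
    known_packages, a stand-in for a snapshot of the repository index.
    """
    known = {normalize(name) for name in known_packages}
    hallucinated = [normalize(n) for n in generated if normalize(n) not in known]
    rate = len(hallucinated) / len(generated) if generated else 0.0
    return rate, sorted(set(hallucinated))


if __name__ == "__main__":
    # Hypothetical data: "fastjsonx" stands in for a name a model invented.
    known = {"requests", "numpy", "Flask"}
    generated = ["requests", "numpy", "fastjsonx", "Flask", "fastjsonx"]
    rate, names = hallucination_rate(generated, known)
    print(rate, names)
```

The sketch keeps two numbers apart: the rate counts every generated occurrence, while the list of names is de-duplicated, mirroring the distinction between the percentage of hallucinated packages and the count of unique hallucinated names.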

A Comprehensive Analysis of Package Hallucinations by Code-Generating LLMs