This paper presents a comprehensive comparative analysis of Large Language Models (LLMs) for code documentation generation. Code documentation is an essential part of the software development process. The paper evaluates GPT-3.5, GPT-4, Bard, Llama 2, and StarChat on Accuracy, Completeness, Relevance, Understandability, Readability, and Time Taken across different levels of code documentation. Our evaluation uses a checklist-based system to minimize subjectivity and provide a more objective assessment. We find that, except for StarChat, all LLMs consistently outperform the original documentation. Notably, the closed-source models GPT-3.5, GPT-4, and Bard outperform the open-source/source-available LLMs, Llama 2 and StarChat, across various parameters. In terms of generation time, GPT-4 took the longest, followed by Llama 2 and then Bard, while GPT-3.5 (ChatGPT) and StarChat had comparable generation times. Additionally, file-level documentation performed considerably worse than inline and function-level documentation on all parameters except Time Taken.

A Comparative Analysis of Large Language Models for Code Documentation Generation