Leveraging Large Language Models for Automated Medical Q&A Evaluation

September 2024

Towards Leveraging Large Language Models for Automated Medical Q&A Evaluation

Jack Krolik, Herprit Mahal, Feroz Ahmad, Gaurav Trivedi, Bahador Saket

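TL;DR: Human evaluation of medical Q&A systems is slow and costly. This study explores whether large language models (LLMs) can automate the evaluation of responses. The results indicate that LLMs can reliably replicate human evaluation outcomes, although further research is needed to handle more complex questions.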

Abstract

This paper explores the potential of using Large Language Models (LLMs) to automate the evaluation of responses in medical Question and Answer (Q&A) systems, a crucial form of Natural Language Processing. Tradit