Large Language Models (LLMs) have demonstrated remarkable proficiency in human interactions, yet their application in the medical field remains insufficiently explored. Previous work mainly measures medical knowledge through examinations, a setting far removed from realistic scenarios that falls short of assessing the abilities of LLMs on clinical tasks. To narrow the gap between traditional LLM evaluations and the nuanced demands of clinical practice, this paper introduces the Automated Interactive Evaluation (AIE) framework and the State-Aware Patient Simulator (SAPS). Unlike prior methods that rely on static medical knowledge assessments, AIE and SAPS provide a dynamic, realistic platform for assessing LLMs through multi-turn doctor-patient simulations. This approach more closely approximates real clinical scenarios and allows detailed analysis of LLM behavior in response to complex patient interactions. Our extensive experiments demonstrate the effectiveness of the AIE framework, with outcomes that align well with human evaluations, underscoring its potential to improve medical LLM testing and, ultimately, healthcare delivery.

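By introducing the Automated Interactive Evaluation (AIE) framework and the State-Aware Patient Simulator (SAPS), this paper presents an approach to enhancing the application of large language models (LLMs) in healthcare: LLM performance is evaluated through multi-turn doctor-patient simulations, which better reflects the demands of clinical practice. Experiments demonstrate the effectiveness of the AIE framework and its agreement with human evaluations, highlighting its potential to improve healthcare delivery.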

Automated Interactive Evaluation of Large Language Models with a State-Aware Patient Simulator