Large Language Models (LLMs) are a rapidly evolving, state-of-the-art
Artificial Intelligence (AI) technology that promises
to aid in medical diagnosis either by assisting doctors or by simulating a
doctor's workflow in more advanced and complex implementations. In this
technical paper, we outline the Cognitive Network Evaluation Toolkit for Medical
Domains (COGNET-MD), a novel benchmark for LLM evaluation in
the medical domain. Specifically, we propose a scoring framework with increased
difficulty to assess the ability of LLMs to interpret medical text. The
proposed framework is accompanied by a database of Multiple Choice Quizzes
(MCQs). To ensure alignment with current medical trends and enhance safety,
usefulness, and applicability, these MCQs were constructed in
collaboration with medical experts from several medical
domains and vary in difficulty. The current
(first) version of the database includes the medical domains of Psychiatry,
Dentistry, Pulmonology, Dermatology, and Endocrinology, and it will be
continuously extended to include additional medical domains.

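Large Language Models (LLMs) can support medical diagnosis by assisting doctors or by simulating a doctor's workflow. This study proposes the Cognitive Network Evaluation Toolkit for Medical Domains (COGNET-MD) for evaluating LLMs in the medical domain. The toolkit includes a scoring framework that assesses the ability of LLMs to interpret medical text, together with a database of multiple-choice questions built in collaboration with medical experts to match current medical trends and to enhance safety, usefulness, and applicability. The current version of the database covers Psychiatry, Dentistry, Pulmonology, Dermatology, and Endocrinology, and it will be extended to additional medical domains.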