With the increasing application of Large Language Models (LLMs) in the medical domain, evaluating these models' performance using benchmark datasets has become crucial. This paper presents a comprehensive survey of various benchmark datasets employed in medical LLM tasks. These dataset