Large Language Models (LLMs) pretrained on massive corpora exhibit remarkable capabilities across a wide range of tasks; however, non-English languages have received limited attention in this field of research. To address this gap and assess the proficiency of language models in the Korean language and culture, we present HAE-RAE Bench, covering 6 tasks including vocabulary, history, and general knowledge. Our evaluation of language models on this benchmark highlights the potential advantages of employing Large Language-Specific Models (LLSMs) over a comprehensive, universal model like GPT-3.5. Remarkably, our study reveals that models approximately 13 times smaller than GPT-3.5 can exhibit similar performance levels in terms of language-specific knowledge retrieval. This observation underscores the importance of homogeneous corpora for training professional-level language-specific models. At the same time, we observe a perplexing performance dip in these smaller LMs when they are tasked with generating structured answers.

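Evaluation on HAE-RAE Bench shows that Large Language-Specific Models (LLSMs) reach performance levels similar to those of a comprehensive, universal model such as GPT-3.5 in language-specific knowledge retrieval, which underscores the importance of homogeneous corpora for training professional-level language-specific models; however, the smaller LMs show a perplexing performance dip when generating structured answers.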

HAE-RAE Bench: Evaluation of Korean Knowledge in Language Models