This study introduces the eFontes models for automatic linguistic annotation of Medieval Latin texts, covering lemmatization, part-of-speech tagging, and morphological feature determination. The models were built with the Transformers library and trained on Universal Dependencies (UD) corpora together with the newly developed eFontes corpus of Polish Medieval Latin. The evaluation examines model performance and addresses challenges such as orthographic variation and the presence of Latinized vernacular terms. The models reached accuracies of 92.60% for lemmatization, 83.29% for part-of-speech tagging, and 88.57% for morphological feature determination. The findings underscore the importance of high-quality annotated corpora, and the study proposes future enhancements, including extending the models to Named Entity Recognition.

eFontes. Part-of-Speech Tagging and Lemmatization of Medieval Latin Texts: A Cross-Genre Survey