Historical languages present unique challenges to the NLP community, with one prominent hurdle being the limited resources available in their closed corpora. This work describes our submission to the constrained subtask of the SIGTYP 2024 shared task, focusing on PoS tagging, morphological tagging, and lemmatization for 13 historical languages. For PoS and morphological tagging we adapt a hierarchical tokenization method from Sun et al. (2023) and combine it with the advantages of the DeBERTa-V3 architecture, enabling our models to efficiently learn from every character in the training data. We also demonstrate the effectiveness of character-level T5 models on the lemmatization task. Pre-trained from scratch with limited data, our models achieved first place in the constrained subtask, nearly reaching the performance levels of the unconstrained task's winner. Our code is available at https://github.com/bowphs/SIGTYP-2024-hierarchical-transformers

Heidelberg-Boston @ SIGTYP 2024 Shared Task: Enhancing Low-Resource Language Analysis With Character-Aware Hierarchical Transformers