Morphological segmentation for polysynthetic languages is challenging because a word may consist of many individual morphemes and training data can be extremely scarce. Since neural sequence-to-sequence (seq2seq) models define the state of the art for morphological segmentation in high-resource settings and for (mostly) European languages, we first show that they also obtain competitive performance for Mexican polysynthetic languages in minimal-resource settings. We then propose two novel multi-task training approaches (one with and one without the need for external unlabeled resources) and two corresponding data augmentation methods, improving over the neural baseline for all languages. Finally, we explore cross-lingual transfer as a third way to fortify our neural model and show that we can train a single multilingual model for related languages while maintaining comparable or even improved performance, thus reducing the number of parameters by close to 75%. We make our morphological segmentation datasets for Mexicanero, Nahuatl, Wixarika, and Yorem Nokki available for future research.
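
The abstract frames morphological segmentation as a seq2seq task and trains one shared model across related languages. Below is a minimal sketch of how training pairs could be prepared for a character-level seq2seq model. It assumes a boundary symbol `!` between morphemes and, for the multilingual setting, a language tag token prepended to the source. Both conventions, the helper names `to_seq2seq_pair` and `from_prediction`, and the English example word are hypothetical illustrations, not the paper's actual data format.

```python
from typing import List, Optional, Tuple

# Hypothetical morpheme-boundary symbol; any character absent from the
# languages' alphabets would work equally well.
BOUNDARY = "!"


def to_seq2seq_pair(
    morphemes: List[str], lang: Optional[str] = None
) -> Tuple[List[str], List[str]]:
    """Turn a gold segmentation into a (source, target) pair of symbol sequences.

    The source is the unsegmented surface form, split into characters.
    The target is the same characters with BOUNDARY inserted between morphemes.
    If `lang` is given, a hypothetical language tag token is prepended to the
    source so that a single model can be shared across related languages.
    """
    word = "".join(morphemes)
    source = list(word)
    if lang is not None:
        source = [f"<{lang}>"] + source
    target = list(BOUNDARY.join(morphemes))
    return source, target


def from_prediction(target: List[str]) -> List[str]:
    """Recover the list of morphemes from a predicted target sequence."""
    return [m for m in "".join(target).split(BOUNDARY) if m]


if __name__ == "__main__":
    # English word used purely for illustration.
    src, tgt = to_seq2seq_pair(["un", "break", "able"], lang="demo")
    print(src)
    print("".join(tgt))
    print(from_prediction(tgt))
```

If four per-language models are replaced by one shared model of roughly the same size, the parameter count falls to about a quarter. That is consistent with the reported reduction of close to 75%. The tag token only adds a handful of entries to the input vocabulary.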

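This work proposes two novel multi-task training approaches and corresponding data augmentation methods for the morphological segmentation of Mexican polysynthetic languages, improving over the neural baseline. It also explores cross-lingual transfer as a third way to fortify the neural model. The results show that a single multilingual model can be trained while maintaining comparable or even improved performance, reducing the number of parameters by about 75%. The morphological segmentation datasets for Mexicanero, Nahuatl, Wixarika, and Yorem Nokki are released for future research.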

Fortification of Neural Morphological Segmentation Models for Polysynthetic Minimal-Resource Languages