Most modern neural machine translation (NMT) systems rely on presegmented inputs. Segmentation granularity largely determines the input and output sequence lengths, and hence the modeling depth, as well as the source and target vocabularies, which in turn determine model size, the computational cost of softmax normalization, and the handling of out-of-vocabulary words. However, current practice is to use static, heuristic-based segmentations that are fixed before NMT training. This raises the question of whether the chosen segmentation is optimal for the translation task. To overcome suboptimal segmentation choices, we present an algorithm for dynamic segmentation based on the Adaptive Computation Time algorithm (Graves 2016) that is trainable end-to-end and driven by the NMT objective. In an evaluation on three translation tasks, we found that, given the freedom to navigate between different segmentation levels, the model prefers to operate on (almost) character level, providing support for purely character-level NMT models from a novel angle.
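To make the halting idea concrete, below is a minimal sketch of an ACT-style halting rule (Graves 2016) applied along a character sequence. It is an illustration under stated assumptions, not the authors' model: the function name `act_segment`, the threshold `eps`, and the premise that per-character halting probabilities are supplied as plain numbers (rather than produced by a trained halting unit) are all hypothetical. A segment closes once its accumulated halting probability reaches 1 - eps; the closing character receives the remainder R, so the weights within a segment sum to one, and the segment vector is the weighted sum of its character vectors. The per-segment cost N + R mirrors ACT's ponder cost.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def act_segment(
    char_vecs: Sequence[Vector],
    halt_probs: Sequence[float],
    eps: float = 0.01,
) -> Tuple[List[Tuple[int, int, Vector]], float]:
    """Group consecutive characters into segments with an ACT-style halting rule.

    Hypothetical helper: halt_probs[i] stands in for a learned halting
    probability h_i in [0, 1]; here it is simply given as input.
    Returns (segments, ponder_cost), where each segment is
    (start, end_exclusive, weighted_sum_vector).
    """
    assert len(char_vecs) == len(halt_probs)
    dim = len(char_vecs[0]) if char_vecs else 0
    segments: List[Tuple[int, int, Vector]] = []
    ponder_cost = 0.0
    start = 0
    cum = 0.0  # sum of halting probs of earlier characters in the open segment
    for i, h in enumerate(halt_probs):
        last = i == len(halt_probs) - 1
        if cum + h >= 1.0 - eps or last:
            # Halt: the closing character receives the remainder R = 1 - cum,
            # so the weights of this segment sum to exactly one.
            remainder = 1.0 - cum
            weights = list(halt_probs[start:i]) + [remainder]
            vec = [
                sum(w * char_vecs[start + k][d] for k, w in enumerate(weights))
                for d in range(dim)
            ]
            segments.append((start, i + 1, vec))
            # ACT-style ponder cost per segment: N (its length) + R.
            ponder_cost += (i + 1 - start) + remainder
            start, cum = i + 1, 0.0
        else:
            cum += h
    return segments, ponder_cost


# Four one-dimensional "character" vectors; the halting pattern yields two segments.
segs, cost = act_segment([[1.0], [2.0], [3.0], [4.0]], [0.2, 0.9, 0.5, 0.6])
print([(s, e) for s, e, _ in segs], round(cost, 6))  # [(0, 2), (2, 4)] 5.3
```

With halting probabilities close to 1, every character closes its own segment, which corresponds to character-level processing; small halting probabilities let segments grow toward word-like units.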

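We propose a dynamic segmentation algorithm based on the Adaptive Computation Time algorithm; it is trained end-to-end and can navigate freely between different segmentation levels. In an evaluation on three translation tasks, the model preferred to operate at almost the character level, supporting purely character-level NMT models from a new angle.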

Learning to Segment Inputs for NMT Favors Character-Level Processing