BriefGPT.xyz
May, 2020
Dynamic Programming Encoding for Subword Segmentation in Neural Machine Translation
Xuanli He, Gholamreza Haffari, Mohammad Norouzi
TL;DR
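This paper introduces a new segmentation algorithm called dynamic programming encoding (DPE), which uses a lightweight mixed character-subword transformer to segment sentences via dynamic programming. Experiments show that DPE is effective for segmenting output sentences and that it can be combined with BPE dropout.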
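To make the dynamic programming idea concrete, here is a minimal sketch of a Viterbi-style search for the highest-scoring split of a word into subwords. It is not the paper's method: the summary says DPE scores segmentations with a mixed character-subword transformer, whereas this sketch assumes a fixed, context-free table of subword log-probabilities. The function name `best_segmentation` and the toy vocabulary are hypothetical and exist only for illustration.

```python
import math


def best_segmentation(word, log_probs):
    """Return (score, pieces) for the highest-scoring split of `word`.

    `log_probs` maps each allowed subword to a log-probability. It is a
    hypothetical stand-in for a learned scorer, used only for illustration.
    """
    n = len(word)
    # best[i] holds the best total score of any segmentation of word[:i].
    best = [-math.inf] * (n + 1)
    # back[i] holds the start index of the last piece in that segmentation.
    back = [0] * (n + 1)
    best[0] = 0.0

    for end in range(1, n + 1):
        for start in range(end):
            piece = word[start:end]
            if piece not in log_probs:
                continue
            score = best[start] + log_probs[piece]
            if score > best[end]:
                best[end] = score
                back[end] = start

    if best[n] == -math.inf:
        raise ValueError(f"no segmentation of {word!r} with this vocabulary")

    # Follow the back-pointers from the end of the word to recover the pieces.
    pieces = []
    end = n
    while end > 0:
        start = back[end]
        pieces.append(word[start:end])
        end = start
    return best[n], pieces[::-1]


# Toy vocabulary (made-up probabilities, not taken from the paper).
toy_vocab = {
    "un": math.log(0.1),
    "related": math.log(0.05),
    "rel": math.log(0.02),
    "ated": math.log(0.02),
    "u": math.log(0.01),
    "n": math.log(0.01),
}

score, pieces = best_segmentation("unrelated", toy_vocab)
print(pieces)  # → ['un', 'related']
```

The table `best` is filled left to right, so each prefix is solved once and reused, which keeps the search quadratic in the word length rather than exponential in the number of possible splits.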
Abstract
This paper introduces dynamic programming encoding (DPE), a new segmentation algorithm for tokenizing sentences into subword units. We view the subword segmentation of output sentences as a latent variable that s