BriefGPT.xyz
Dec, 2022
Subword-Delimited Downsampling for Better Character-Level Translation
Lukas Edman, Antonio Toral, Gertjan van Noord
TL;DR
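By introducing a new, more informative downsampling method, the paper brings the machine translation performance of character-level models close to that of subword-level models.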
Abstract
Subword-level models have been the dominant paradigm in NLP. However, character-level models have the benefit of seeing each character ind