On the Difficulty of Segmenting Words with Attention

Sep, 2021

Ramon Sanabria, Hao Tang, Sharon Goldwater

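TL;DR: In the speech domain, attention-based sequence-to-sequence models have been used to address word segmentation in tasks such as speech translation and speech recognition. This study shows, however, that relying on attention alone is not robust, and that the approach is usable only when the training data includes utterance-level annotations.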

Abstract

Word segmentation, the problem of finding word boundaries in speech, is of interest for a range of tasks. Previous papers have suggested that for sequence-to-sequence models trained on tasks such as speech transl