Unsupervised Word Segmentation with Bi-directional Neural Language Model

Mar, 2021

Lihao Wang, Zongyi Li, Xiaoqing Zheng

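TL;DR: This paper presents a context-sensitive unsupervised word segmentation model that uses a bi-directional neural language model and two decoding algorithms to strengthen both long-term and short-term dependencies. The model achieves state-of-the-art Chinese and Thai word segmentation results on different datasets.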

Abstract

We present an unsupervised word segmentation model in which the learning objective is to maximize the generation probability of a sentence given all of its possible segmentations. This generation probability can be factorized into the likelihood of each possible segment given the context
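
To make the objective in the abstract concrete, the following minimal sketch sums the probability of a sentence over every possible segmentation with a forward dynamic program. It is an illustration under stated assumptions, not the authors' implementation: the names `sentence_log_prob`, `segment_logprob`, `toy_segment_logprob`, and the `max_len` cap on segment length are all hypothetical, and the toy scorer stands in for the neural language model that would supply the likelihood of a segment given its context.

```python
import math
from typing import Callable, List


def log_sum_exp(values: List[float]) -> float:
    """Numerically stable log(sum(exp(v))) for a non-empty list."""
    m = max(values)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in values))


def sentence_log_prob(
    chars: str,
    segment_logprob: Callable[[str, str], float],
    max_len: int = 4,
) -> float:
    """Log-probability of `chars` summed over all segmentations.

    alpha[end] holds the log of the total probability of generating
    chars[:end] under every segmentation of that prefix. Each step
    extends a shorter prefix by one segment of at most `max_len`
    characters (an assumed cap, chosen here only for illustration).
    """
    n = len(chars)
    alpha = [-math.inf] * (n + 1)
    alpha[0] = 0.0  # the empty prefix has probability 1
    for end in range(1, n + 1):
        candidates = []
        for start in range(max(0, end - max_len), end):
            context, segment = chars[:start], chars[start:end]
            candidates.append(alpha[start] + segment_logprob(context, segment))
        alpha[end] = log_sum_exp(candidates)
    return alpha[n]


def toy_segment_logprob(context: str, segment: str) -> float:
    """Hypothetical stand-in for a neural language model.

    It ignores the context and assigns probability 0.5 ** len(segment);
    a real model would condition on `context`.
    """
    return len(segment) * math.log(0.5)


if __name__ == "__main__":
    print(sentence_log_prob("abc", toy_segment_logprob))
```

In training, a quantity like the one returned by `sentence_log_prob` would be maximized with respect to the parameters of the segment scorer. The dynamic program keeps the cost roughly proportional to sentence length times `max_len`, rather than enumerating exponentially many segmentations.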