BriefGPT.xyz
May, 2018
使用Seq2Seq模型和Levenshtein距离对混合数据中的音译词进行规范化
Normalization of Transliterated Words in Code-Mixed Data Using Seq2Seq Model & Levenshtein Distance
HTML
PDF
Soumil Mandal, Karthick Nanmaran
TL;DR
本文介绍了一种新型的体系结构,专注于标准化语音打字变体,在某些情况下还可用于反向音译和单词识别,在测试数据上达到了90.27%的准确率,解决了与社交媒体情境相关的困难同时应对语法不一致和拼写变化的挑战。
Abstract
Building tools for
code-mixed data
is rapidly gaining popularity in the
nlp
research community as such data is exponentially rising on social media. Working with
→