May, 2024
Span-Aggregatable, Contextualized Word Embeddings for Effective Phrase Mining
Eyal Orbach, Lev Haikin, Nelly David, Avi Faizakof
TL;DR
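When the target phrase sits inside a noisy context, a single dense vector is not sufficient for effective phrase retrieval. We therefore propose representing multiple sub-sentence, consecutive word spans, each with its own dense vector, and introduce a modified contrastive loss that encourages word embeddings to have this property. We show that the approach improves phrase mining.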
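The span idea in the TL;DR can be illustrated with a minimal sketch. It assumes that contextualized token embeddings are already available as a NumPy array, that a span vector is obtained by mean-pooling its token vectors, and that spans are scored against a query by cosine similarity. The pooling choice, the function names `span_vectors` and `best_span`, and the `max_len` parameter are illustrative assumptions, not details taken from the paper, and the modified contrastive loss is not shown here.

```python
import numpy as np

def span_vectors(token_embs, max_len):
    """Mean-pool every consecutive token span of length 1..max_len.

    token_embs: array of shape (n_tokens, dim), assumed to come from
    some contextualized encoder (hypothetical input).
    """
    n, dim = token_embs.shape
    # Prefix sums make each span sum a single subtraction.
    prefix = np.vstack([np.zeros((1, dim)), np.cumsum(token_embs, axis=0)])
    spans, vecs = [], []
    for start in range(n):
        for end in range(start + 1, min(start + max_len, n) + 1):
            spans.append((start, end))  # half-open interval [start, end)
            vecs.append((prefix[end] - prefix[start]) / (end - start))
    return spans, np.array(vecs)

def best_span(query_vec, token_embs, max_len=3):
    """Return the span whose pooled vector is most cosine-similar to the query."""
    spans, vecs = span_vectors(token_embs, max_len)
    # Guard against zero-norm vectors before normalizing.
    q = query_vec / max(np.linalg.norm(query_vec), 1e-12)
    norms = np.maximum(np.linalg.norm(vecs, axis=1, keepdims=True), 1e-12)
    scores = (vecs / norms) @ q
    i = int(np.argmax(scores))
    return spans[i], float(scores[i])
```

In this sketch, a query that matches only two of the tokens in a longer sentence scores highest against the span covering those two tokens, while the mean over the whole sentence is diluted by the surrounding tokens. That is the intuition behind giving each span its own vector.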
Abstract
Dense vector representations for sentences have made significant progress in recent years, as can be seen on sentence similarity tasks. Real-world phrase retrieval applications, on the other hand, still encounter chall…