Dense vector representations for sentences made significant progress in recent years as can be seen on sentence similarity tasks. Real-world phrase retrieval applications, on the other hand, still encounter challenges for effective use of dense representations. We show that when target phrases reside inside noisy context, representing the full sentence with a single dense vector, is not sufficient for effective phrase retrieval. We therefore look into the notion of representing multiple, sub-sentence, consecutive word spans, each with its own dense vector. We show that this technique is much more effective for phrase mining, yet requires considerable compute to obtain useful span representations. Accordingly, we make an argument for contextualized word/token embeddings that can be aggregated for arbitrary word spans while maintaining the span's semantic meaning. We introduce a modification to the common contrastive loss used for sentence embeddings that encourages word embeddings to have this property. To demonstrate the effect of this method we present a dataset based on the STS-B dataset with additional generated text, that requires finding the best matching paraphrase residing in a larger context and report the degree of similarity to the origin phrase. We demonstrate on this dataset, how our proposed method can achieve better results without significant increase to compute.

当目标短语位于噪音上下文中时，单个密集向量不足以进行有效的短语检索；因此，我们提出了代表多个子句、连续词语片段的概念，每个片段都有自己的密集向量，并引入了一种修改后的对比损失函数用于鼓励词嵌入具备此属性，并展示了该方法在短语挖掘中的改进效果。

可聚合的上下文化词向量用于有效短语挖掘