BriefGPT.xyz
Nov, 2022
The Dependence on Frequency of Word Embedding Similarity Measures
Francisco Valentini, Diego Fernandez Slezak, Edgar Altszyler
TL;DR
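This paper systematically studies the relationship between word frequency and semantic similarity in several static word embeddings, and finds that similarity is higher between high-frequency words. It also examines how word frequency affects embedding-based measures of gender bias, and shows that the measured bias can be reversed by manipulating word frequency.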
Abstract
Recent research has shown that static word embeddings can encode word frequency information. However, little has been studied about this phenomenon and its effects on