Casper Hansen, Christian Hansen, Jakob Grue Simonsen, Stephen Alstrup, Christina Lioma
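TL;DR: The paper proposes a new unsupervised generative semantic hashing method, Ranking based Semantic Hashing (RBSH), which consists of a variational component and a ranking-based component, so that document ranking is built into hash code generation. Experiments show that it outperforms both traditional and state-of-the-art semantic hashing methods across different hash code lengths, and typically needs hash codes that are 2-4x shorter.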
Abstract
Fast similarity search is a key component in large-scale information retrieval, where semantic hashing has become a popular strategy for representing documents as binary hash codes. Recent advances in this area h