BriefGPT.xyz
Jul, 2014
In Defense of MinHash Over SimHash
Anshumali Shrivastava, Ping Li
TL;DR
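MinHash and SimHash are the two widely adopted locality sensitive hashing (LSH) algorithms in large-scale data processing applications, and this study compares them. It shows that when the data are binary, MinHash almost always outperforms SimHash. The study also provides a way to compare the two algorithms in terms of both resemblance and cosine similarity.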
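Comparing the two algorithms on a common footing requires relating the two similarity measures they target. As a small worked check, assume binary vectors stored as non-empty sets of nonzero indices, with a = |x ∩ y|, f1 = |x|, f2 = |y|. Resemblance is R = a / (f1 + f2 − a) and cosine similarity is S = a / sqrt(f1 · f2). These always satisfy S² ≤ R ≤ S / (2 − S): the left side follows from (f1 − a)(f2 − a) ≥ 0 and the right side from f1 + f2 ≥ 2·sqrt(f1 · f2). The function names and the example sets below are illustrative only.

```python
import math

def resemblance(x: set, y: set) -> float:
    """Resemblance R = |x & y| / |x | y| for binary vectors given as non-empty index sets."""
    return len(x & y) / len(x | y)

def cosine(x: set, y: set) -> float:
    """Cosine similarity S = a / sqrt(f1 * f2) for binary vectors given as non-empty index sets."""
    return len(x & y) / math.sqrt(len(x) * len(y))

x = {0, 1, 2, 3}           # f1 = 4
y = {2, 3, 4, 5, 6, 7}     # f2 = 6, a = |x & y| = 2
R = resemblance(x, y)      # 2 / 8 = 0.25
S = cosine(x, y)           # 2 / sqrt(24) ≈ 0.4082

# For binary data the two measures bound each other: S**2 <= R <= S / (2 - S)
assert S ** 2 <= R <= S / (2 - S)
print(f"R = {R:.4f}, S = {S:.4f}")   # prints "R = 0.2500, S = 0.4082"
```

Here the upper bound S / (2 − S) ≈ 0.2565 is close to R because f1 and f2 are similar in size; the bound becomes exact when f1 = f2.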
Abstract
MinHash and SimHash are the two widely adopted locality sensitive hashing (LSH) algorithms for large-scale data processing applications. D
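To make the two LSH families concrete, here is a minimal sketch, assuming binary vectors of dimension D given as sets of nonzero indices. MinHash is written with explicit random permutations (practical systems usually substitute hash functions), and SimHash is written in its usual sign-random-projection form with Gaussian vectors. Two sets collide under MinHash with probability R, and under one SimHash bit with probability 1 − arccos(S)/π. The helper names (`minhash_signature`, `simhash_signature`, `collision_rate`) and the toy sets are hypothetical.

```python
import math
import random

def minhash_signature(s, dim, num_hashes, seed=0):
    """MinHash: for each random permutation pi of {0..dim-1}, keep min(pi[i] for i in s).
    Reusing the same seed gives every set the same permutations."""
    rng = random.Random(seed)
    sig = []
    for _ in range(num_hashes):
        perm = list(range(dim))
        rng.shuffle(perm)
        sig.append(min(perm[i] for i in s))
    return sig

def simhash_signature(s, dim, num_bits, seed=0):
    """SimHash (sign random projection): bit j is 1 if r_j . x >= 0, with r_j ~ N(0, I).
    For a binary x given as its set of nonzero indices, r_j . x = sum(r_j[i] for i in s)."""
    rng = random.Random(seed)
    bits = []
    for _ in range(num_bits):
        r = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        bits.append(1 if sum(r[i] for i in s) >= 0 else 0)
    return bits

def collision_rate(sig_x, sig_y):
    """Fraction of positions where two signatures agree."""
    return sum(u == v for u, v in zip(sig_x, sig_y)) / len(sig_x)

D, K = 500, 400
x = set(range(0, 150))
y = set(range(50, 200))

a = len(x & y)
R = a / len(x | y)                    # resemblance: 100 / 200 = 0.5
S = a / math.sqrt(len(x) * len(y))    # cosine: 100 / 150 ≈ 0.667

mh = collision_rate(minhash_signature(x, D, K), minhash_signature(y, D, K))
sh = collision_rate(simhash_signature(x, D, K), simhash_signature(y, D, K))

# Expected: Pr[MinHash collision] = R, Pr[SimHash collision] = 1 - arccos(S) / pi
print(f"MinHash: {mh:.3f} vs R = {R:.3f}")
print(f"SimHash: {sh:.3f} vs 1 - arccos(S)/pi = {1 - math.acos(S) / math.pi:.3f}")
```

With K = 400 samples, each empirical rate typically lands within a few hundredths of its expected value. The signatures for x and y must be built with the same seed so that both sets see the same permutations or projections.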