BriefGPT.xyz
Apr, 2019
Categorical Feature Compression via Submodular Optimization
MohammadHossein Bateni, Lin Chen, Hossein Esfandiari, Thomas Fu, Vahab S. Mirrokni...
TL;DR
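For the big-data setting, we design a highly scalable vocabulary compression algorithm that seeks to maximize the mutual information between the compressed categorical feature and the binary target label, and we use a distributed implementation of a series of sophisticated algorithms to guarantee its accuracy.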
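To make the objective concrete, here is a minimal sketch of how the mutual information between a compressed categorical feature and a binary label could be measured from samples. It is not the paper's algorithm: it only scores a given compression, and the names `mutual_information`, `compress`, and the toy mappings are hypothetical, chosen for illustration.

```python
import math
from collections import defaultdict


def mutual_information(pairs):
    """Empirical mutual information I(X; Y) in bits from (x, y) samples."""
    joint = defaultdict(int)  # counts of each (x, y) pair
    px = defaultdict(int)     # counts of each feature value x
    py = defaultdict(int)     # counts of each label y
    n = 0
    for x, y in pairs:
        joint[(x, y)] += 1
        px[x] += 1
        py[y] += 1
        n += 1
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def compress(pairs, mapping):
    """Replace each vocabulary value by its bucket id (hypothetical helper)."""
    return [(mapping[x], y) for x, y in pairs]


# Toy data (assumed for illustration): values "a" and "b" always carry
# label 1, values "c" and "d" always carry label 0.
samples = [(v, 1) for v in "aabb"] + [(v, 0) for v in "ccdd"]

# Two ways to compress a 4-value vocabulary into 2 buckets.
label_aligned = {"a": 0, "b": 0, "c": 1, "d": 1}
label_mixed = {"a": 0, "c": 0, "b": 1, "d": 1}

mi_raw = mutual_information(samples)
mi_aligned = mutual_information(compress(samples, label_aligned))
mi_mixed = mutual_information(compress(samples, label_mixed))
```

On this toy data, the label-aligned compression keeps all of the information the raw feature carries about the label, while the mixed compression keeps none. A compression algorithm with the stated objective would be searching for mappings of the first kind.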
Abstract
In the era of big data, learning from categorical features with very large vocabularies (e.g., 28 million for the Criteo click prediction dataset) has become a practical challenge for machine learning researchers