May, 2018
Scaling associative classification for very large datasets
Luca Venturini, Elena Baralis, Paolo Garza
TL;DR
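This paper introduces a distributed associative classifier (DAC) to address the difficulty classifiers have with large datasets and large-domain categorical features. It uses ensemble learning and advanced techniques to achieve high scalability and high accuracy. Validated on the Apache Spark framework, DAC is shown to outperform state-of-the-art solutions in both prediction quality and execution time.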
Abstract
Supervised learning algorithms are nowadays successfully scaling up to datasets that are very large in volume, leveraging the potential of in-memory cluster-computing big data frameworks. Still, massive datasets …