BriefGPT.xyz
Oct, 2022
How to Sift Out a Clean Data Subset in the Presence of Data Poisoning?
Yi Zeng, Minzhou Pan, Himanshu Jahagirdar, Ming Jin, Lingjuan Lyu...
TL;DR
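This paper studies how malicious attackers can poison the data when machine learning models are trained on externally sourced data, and proposes a practical countermeasure called "Meta-Sift". By identifying the clean data distribution within the dataset, Meta-Sift can sift out a clean base set from poisoned data with high precision.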
Abstract
Given the volume of data needed to train modern machine learning models, external suppliers are increasingly used. However, incorporating external data poses data poisoning risks, wherein attackers manipulate the ...