Artificial intelligence systems are prevalent in everyday life, with use cases in retail, manufacturing, health, and many other fields. With the rise in AI adoption, associated risks have been identified, including privacy risks to the people whose data was used to train models. Assessing the privacy risks of machine learning models is crucial to making informed decisions on whether to use, deploy, or share a model. A common approach to privacy risk assessment is to run one or more known attacks against the model and measure their success rate. We present a novel framework for running membership inference attacks against classification models. Our framework takes advantage of the ensemble method, generating many specialized attack models for different subsets of the data. We show that this approach achieves higher accuracy than either a single attack model or an attack model per class label, on both classical and language classification tasks.
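
The contrast between a single attack model and an ensemble of specialized attack models can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the framework itself: the attack model is a plain confidence threshold, the subsets are defined by a hypothetical `group` field, and the records are small synthetic toy values rather than experimental data. For brevity, accuracy is measured on the same records used to fit the thresholds; a real assessment would use held-out data.

```python
from collections import defaultdict

def fit_threshold(samples):
    """Pick the threshold t that maximizes accuracy of the rule
    'member if confidence >= t' on (confidence, is_member) pairs."""
    candidates = sorted({conf for conf, _ in samples}) + [float("inf")]
    best_t, best_acc = candidates[0], -1.0
    for t in candidates:
        acc = sum((conf >= t) == member for conf, member in samples) / len(samples)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t

def fit_ensemble(records, key):
    """Fit one threshold attack per subset, where key(record) names the subset."""
    groups = defaultdict(list)
    for r in records:
        groups[key(r)].append((r["conf"], r["member"]))
    return {name: fit_threshold(samples) for name, samples in groups.items()}

def accuracy(records, predict):
    """Fraction of records whose membership the attack guesses correctly."""
    return sum(predict(r) == r["member"] for r in records) / len(records)

# Synthetic toy records (hypothetical): the target model is confident on
# subset "a" in general and less confident on subset "b", so no single
# threshold separates members from non-members in both subsets.
def make(group, confs, member):
    return [{"group": group, "conf": c, "member": member} for c in confs]

records = (
    make("a", [0.95, 0.90, 0.85], True) + make("a", [0.75, 0.70, 0.65], False)
    + make("b", [0.60, 0.55, 0.50], True) + make("b", [0.40, 0.35, 0.30], False)
)

# Single attack model: one threshold for all records.
single_t = fit_threshold([(r["conf"], r["member"]) for r in records])
single_acc = accuracy(records, lambda r: r["conf"] >= single_t)

# Ensemble: one specialized threshold per subset.
thresholds = fit_ensemble(records, key=lambda r: r["group"])
ensemble_acc = accuracy(records, lambda r: r["conf"] >= thresholds[r["group"]])

print(f"single attack accuracy:   {single_acc:.2f}")
print(f"ensemble attack accuracy: {ensemble_acc:.2f}")
```

On this toy data the per-subset thresholds separate members from non-members in each subset, while the single threshold has to trade off errors between the two subsets. This only shows why specialization can help; it makes no claim about the attack model types or the subset selection used in the framework.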

Improved Membership Inference Attacks Against Language Classification Models