Model stealing attacks aim to create a substitute model that replicates the functionality of a victim (target) model. However, most existing methods depend on the full probability outputs of the victim model, which are unavailable in most realistic scenarios. In the more practical hard-label setting, where the victim returns only its predicted label, these methods suffer from catastrophic performance degradation because hard labels lack the rich information carried by probability predictions. Inspired by knowledge distillation, we propose a novel hard-label model stealing method termed \emph{black-box dissector}, which consists of two components: a CAM-driven erasing strategy that mines the hidden information in the victim model's hard labels, and a random-erasing-based self-knowledge distillation module that uses soft labels from the substitute model to avoid the overfitting and miscalibration caused by hard labels. Extensive experiments on four widely used datasets consistently show that our method outperforms state-of-the-art methods, with improvements of up to $9.92\%$. In addition, experiments on real-world APIs further demonstrate the effectiveness of our method. Our method can also invalidate existing defense methods, which further demonstrates its practical potential.
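The CAM-driven erasing step can be illustrated with a minimal pure-Python sketch. It assumes the standard class activation map formulation (a weighted sum of the final feature maps using the classifier weights of one class), computed on the substitute model for the label the victim returned, and assumes the "erasing" rule is to overwrite every pixel whose min-max normalized CAM score is at least a threshold `ratio`. The function names, the threshold rule, and the `victim` callable are hypothetical illustrations, not the paper's exact procedure.

```python
from typing import Callable, List, Tuple

Grid = List[List[float]]  # a single-channel image or feature map, row-major


def class_activation_map(features: List[Grid], class_weights: List[float]) -> Grid:
    """Standard CAM: weighted sum of the K final feature maps, using the
    classifier weights that connect each map to the class of interest."""
    h, w = len(features[0]), len(features[0][0])
    cam = [[0.0] * w for _ in range(h)]
    for fmap, weight in zip(features, class_weights):
        for i in range(h):
            for j in range(w):
                cam[i][j] += weight * fmap[i][j]
    return cam


def erase_top_region(image: Grid, cam: Grid, ratio: float = 0.5, fill: float = 0.0) -> Grid:
    """Min-max normalize the CAM, then overwrite every pixel whose normalized
    score is >= `ratio` with `fill` (the most class-discriminative area).
    Assumes the CAM has already been resized to the image resolution."""
    lo = min(min(row) for row in cam)
    hi = max(max(row) for row in cam)
    span = (hi - lo) or 1.0  # avoid division by zero on a flat CAM
    return [
        [fill if (cam[i][j] - lo) / span >= ratio else image[i][j] for j in range(len(image[0]))]
        for i in range(len(image))
    ]


def dissect_query(
    image: Grid,
    features: List[Grid],
    class_weights: List[float],
    victim: Callable[[Grid], int],
    ratio: float = 0.5,
) -> Tuple[Grid, int]:
    """Hypothetical query step: erase the region the substitute attends to,
    then ask the hard-label victim to classify the erased image."""
    cam = class_activation_map(features, class_weights)
    erased = erase_top_region(image, cam, ratio)
    return erased, victim(erased)
```

One way to read the returned label: if the victim's prediction on the erased image differs from its prediction on the clean image, the new label hints at another class the victim considers plausible, which a single hard-label query on the clean image does not reveal.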

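This work proposes a new black-box model stealing method that consists mainly of a CAM-based erasing strategy and a random-erasing-based self-knowledge distillation module. It extracts latent information from the victim model and uses soft labels from the substitute model to mitigate overfitting, ultimately improving model stealing performance by up to 8.27%, and it shows promise for application to real-world APIs and defense mechanisms.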
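The random-erasing-based self-knowledge distillation module can be illustrated with a minimal pure-Python sketch. It assumes the soft label is the substitute model's softmax output averaged over several randomly erased copies of an input, and that this soft label is combined with the victim's hard label through an `alpha`-weighted cross-entropy. The averaging scheme, `alpha`, `n_views`, the rectangle-sampling rule, and all function names are hypothetical; the text above does not specify them.

```python
import math
import random
from typing import Callable, List

Grid = List[List[float]]  # single-channel image, row-major


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def random_erase(image: Grid, rng: random.Random, max_frac: float = 0.5, fill: float = 0.0) -> Grid:
    """Return a copy of `image` with one random rectangle overwritten by `fill`."""
    h, w = len(image), len(image[0])
    eh = rng.randint(1, max(1, int(h * max_frac)))
    ew = rng.randint(1, max(1, int(w * max_frac)))
    top, left = rng.randint(0, h - eh), rng.randint(0, w - ew)
    out = [row[:] for row in image]
    for i in range(top, top + eh):
        for j in range(left, left + ew):
            out[i][j] = fill
    return out


def self_distilled_soft_label(
    image: Grid, substitute: Callable[[Grid], List[float]], n_views: int = 8, seed: int = 0
) -> List[float]:
    """Average the substitute's softmax outputs over randomly erased views."""
    rng = random.Random(seed)
    views = [softmax(substitute(random_erase(image, rng))) for _ in range(n_views)]
    return [sum(p[c] for p in views) / n_views for c in range(len(views[0]))]


def soft_cross_entropy(target: List[float], logits: List[float]) -> float:
    """Cross-entropy between a probability vector and the softmax of `logits`."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)


def combined_loss(logits: List[float], hard_label: int, soft_label: List[float], alpha: float = 0.5) -> float:
    """Hypothetical objective: cross-entropy on the victim's hard label plus
    an alpha-weighted cross-entropy on the self-distilled soft label."""
    hard_ce = -math.log(softmax(logits)[hard_label])
    return hard_ce + alpha * soft_cross_entropy(soft_label, logits)
```

Averaging over erased views tends to spread some probability mass away from the top-1 class, which is one plausible way such soft targets can temper the overconfidence that training on hard labels alone tends to produce.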

Black-box Dissector: Towards Erasing-based Hard-Label Model Stealing Attack