Deep neural networks (DNNs) are widely used in many artificial intelligence (AI) tasks. However, deploying them poses significant challenges because of their high memory, energy, and computation costs. To address these challenges, researchers have developed various model compression techniques, such as model quantization and model pruning. Recently, there has been a surge of research on compression methods that improve model efficiency while retaining performance. In addition, a growing body of work focuses on customizing DNN hardware accelerators to better leverage model compression techniques. Beyond efficiency, preserving security and privacy is critical for deploying DNNs. However, the vast and diverse body of related work can be overwhelming. This motivates us to conduct a comprehensive survey of recent research toward the goal of high-performance, cost-efficient, and safe deployment of DNNs. Our survey first covers the mainstream model compression techniques: model quantization, model pruning, knowledge distillation, and optimization of non-linear operations. We then introduce recent advances in designing hardware accelerators that can adapt to efficient model compression approaches. Additionally, we discuss how homomorphic encryption can be integrated to secure DNN deployment. Finally, we discuss several open issues, such as hardware evaluation, generalization, and the integration of various compression approaches. Overall, we aim to provide a big picture of efficient DNNs from the algorithm, hardware accelerator, and security perspectives.

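Deep neural networks (DNNs) are widely used in many artificial intelligence (AI) tasks. To address the substantial memory, energy, and computation costs of deploying them, researchers have developed various model compression techniques, and a growing number of recent studies focus on customizing DNN hardware accelerators to better exploit these techniques. In addition, preserving security and privacy is critical for deploying DNNs. Our survey first covers the mainstream model compression techniques, such as model quantization, model pruning, knowledge distillation, and optimization of non-linear operations. It then introduces recent advances in designing hardware accelerators that can adapt to efficient model compression approaches. We also discuss how homomorphic encryption can be integrated to secure DNN deployment. Finally, we discuss several issues, including hardware evaluation, generalization, and the integration of various compression approaches. Overall, we aim to provide a big picture of efficient DNNs from the algorithm, hardware accelerator, and security perspectives.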

From Algorithm to Hardware: A Survey on Efficient and Safe Deployment of Deep Neural Networks