Image and video inpainting is a classic problem in computer vision and computer graphics, aiming to fill in the plausible and realistic content in the missing areas of images and videos. With the advance of deep learning, this problem has achieved significant progress recently. The goal of this paper is to comprehensively review the deep learning-based methods for image and video inpainting. Specifically, we sort existing methods into different categories from the perspective of their high-level inpainting pipeline, present different deep learning architectures, including CNN, VAE, GAN, diffusion models, etc., and summarize techniques for module design. We review the training objectives and the common benchmark datasets. We present evaluation metrics for low-level pixel and high-level perceptional similarity, conduct a performance evaluation, and discuss the strengths and weaknesses of representative inpainting methods. We also discuss related real-world applications. Finally, we discuss open challenges and suggest potential future research directions.

通过深度学习，对于图像和视频修复的基于深度学习的方法进行综合评述，并从高水平的修复流程、深度学习架构、模块设计等多个角度进行分类总结。同时，讨论了训练目标、常见基准数据集、评估指标以及各修复方法的优势、局限性及实际应用，并探讨了公开挑战和未来可能的研究方向。

基于深度学习的图像和视频修复研究综述