We present a video decomposition method that facilitates layer-based editing of videos with spatiotemporally varying lighting and motion effects. Our neural model decomposes an input video into multiple layers, each represented by a 2D texture map, a mask over the original video, and a multiplicative residual that characterizes the spatiotemporal variations in lighting conditions. A single edit on a texture map propagates to the corresponding locations in every frame of the video while keeping the remaining content consistent. Using coordinate hashing, our method efficiently learns the layer-based neural representation of a 1080p video in 25s per frame and renders the edited result in real time at 71 fps on a single GPU. Qualitatively, we run our method on a variety of videos to show that it produces high-quality editing effects. Quantitatively, we propose feature-tracking evaluation metrics for objectively assessing the consistency of video editing. Project page: https://lightbulb12294.github.io/hashing-nvd/
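
To illustrate how a texture map, a mask, and a multiplicative residual can combine into a frame, the following is a minimal sketch and not the authors' implementation. It assumes that the masks act as per-pixel blending weights, that texture coordinates are precomputed integer lookups (the actual model learns these mappings with neural networks and coordinate hashing), and that the residual scales the sampled texture color. The function name `composite_frame` and all array layouts are hypothetical.

```python
import numpy as np

def composite_frame(textures, uv, masks, residuals):
    """Rebuild one frame from per-layer texture, UV lookup, mask, and residual.

    textures  : list of (Ht, Wt, 3) float arrays, one 2D texture map per layer
    uv        : list of (H, W, 2) int arrays, (row, col) texture coordinates per pixel
    masks     : list of (H, W) float arrays, per-layer blending weights (assumed)
    residuals : list of (H, W, 3) float arrays, multiplicative lighting factors
    """
    h, w = masks[0].shape
    frame = np.zeros((h, w, 3))
    for tex, coords, mask, res in zip(textures, uv, masks, residuals):
        # Nearest-neighbour texture lookup (an assumption made for simplicity).
        sampled = tex[coords[..., 0], coords[..., 1]]
        # The residual multiplies the texture color; the mask weights the layer.
        frame += mask[..., None] * sampled * res
    return frame

# Toy example: two layers on a 2x4 frame, each with a 1x1 texture.
textures = [np.full((1, 1, 3), 0.5), np.full((1, 1, 3), 0.2)]
uv = [np.zeros((2, 4, 2), dtype=int), np.zeros((2, 4, 2), dtype=int)]
left = np.zeros((2, 4))
left[:, :2] = 1.0                      # layer 0 covers the left half
masks = [left, 1.0 - left]             # layer 1 covers the right half
residuals = [np.full((2, 4, 3), 2.0),  # layer 0 is lit twice as brightly
             np.ones((2, 4, 3))]

frame = composite_frame(textures, uv, masks, residuals)

# Edit the texture of layer 0 once, then re-composite with the same residual.
textures[0][0, 0] = [1.0, 0.0, 0.0]
edited = composite_frame(textures, uv, masks, residuals)
```

In this toy setup the edit to the texture of layer 0 appears at every pixel that maps to it, the lighting factor stored in the residual is still applied to the new color, and the pixels belonging to layer 1 are unchanged.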

Hashing Neural Video Decomposition with Multiplicative Residuals in Space-Time