We propose a framework for semi-automated annotation of video frames in which the recorded object can, at any point in time, be labeled as being in one of a finite number of discrete states. A Hidden Markov Model (HMM) is used to model (1) the behavior of the underlying object and (2) the noisy observation of its state through an image processing algorithm. The key insight of this approach is that frame-by-frame video annotation can be reduced from a problem of labeling every single image to a problem of detecting transitions between states of the underlying object being recorded on video. The performance of the framework is evaluated on a driver gaze classification dataset composed of 16,000,000 images that were fully annotated over 6,000 hours of direct manual annotation labor. On this dataset, we achieve a 13x reduction in manual annotation at an average accuracy of 99.1% and an 84x reduction at an average accuracy of 91.2%.
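
To make the reduction from per-frame labeling to transition detection concrete, the following is a minimal sketch of HMM decoding over noisy per-frame classifier outputs. It uses standard Viterbi decoding, which the abstract does not name as the decoder used; the function names `viterbi` and `transition_frames`, the two-state setup, and all probabilities are hypothetical values chosen only for illustration.

```python
import numpy as np

def viterbi(obs, log_start, log_trans, log_emit):
    """Most likely hidden state sequence for a discrete-observation HMM.

    obs       : sequence of observed symbols (noisy per-frame classifier outputs)
    log_start : (S,)   log prior over states for the first frame
    log_trans : (S, S) log_trans[i, j] = log P(state j at t | state i at t-1)
    log_emit  : (S, K) log_emit[s, k]  = log P(observing k | true state s)
    """
    n_frames = len(obs)
    n_states = log_start.shape[0]
    score = np.empty((n_frames, n_states))
    back = np.zeros((n_frames, n_states), dtype=int)

    score[0] = log_start + log_emit[:, obs[0]]
    for t in range(1, n_frames):
        # cand[i, j]: best score of any path in state i at t-1 that moves to j
        cand = score[t - 1][:, None] + log_trans
        back[t] = np.argmax(cand, axis=0)
        score[t] = cand[back[t], np.arange(n_states)] + log_emit[:, obs[t]]

    # Trace the best path backwards from the best final state
    path = np.empty(n_frames, dtype=int)
    path[-1] = int(np.argmax(score[-1]))
    for t in range(n_frames - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path

def transition_frames(path):
    """Indices of frames whose decoded state differs from the previous frame."""
    return [t for t in range(1, len(path)) if path[t] != path[t - 1]]

# Hypothetical two-state model: states persist, the classifier is 90% accurate.
start = np.array([0.5, 0.5])
trans = np.array([[0.95, 0.05],
                  [0.05, 0.95]])
emit = np.array([[0.9, 0.1],
                 [0.1, 0.9]])

noisy = [0, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1]  # illustrative classifier outputs
labels = viterbi(noisy, np.log(start), np.log(trans), np.log(emit))
to_review = transition_frames(labels)
print(labels.tolist(), to_review)
```

In this toy input the two isolated single-frame flips are smoothed away, leaving one decoded transition for a human annotator to confirm rather than eleven frames to label. How the framework actually selects frames for manual review is not specified in the abstract, so this should be read as an illustration of the idea rather than the authors' procedure.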

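This work proposes a framework for semi-automated annotation of video frames in which every frame is labeled with the help of a Hidden Markov Model. The model captures both the state of the underlying object and its observation through an image processing algorithm, which reduces video annotation from a frame-by-frame labeling problem to the problem of detecting state transitions of the underlying object. The method is evaluated on a driver gaze classification dataset, where it achieves high accuracy while substantially reducing the manual annotation workload.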

Semi-Automated Annotation of Discrete States in Large Video Datasets