BriefGPT.xyz
Aug, 2023
Video OWL-ViT: Temporally-consistent open-world localization in video
Georg Heigold, Matthias Minderer, Alexey Gritsenko, Alex Bewley, Daniel Keysers...
TL;DR
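Building on the OWL-ViT model, we add a Transformer decoder to adapt this open-world image model to video for open-world localization, achieving better temporal consistency and stronger open-world capabilities.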
Abstract
We present an architecture and a training recipe that adapts pre-trained open-world image models to localization in videos. Understanding the open visual world (without being constrained by fixed label spaces) is