Abstract
Existing open-vocabulary object detectors typically require a predefined set of categories from users, significantly confining their application scenarios. In this paper, we introduce
DetCLIPv3, a high-performing detector that excels not only at both