Multi-object tracking (MOT) aims to detect and associate all desired objects
across frames. Most methods accomplish the task by explicitly or implicitly
leveraging strong cues (i.e., spatial and appearance information), which
exhibit powerful instance-level discrimination. However, whe