Detecting and segmenting individual objects, regardless of their category, is
crucial for many applications such as action detection or robotic interaction.
While this problem has been well-studied under the classic formulation of
spatio-temporal grouping, state-of-the-art approaches d