Visual affordance learning is a key component for robots to understand how to
interact with objects. Conventional approaches in this field rely on
pre-defined objects and actions, falling short of capturing diverse
interactions in real-world scenarios. The key idea of our approach is em