Developing robots that can execute diverse manipulation tasks, guided by
natural language instructions and visual observations of complex real-world
environments, remains a significant challenge in robotics. Such robot agents
need to understand linguistic commands and distinguish