Humans inherently recognize objects via selective visual perception,
transform specific regions from the visual field into structured symbolic
knowledge, and reason their relationships among regions based on the allocation
of limited attention resources in line with humans' goals. Whil