Abstract

Graphical User Interface (GUI) agents, driven by Multi-modal
Large Language Models (MLLMs), have emerged as a promising paradigm for enabling intelligent interaction with digital systems. This paper provides a structured summary of recent advances in