Large language models (LLMs) trained on a corpus that includes a large amount of code exhibit a remarkable ability to understand HTML. As web interfaces are primarily constructed using HTML, we design an in-depth study of how LLMs can be used to retrieve and locate the elements of a web interface that are important for a user-given query (i.e., a task description). In contrast with prior work, which primarily focused on autonomous web navigation, we decompose the problem into a more atomic operation: can LLMs identify the information in a web page that is important for a user-given query? This decomposition enables us to scrutinize the current capabilities of LLMs and uncover the opportunities and challenges they present. Our empirical experiments show that while LLMs exhibit a reasonable level of performance in retrieving important UI elements, there is still substantial room for improvement. We hope our investigation will inspire follow-up work on overcoming the current challenges in this domain.

Opportunities and Challenges of Using LLMs to Retrieve Information from Web Interfaces