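TL;DR: This work proposes a PLM-GNN-based representation and classification method that uses a pre-trained language model and a graph neural network to jointly encode the text and the HTML DOM tree of a web page. It effectively addresses the challenge of growing web data and achieves good classification performance.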
Abstract
The number of web pages is growing exponentially, and massive amounts of data are accumulating on the web. Classifying web pages is one of the key processes in web information mining. Some classical methods are based on manually building features of web pages and training classifier