BriefGPT.xyz
Dec, 2022
DeepJoin: Joinable Table Discovery with Pre-trained Language Models
Yuyang Dong, Chuan Xiao, Takuma Nozawa, Masafumi Enomoto, Masafumi Oyamada
TL;DR
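DeepJoin provides accurate and efficient joinable table discovery for data lake management. It is an embedding-based retrieval solution built on a deep learning model, and it serves both equi-joins and semantic joins. The design of its training data and data augmentation techniques helps it generalize to large datasets. When evaluated against expert labels, it is even more accurate than an exact solution for semantic joins, and with a GPU it can be up to two orders of magnitude faster.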
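The TL;DR describes DeepJoin as embedding-based retrieval: each column is turned into a vector, and joinable candidates are found by searching for the nearest vectors. The sketch below illustrates only that retrieval step, under explicit assumptions. The `embed` function is a hypothetical character-trigram hashing stand-in, not the fine-tuned pre-trained language model that DeepJoin uses. The names `column_to_text`, `top_k_joinable`, and `DIM` are illustrative and do not come from the authors' code.

```python
import math
import zlib
from typing import Dict, List, Tuple

DIM = 256  # hypothetical embedding size for this toy sketch


def column_to_text(values: List[str]) -> str:
    # Hypothetical serialization: concatenate the column's distinct cell values.
    return ", ".join(sorted(set(values)))


def embed(text: str) -> List[float]:
    # Stand-in for a pre-trained language model encoder (assumption):
    # hash character trigrams into a fixed-size vector, then L2-normalize it.
    vec = [0.0] * DIM
    padded = f"  {text.lower()}  "
    for i in range(len(padded) - 2):
        vec[zlib.crc32(padded[i:i + 3].encode("utf-8")) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def top_k_joinable(query: List[str], repo: Dict[str, List[str]],
                   k: int) -> List[Tuple[str, float]]:
    # Embed the query column once, then rank repository columns by cosine
    # similarity (a dot product, since all vectors are unit length).
    # This is a brute-force scan; a real system would use an approximate
    # nearest-neighbour index instead.
    q = embed(column_to_text(query))
    scored = []
    for name, values in repo.items():
        x = embed(column_to_text(values))
        scored.append((name, sum(a * b for a, b in zip(q, x))))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    repo = {
        "cities.name": ["Tokyo", "Osaka", "Kyoto", "Nagoya"],
        "products.sku": ["A-100", "B-200", "C-300"],
    }
    # The city column shares many trigrams with the query, so it should rank first.
    print(top_k_joinable(["Tokyo", "Osaka", "Kobe"], repo, k=1))
```

Because every column is embedded independently, the repository embeddings can be computed once offline. Only the query column needs encoding at search time, which is what makes an embedding-based design amenable to indexing and GPU acceleration.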
Abstract
Due to the usefulness in data enrichment for data analysis tasks, joinable table discovery has become an important operation in data lake management. Existing approaches target equi-joins, the most common way of …
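For equi-joins, joinability is usually scored by exact value overlap between a query column and a candidate column. The sketch below assumes a specific measure: the fraction of the query column's distinct values that also appear in the candidate column. Treat this definition and the helper names `equi_joinability` and `top_k_equi` as illustrative assumptions, not the paper's exact formulation.

```python
from typing import AbstractSet, Dict, Iterable, List, Tuple


def equi_joinability(query: Iterable[str], candidate: Iterable[str]) -> float:
    # Assumed measure: |Q ∩ X| / |Q| over distinct values, using exact matching.
    q: AbstractSet[str] = set(query)
    if not q:
        return 0.0
    return len(q & set(candidate)) / len(q)


def top_k_equi(query: Iterable[str], repo: Dict[str, Iterable[str]],
               k: int) -> List[Tuple[str, float]]:
    # Exhaustively score every repository column against the query column.
    q = set(query)  # materialize once so generators are not consumed twice
    scored = [(name, equi_joinability(q, values)) for name, values in repo.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # 2 of the 3 distinct query values match exactly.
    print(equi_joinability(["Tokyo", "Osaka", "Kobe"], ["Tokyo", "Osaka", "Kyoto"]))
    # Exact matching misses "tokyo" vs "Tokyo", so only 1 of 2 values matches.
    print(equi_joinability(["Tokyo", "Osaka"], ["tokyo", "Osaka"]))
```

The second call shows why exact matching is brittle. "tokyo" and "Tokyo" do not match, and this kind of formatting difference is what the semantic joins mentioned in the TL;DR are meant to tolerate.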