Jan, 2024
Contrastive Learning in Distilled Models
Valerie Lim, Kai Wen Ng, Kenneth Lim
TL;DR
Applying the contrastive learning approach from the SimCSE paper, the architecture of DistilBERT, a knowledge-distilled model, is adapted to address the problem that NLP models perform poorly on semantic textual similarity (STS) and are too large to deploy as lightweight edge applications. The resulting lightweight model, DistilFace, achieves a Spearman's correlation of 72.1 on STS tasks, a 34.2% improvement over BERT base.
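For intuition, below is a minimal sketch of the unsupervised SimCSE-style contrastive objective applied to a DistilBERT encoder: the same batch is passed through the encoder twice with dropout active, the two noisy embeddings of each sentence form a positive pair, and other sentences in the batch serve as negatives. The pooling choice (mean pooling), temperature (0.05), and checkpoint name are assumptions for illustration, not necessarily the exact DistilFace settings.

```python
# Sketch of unsupervised SimCSE-style contrastive training on DistilBERT.
# Assumptions: mean pooling, temperature 0.05, "distilbert-base-uncased" checkpoint.
import torch
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModel.from_pretrained("distilbert-base-uncased")
model.train()  # keep dropout active: two forward passes give two "views" of each sentence

def embed(sentences):
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    hidden = model(**batch).last_hidden_state            # (batch, tokens, hidden)
    mask = batch["attention_mask"].unsqueeze(-1).float()  # (batch, tokens, 1)
    return (hidden * mask).sum(1) / mask.sum(1)           # mean pooling over tokens

sentences = ["A man is playing a guitar.", "The weather is nice today."]

# Two passes through the same encoder; dropout noise makes them a positive pair.
z1, z2 = embed(sentences), embed(sentences)

# InfoNCE loss: each sentence's second view is its positive,
# every other sentence in the batch is a negative.
temperature = 0.05
sim = F.cosine_similarity(z1.unsqueeze(1), z2.unsqueeze(0), dim=-1) / temperature
labels = torch.arange(sim.size(0))
loss = F.cross_entropy(sim, labels)
loss.backward()
```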
Abstract
Natural language processing models like BERT can provide state-of-the-art word embeddings for downstream NLP tasks. However, these models have yet to perform well on semantic textual similarity (STS), and may be too large to be deployed as lightweight edge applications.