Ananth Balashankar, Xiao Ma, Aradhana Sinha, Ahmad Beirami, Yao Qin...
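TL;DR: A domain-generalized few-shot learning approach that combines tuning with data augmentation improves F1 by 7-17% and AUC by 9-13% over conventional methods on the Social Chemistry moral judgment and toxicity detection tasks.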
Abstract
As large language models (LLMs) are widely adopted, new safety issues and policies emerge, to which existing safety classifiers do not generalize well. If we have only observed a few examples of violations of a n