May, 2024
Learning to Clarify: Multi-turn Conversations with Action-Based Contrastive Self-Training
Maximillian Chen, Ruoxi Sun, Sercan Ö. Arık, Tomas Pfister
TL;DR: Action-Based Contrastive Self-Training (ACT) is a quasi-online preference optimization algorithm that improves conversation modeling in large language models (LLMs), particularly for disambiguation and dialogue policy learning.