May 2024

Learning to Clarify: Multi-turn Conversations with Action-Based Contrastive Self-Training

TL;DR: Action-Based Contrastive Self-Training (ACT) is a quasi-online preference optimization algorithm that improves conversation modeling in large language models (LLMs), particularly for disambiguation and dialogue policy learning.