BriefGPT.xyz
May, 2024
RLSF: Reinforcement Learning via Symbolic Feedback
Piyush Jha, Prithwish Jana, Arnav Arora, Vijay Ganesh
TL;DR
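We propose a new training/fine-tuning paradigm called Reinforcement Learning via Symbolic Feedback (RLSF), which aims to strengthen the reasoning capabilities of LLMs. It uses symbolic tooling (for example, proofs) to provide precise reward signals, overcoming the limitations of traditional approaches.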
Abstract
In recent years, large language models (LLMs) have had a dramatic impact on various sub-fields of AI, most notably on natural language understanding tasks. However, there is widespread agreement that the logical reasoning …