Keyword: post-training large language models
Search results: 1
  • Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
    PDF, 3 months ago