Keyword: process-supervised reward models
Search results: 2
  • Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision
    PDF, 4 months ago
  • Step-by-Step Reinforcement
    PDF, 8 months ago