BriefGPT.xyz
Keywords
dimension decomposition
Search results - 1
Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging
By transforming Reinforcement Learning from Human Feedback (RLHF) into Reinforcement Learning from Personalized Human Feedback (RLPHF
9 months ago