Code Community Question Answering (CCQA) seeks to tackle programming-related
issues, thereby boosting productivity in both software engineering and academic
research. Recent advancements in Reinforcement Learning from Human Feedback
(RLHF) have transformed the fine-tuning process of Large Language Models (LLMs)
to produce responses that closely mimic human behavior. Leveraging LLMs with
RLHF for practical CCQA applications has thus emerged as a promising area of
study. Unlike standard code question-answering tasks, CCQA questions often have
multiple valid answers, and users' preferences vary across these answers.
Additionally, code communities tend to favor newer APIs. These challenges hinder
LLMs from generating responses that cater to the diverse preferences of users in
CCQA tasks. To address these issues, we propose a novel
framework called Aligning LLMs through Multi-perspective User Preference
Ranking-based Feedback for Programming Question Answering (ALMupQA) to create
user-focused responses. Our approach starts with Multi-perspective Preference
Ranking Alignment (MPRA), which synthesizes varied user preferences based on
the characteristics of answers from code communities. We then introduce a
Retrieval-augmented In-context Learning (RIL) module to mitigate the problem of
outdated answers by retrieving responses to similar questions from a question
bank. Because high-quality, multi-answer CCQA datasets are scarce, we also
construct a dataset named StaCCQA from real code communities. Extensive
experiments demonstrate the effectiveness of the ALMupQA framework in terms of
both accuracy and user preference. Compared with the base model, ALMupQA
improves BLEU by nearly 11%, BERTScore by 20%, and CodeBERTScore by 17.5%.
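As a rough illustration of the RIL idea (retrieve answers to similar questions from a question bank and place them in the prompt as in-context examples), the following Python sketch uses TF-IDF cosine similarity as the retriever. The retriever, the prompt layout, the toy question bank, and the names `retrieve` and `build_ril_prompt` are assumptions for illustration only; the framework's actual retriever and prompt format may differ.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep alphanumeric/underscore runs (so identifiers survive).
    return re.findall(r"[a-z0-9_]+", text.lower())


def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF: terms present in every document still get a small positive weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: str, bank: list[tuple[str, str]], k: int = 2) -> list[tuple[str, str]]:
    # Hypothetical retriever: vectorize bank questions and the query together,
    # then return the k (question, answer) pairs most similar to the query.
    vecs = tfidf_vectors([tokenize(q) for q, _ in bank] + [tokenize(query)])
    qvec = vecs[-1]
    ranked = sorted(range(len(bank)), key=lambda i: cosine(vecs[i], qvec), reverse=True)
    return [bank[i] for i in ranked[:k]]


def build_ril_prompt(query: str, bank: list[tuple[str, str]], k: int = 2) -> str:
    # Retrieved Q&A pairs become in-context examples placed before the new question.
    parts = [f"Question: {q}\nAnswer: {a}\n" for q, a in retrieve(query, bank, k)]
    parts.append(f"Question: {query}\nAnswer:")
    return "\n".join(parts)


# Toy question bank (hypothetical data, not from StaCCQA).
bank = [
    ("How do I read a file line by line in Python?", "Iterate over the open file object."),
    ("How do I merge two dicts in Python?", "On Python 3.9+, use the `a | b` operator."),
    ("How do I format a date as ISO 8601 in Python?", "Call `d.isoformat()` on a datetime."),
]
query = "What is the modern way to merge two dicts in Python?"
print(build_ril_prompt(query, bank, k=1))
```

Swapping TF-IDF for a dense embedding retriever would only change `retrieve`; the prompt assembly stays the same.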

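Building on LLMs fine-tuned with reinforcement learning from human feedback, the ALMupQA framework addresses the problems of multiple answers and differing user preferences in code community question answering, using multi-perspective user preference ranking feedback to generate user-oriented answers. Experiments show that, compared with the base model, ALMupQA improves BLEU by nearly 11%, and BERTScore and CodeBERTScore by 20% and 17.5%, respectively.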
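To make multi-perspective preference ranking concrete, the sketch below ranks the candidate answers to one question by aggregating several per-perspective rankings with a Borda count. It then emits (preferred, rejected) pairs of the kind a ranking-based feedback objective could consume. The perspectives (vote score, acceptance, and posting year as a proxy for API freshness), the `Answer` fields, and the Borda aggregation are all hypothetical choices for illustration, not the paper's MPRA procedure.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Answer:
    text: str
    votes: int      # hypothetical field: community vote score
    accepted: bool  # hypothetical field: accepted by the asker
    year: int       # hypothetical field: posting year, a rough proxy for API freshness


# Hypothetical perspectives; each maps an answer to a score where higher is better.
PERSPECTIVES = {
    "votes": lambda a: a.votes,
    "accepted": lambda a: int(a.accepted),
    "recency": lambda a: a.year,
}


def borda_points(answers: list[Answer]) -> list[int]:
    points = [0] * len(answers)
    for score in PERSPECTIVES.values():
        values = [score(a) for a in answers]
        for i, v in enumerate(values):
            # Borda-style points: how many answers this one strictly beats (ties earn nothing).
            points[i] += sum(1 for w in values if w < v)
    return points


def preference_ranking(answers: list[Answer]) -> list[int]:
    points = borda_points(answers)
    # Highest aggregate points first; sorted() is stable, so ties keep their original order.
    return sorted(range(len(answers)), key=lambda i: points[i], reverse=True)


def preference_pairs(answers: list[Answer]) -> list[tuple[str, str]]:
    points = borda_points(answers)
    order = preference_ranking(answers)
    # combinations() preserves ranking order, so each pair is (preferred, rejected);
    # pairs with equal points are skipped because neither answer is preferred.
    return [
        (answers[i].text, answers[j].text)
        for i, j in combinations(order, 2)
        if points[i] > points[j]
    ]


# Toy candidate answers to one question (hypothetical data).
answers = [
    Answer("Use os.path.join", votes=120, accepted=True, year=2012),
    Answer("Use the pathlib.Path / operator", votes=95, accepted=False, year=2019),
    Answer("Concatenate strings with '/'", votes=3, accepted=False, year=2011),
]
print(preference_ranking(answers))  # [0, 1, 2]
for preferred, rejected in preference_pairs(answers):
    print(f"{preferred!r} > {rejected!r}")
```

Here the newer `pathlib` answer wins on recency but loses on votes and acceptance, so it ranks second overall. Adding or reweighting perspectives changes the ranking without touching the pair-generation step.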