Recent advances in large language models have relied heavily on large reward models in Reinforcement Learning from Human Feedback (RLHF) for fine-tuning. However, using a single reward model across diverse domains is not always optimal, often requiring retraining from scratch.