Most progress in recent coder models has been driven by supervised fine-tuning (SFT), while the potential of reinforcement learning (RL) remains largely unexplored, primarily due to the lack of reliable reward data and reward models in the code domain. In this paper, we address this challenge by l