Learning to manipulate 3D objects in an interactive environment remains a challenging problem in reinforcement learning (RL). In particular, it is hard to train a policy that generalizes across objects with different semantic categories, diverse shape geometry and versatile functiona