To understand a scene in depth not only involves locating/recognizing
individual objects, but also requires to infer the relationships and
interactions among them. However, since the distribution of real-world
relationships is seriously unbalanced, existing methods perform quite poorly
for the less frequent relationships. In this work, we find that the stati