Information can be expressed in multiple formats, including natural language, images, and motions. Human intelligence usually has little difficulty converting information from one format to another, an ability that often reflects a true understanding of the encoded information. Moreover, such conversions have broad real-world applications. In this paper, we p