This paper presents a theoretical analysis of multi-view embedding -- feature embedding that can be learned from unlabeled data through the task of predicting one view from another. We prove its usefulness in supervised learning under certain conditions. The result explains the effectiveness of some existing methods such as word embedding. Based on this theo