In this paper, we present a learning-based approach for multi-view stereo (MVS), i.e., estimating the depth map of a reference frame from posed multi-view images. Our core idea lies in leveraging a "learning-to-optimize" paradigm to iteratively index a plane-sweeping cost volume and reg