In this work, we address the problem of learning provably stable neural
network policies for stochastic control systems. While recent work has
demonstrated the feasibility of certifying given policies using martingale
theory, the problem of how to learn such policies is little explored