In this paper, we consider the problem of planning and learning in infinite-horizon discounted-reward Markov decision problems. We propose a novel iterative direct policy-search approach called dynamic policy programming (DPP). DPP is, to the best of our knowledge, the first conve