In this paper, a sparse Markov decision process (MDP) with novel causal
sparse Tsallis entropy regularization is proposed. The proposed policy
regularization induces a sparse and multi-modal optimal policy distribution of
a sparse MDP. The full mathematical analysis of the proposed sparse MDP is
provided. We first analyze the optimality condition of a sparse MDP. Then, we
propose a sparse value iteration method which solves a sparse MDP and then
prove the convergence and optimality of sparse value iteration using the Banach
fixed point theorem. The proposed sparse MDP is compared to soft MDPs which
utilize causal entropy regularization. We show that the performance error
introduced by the regularization term has a constant bound for a sparse MDP,
whereas for a soft MDP it grows logarithmically with the number of actions.
In experiments, we apply
sparse MDPs to reinforcement learning problems. The proposed method outperforms
existing methods in terms of the convergence speed and performance.
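
The abstract does not spell out the update equations, so the following is only
a minimal sketch of what a sparse value iteration could look like. It assumes
the backup V(s) = alpha * spmax(Q(s, .) / alpha), where spmax is the
sparsemax-style value functional, and a policy obtained by projecting
Q(s, .) / alpha onto the probability simplex. The function names
(sparsemax_tau, spmax, sparse_value_iteration), the random toy MDP, and the
parameters alpha, gamma, and tol are illustrative assumptions, not the
authors' implementation.

```python
import numpy as np

def sparsemax_tau(z):
    """Threshold tau such that sum(max(z - tau, 0)) == 1 along the last axis."""
    z_sorted = np.sort(z, axis=-1)[..., ::-1]            # descending order
    cumsum = np.cumsum(z_sorted, axis=-1)
    k = np.arange(1, z.shape[-1] + 1)
    in_support = 1 + k * z_sorted > cumsum               # True for a prefix
    k_max = in_support.sum(axis=-1, keepdims=True)       # support size
    tau = (np.take_along_axis(cumsum, k_max - 1, axis=-1) - 1) / k_max
    return tau

def spmax(z):
    """Assumed sparse counterpart of max / log-sum-exp along the last axis."""
    tau = sparsemax_tau(z)
    terms = np.where(z > tau, z**2 - tau**2, 0.0)
    return 0.5 * terms.sum(axis=-1) + 0.5

def sparse_value_iteration(P, R, gamma=0.9, alpha=1.0, tol=1e-8, max_iter=10_000):
    """P has shape (S, A, S), R has shape (S, A). Returns V, Q, policy."""
    V = np.zeros(P.shape[0])
    for _ in range(max_iter):
        Q = R + gamma * (P @ V)                          # shape (S, A)
        V_new = alpha * spmax(Q / alpha)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    Q = R + gamma * (P @ V)
    # Actions whose scaled Q-value falls below the threshold get probability 0.
    policy = np.maximum(Q / alpha - sparsemax_tau(Q / alpha), 0.0)
    return V, Q, policy

# Hypothetical toy MDP, used only to exercise the sketch.
rng = np.random.default_rng(0)
S, A = 4, 3
P = rng.dirichlet(np.ones(S), size=(S, A))               # shape (S, A, S)
R = rng.uniform(size=(S, A))
V, Q, policy = sparse_value_iteration(P, R)
```

Under these assumptions, each row of `policy` is a probability distribution
that can assign exactly zero probability to low-value actions, which is the
sparsity the abstract refers to. The gap between `spmax` and the plain maximum
is at most (|A| - 1) / (2|A|) per backup, independent of how large |A| grows,
whereas the corresponding gap for log-sum-exp is at most log |A|.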

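This paper proposes a sparse Markov decision process with causal sparse Tsallis entropy regularization. The introduced policy regularization induces a sparse and multi-modal optimal policy distribution in the Markov decision process. The approach is compared with soft Markov decision processes, which use causal entropy regularization. Applied to reinforcement learning problems, the sparse MDP method outperforms existing methods in convergence speed and performance.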