In recent years, the integration of Automated Planning (AP) and Reinforcement Learning (RL) has attracted growing interest. A general framework for Sequential Decision Making (SDM) would greatly facilitate this integration, as it would clarify how AP and RL fit together. In this preliminary work, we attempt to provide such a framework, applicable to any method from Classical Planning to Deep RL, by drawing on concepts from Probability Theory and Bayesian inference. To account for generalization, we formulate an SDM task as a set of training and test Markov Decision Processes (MDPs). We provide a general algorithm for SDM on which, we hypothesize, every SDM method is based. Under this view, every SDM algorithm can be seen as a procedure that iteratively improves its solution estimate by leveraging the available task knowledge. Finally, we derive a set of formulas and algorithms for computing interesting properties of SDM tasks and methods, which enable their empirical evaluation and comparison.
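To make the two central ideas of the abstract more concrete, the following minimal Python sketch shows one possible way to represent an SDM task as sets of training and test MDPs, and one instance of a loop that iteratively improves a solution estimate. It is not the paper's algorithm: the names `MDP`, `SDMTask`, `improve`, `solve` and `evaluate` are hypothetical, the "task knowledge" is assumed to be the known transition models of the training MDPs, and the improvement step is plain value iteration on their uniform mixture (assuming a shared discount factor), chosen only as a simple planning-style example. Generalization is then measured by evaluating the resulting policy on the test MDPs.

```python
from dataclasses import dataclass


@dataclass
class MDP:
    """Tabular MDP: transitions[s][a] is a list of (prob, next_state, reward)."""
    transitions: list
    gamma: float = 0.9

    @property
    def n_states(self):
        return len(self.transitions)

    @property
    def n_actions(self):
        return len(self.transitions[0])


@dataclass
class SDMTask:
    """Hypothetical container: training and test MDPs sharing states and actions."""
    train: list
    test: list


def improve(q, mdps):
    """One improvement step (assumed here): a Bellman optimality backup averaged
    over the training MDPs, whose models play the role of task knowledge."""
    new_q = [[0.0] * len(row) for row in q]
    for s, row in enumerate(q):
        for a in range(len(row)):
            total = sum(
                p * (r + m.gamma * max(q[s2]))
                for m in mdps
                for p, s2, r in m.transitions[s][a]
            )
            new_q[s][a] = total / len(mdps)
    return new_q


def solve(task, tol=1e-8, max_iters=10_000):
    """Generic loop: start from an initial estimate and improve it until it stops changing."""
    m0 = task.train[0]
    q = [[0.0] * m0.n_actions for _ in range(m0.n_states)]
    for _ in range(max_iters):
        new_q = improve(q, task.train)
        delta = max(abs(x - y) for new_row, old_row in zip(new_q, q)
                    for x, y in zip(new_row, old_row))
        q = new_q
        if delta < tol:
            break
    # Greedy policy with respect to the final estimate.
    return [max(range(len(row)), key=row.__getitem__) for row in q]


def evaluate(policy, mdp, tol=1e-8):
    """Iterative policy evaluation of `policy` on a (possibly unseen) MDP."""
    v = [0.0] * mdp.n_states
    while True:
        new_v = [
            sum(p * (r + mdp.gamma * v[s2]) for p, s2, r in mdp.transitions[s][policy[s]])
            for s in range(mdp.n_states)
        ]
        if max(abs(x - y) for x, y in zip(new_v, v)) < tol:
            return new_v
        v = new_v


# Two states; action 0 = stay, action 1 = switch. Staying in state 1 pays 1.
train = MDP([[[(1.0, 0, 0.0)], [(1.0, 1, 0.0)]],
             [[(1.0, 1, 1.0)], [(1.0, 0, 0.0)]]])
# Test variant: switching away from state 0 fails 20% of the time.
test = MDP([[[(1.0, 0, 0.0)], [(0.8, 1, 0.0), (0.2, 0, 0.0)]],
            [[(1.0, 1, 1.0)], [(1.0, 0, 0.0)]]])
task = SDMTask(train=[train], test=[test])

policy = solve(task)
values = [evaluate(policy, m) for m in task.test]
print(policy)  # [1, 0]
```

The policy is computed only from the training MDP, yet it remains optimal on the noisier test MDP; a model-free RL method would fit the same loop by replacing `improve` with an update estimated from sampled transitions instead of known models.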


A Unified Framework for Sequential Decision Making