Optimal decision-making presents a significant challenge for autonomous systems operating in uncertain, stochastic, and time-varying environments. Environmental variability over time can significantly change the system's optimal decision-making strategy for mission completion. To model such environments, our work combines the existing notion of Time-Varying Markov Decision Processes (TVMDP) with partial observability and introduces Time-Varying Partially Observable Markov Decision Processes (TV-POMDP). We propose a two-pronged approach to accurately estimate and plan within the TV-POMDP: 1) Memory Prioritized State Estimation (MPSE), which leverages weighted memory to provide more accurate time-varying transition estimates; and 2) an MPSE-integrated planning strategy that optimizes long-term rewards while accounting for temporal constraints. We validate the proposed framework and algorithms in simulation and on hardware, with robots exploring partially observable, time-varying environments. Our results demonstrate superior performance over standard methods, highlighting the framework's effectiveness in stochastic, uncertain, time-varying domains.

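Optimal decision-making poses a major challenge for autonomous systems in uncertain, stochastic, and time-varying environments. This work combines the notion of Time-Varying Markov Decision Processes (TVMDP) with partial observability to propose Time-Varying Partially Observable Markov Decision Processes (TV-POMDP) for modeling such environments, and shows through simulation and real-hardware validation that the framework performs strongly in stochastic, uncertain, and time-varying domains.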

Learning and Planning in Constantly Changing, Hard-to-Predict Environments