We study dynamic discrete choice models, where a commonly studied problem is estimating the parameters of agent reward functions (the "structural" parameters) from agent behavioral data. Maximum likelihood estimation for such models requires solving a dynamic program at each candidate parameter value, which is computationally costly.
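To make the nested structure concrete, here is a minimal sketch of maximum likelihood estimation for a toy dynamic discrete choice model. Everything in it is illustrative and not from the source: the state/action space, the linear flow reward `theta * s`, the uniform transition kernel, and the grid-search maximizer are all hypothetical. It uses the standard type-1 extreme value (logit) setup, in which the Bellman recursion has a log-sum-exp closed form, and solves the inner dynamic program from scratch at every candidate parameter value, which is exactly the computational burden described above.

```python
import numpy as np

# Hypothetical toy setup: 3 states, 2 actions; flow reward is
# r(s, a) = theta * s if a == 1, else 0. Type-1 extreme value shocks
# give a log-sum-exp value recursion and logit choice probabilities.
BETA = 0.9                     # discount factor
STATES, ACTIONS = 3, 2
rng = np.random.default_rng(0)
# Made-up transition kernel P[a][s, s'] (uniform, for simplicity).
P = np.array([np.full((STATES, STATES), 1.0 / STATES)] * ACTIONS)

def flow_reward(theta):
    r = np.zeros((STATES, ACTIONS))
    r[:, 1] = theta * np.arange(STATES)
    return r

def solve_value(theta, tol=1e-10):
    """Inner dynamic program: iterate the log-sum-exp Bellman operator."""
    r = flow_reward(theta)
    V = np.zeros(STATES)
    while True:
        q = r + BETA * np.einsum('ast,t->sa', P, V)  # choice-specific values
        V_new = np.log(np.exp(q).sum(axis=1))        # EV1 closed form
        if np.max(np.abs(V_new - V)) < tol:
            return q
        V = V_new

def choice_probs(theta):
    q = solve_value(theta)
    e = np.exp(q - q.max(axis=1, keepdims=True))     # stable softmax
    return e / e.sum(axis=1, keepdims=True)

def log_likelihood(theta, data):
    p = choice_probs(theta)
    return sum(np.log(p[s, a]) for s, a in data)

# Simulate behavior under a true theta, then recover it by grid-search MLE.
# Note: each likelihood evaluation re-solves the dynamic program.
true_theta = 1.5
p_true = choice_probs(true_theta)
data = []
for _ in range(2000):
    s = rng.integers(STATES)
    a = rng.choice(ACTIONS, p=p_true[s])
    data.append((s, a))

grid = np.linspace(0.0, 3.0, 61)
theta_hat = grid[np.argmax([log_likelihood(t, data) for t in grid])]
print(f"true theta = {true_theta}, MLE estimate = {theta_hat:.2f}")
```

The grid search stands in for a numerical optimizer; in either case the dynamic program is re-solved at every parameter guess, which is the motivation for the estimation methods discussed here.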
This paper introduces a reinforcement-learning-based method for model selection under changing circumstances, applied to portfolio management with rebalancing costs, and demonstrates superior performance over hindsight-based model selection.
This paper proposes a deep-reinforcement-learning approach to the knapsack problem. The method applies a state-aggregation strategy and the Advantage Actor Critic algorithm to each problem instance, selecting one item at each time step and repeating until a complete solution is obtained. Experiments show that the method yields near-optimal solutions, outperforms greedy algorithms, and handles larger and more flexible problem instances. Moreover, the proposed model not only produces better solutions but also learns in fewer time steps, performing well overall.
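The sequential framing above can be sketched as a small Markov decision process: the state is the set of chosen items plus remaining capacity, and each time step picks one still-feasible item. The sketch below is illustrative only, with a made-up instance and a fixed value/weight-ratio policy standing in for the paper's trained actor (the A2C training loop itself is omitted); the brute-force optimum shows that the greedy policy can be beaten, which is the gap the learned policy is claimed to close.

```python
import itertools
import numpy as np

# Hypothetical 0/1 knapsack instance (values, weights, capacity are made up).
values  = np.array([10, 7, 4, 3, 2])
weights = np.array([ 6, 4, 3, 2, 1])
CAPACITY = 10

def feasible_actions(state):
    """Items not yet chosen that still fit in the remaining capacity."""
    chosen, cap = state
    return [i for i in range(len(values))
            if i not in chosen and weights[i] <= cap]

def step(state, item):
    """One environment step: add `item`, pay its weight, collect its value."""
    chosen, cap = state
    return (chosen | {item}, cap - weights[item]), int(values[item])

def rollout(policy):
    """Select one item per time step until nothing fits; return total value."""
    state, total = (frozenset(), CAPACITY), 0
    while (acts := feasible_actions(state)):
        state, reward = step(state, policy(state, acts))
        total += reward
    return total

# Fixed ratio policy standing in for the learned actor (illustrative only).
greedy = lambda state, acts: max(acts, key=lambda i: values[i] / weights[i])

# Brute-force optimum for comparison on this tiny instance.
best = max(int(np.dot(mask, values))
           for mask in itertools.product([0, 1], repeat=len(values))
           if np.dot(mask, weights) <= CAPACITY)

print(rollout(greedy), best)  # greedy is suboptimal on this instance
```

On this instance the ratio-greedy rollout reaches value 16 while the optimum is 17 (items 0 and 1 exactly fill the capacity), illustrating why a policy trained over many instances can improve on a fixed greedy rule.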