This paper addresses the problem of predicting the popularity of comments in an online discussion forum using reinforcement learning, focusing on two challenges that arise from having natural language state and action spaces. First, the state representation, which characterizes the history of comments tracked in a discussion at a particular point, is augmented to incorporate the global context represented by discussions on world events available in an external knowledge source. Second, a two-stage Q-learning framework is introduced, making it feasible to search the combinatorial action space while also accounting for redundancy among sub-actions. We experiment with five Reddit communities, showing that the two methods improve over previously reported results on this task.
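
The two-stage idea can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not specify the Q-function architectures, so `q_first`, `q_second`, `word_overlap`, the toy scores, and the penalty weight below are all hypothetical stand-ins. The sketch only shows why a cheap per-sub-action ranking followed by a joint scoring of a few combinations can both shrink the search and discourage redundant picks.

```python
from itertools import combinations


def word_overlap(a, b):
    """Jaccard overlap of token sets; a stand-in redundancy measure (assumption)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0


def select_action(candidates, q_first, q_second, k=2, shortlist_size=3):
    """Pick k sub-actions in two stages.

    Stage 1 ranks sub-actions independently and keeps a short list.
    Stage 2 scores every k-subset of the short list jointly.
    """
    shortlist = sorted(candidates, key=q_first, reverse=True)[:shortlist_size]
    return max(combinations(shortlist, k), key=q_second)


# Toy per-comment scores standing in for a learned first-stage Q-function.
toy_scores = {
    "the game was great": 0.9,
    "the game was great fun": 0.85,
    "new policy announced today": 0.6,
    "weather is nice": 0.1,
}


def q_first(comment):
    return toy_scores[comment]


def q_second(combo, penalty=1.0):
    # Hypothetical joint score: sum of individual scores minus pairwise redundancy.
    total = sum(q_first(c) for c in combo)
    redundancy = sum(word_overlap(a, b) for a, b in combinations(combo, 2))
    return total - penalty * redundancy


chosen = select_action(list(toy_scores), q_first, q_second)
print(chosen)
```

With these toy numbers, picking the two highest-scoring comments independently would select two near-duplicates, whereas the joint second stage prefers one of them paired with the less similar third comment.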

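This paper studies the problem of using reinforcement learning to predict the popularity of comments in online forums, focusing on two challenges posed by natural language state and action spaces. The authors propose an augmented state representation that incorporates global context from an external knowledge source, and introduce a two-stage Q-learning framework that handles both the search over the combinatorial action space and redundancy among sub-actions. Experiments on five Reddit communities show that both methods outperform previous approaches on this task.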

Reinforcement learning with external knowledge and two-stage Q-functions for predicting popular Reddit discussions