Facility location problems on graphs are ubiquitous in the real world and hold
significant importance, yet their resolution is often impeded by NP-hardness.
Recently, machine learning methods have been proposed to tackle such classical
problems, but they are limited to the myopic constructive pattern and only
consider the problems in Euclidean space. To overcome these limitations, we
propose a general swap-based framework that addresses the p-median problem and
the facility relocation problem on graphs and a novel reinforcement learning
model demonstrating a keen awareness of complex graph structures. Striking a
harmonious balance between solution quality and running time, our method
surpasses handcrafted heuristics on intricate graph datasets. Additionally, we
introduce a graph generation process to simulate real-world urban road networks
with demand, facilitating the construction of large datasets for the classic
problem. For the initialization of the locations of facilities, we introduce a
physics-inspired strategy for the p-median problem, reaching more stable
solutions than the random strategy. The proposed pipeline coupling the classic
swap-based method with deep reinforcement learning marks a significant step
forward in addressing the practical challenges associated with facility
location on graphs.
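
To make the swap operation concrete, here is a minimal sketch of the classic swap (vertex substitution) local search for the p-median problem on a graph. It is not the reinforcement learning model described above: it only illustrates the handcrafted baseline move of closing one facility and opening another when that lowers the demand-weighted distance. The function names, the adjacency-dictionary format, and the first-improvement rule are all assumptions made for illustration, and the graph is assumed to be connected and undirected with sortable node labels.

```python
import heapq
from itertools import product


def shortest_paths(adj, source):
    """Dijkstra from `source`; `adj` maps node -> {neighbor: edge_length}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def pmedian_cost(dist, demand, facilities):
    """Demand-weighted distance from every node to its nearest facility."""
    return sum(demand[v] * min(dist[f][v] for f in facilities) for v in demand)


def swap_local_search(adj, demand, facilities):
    """Apply improving swaps (close one facility, open another) until none is left."""
    dist = {u: shortest_paths(adj, u) for u in adj}  # all-pairs distances
    current = set(facilities)
    best = pmedian_cost(dist, demand, current)
    improved = True
    while improved:
        improved = False
        closed = sorted(set(adj) - current)
        for out_node, in_node in product(sorted(current), closed):
            candidate = (current - {out_node}) | {in_node}
            cost = pmedian_cost(dist, demand, candidate)
            if cost < best:  # first-improvement rule (an assumption)
                current, best, improved = candidate, cost, True
                break
    return current, best
```

On a five-node path graph with unit edge lengths and unit demand, starting from a single facility at one end, this search moves the facility step by step to the middle node. A learned policy, as proposed in the abstract, would instead choose which swap to make rather than scanning all pairs.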

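This paper proposes a swap-based framework and a novel deep reinforcement learning model for the facility location problem and the facility relocation problem on graphs. Compared with handcrafted heuristics, the method achieves better solution quality and running time on complex graph datasets. It also introduces a graph generation process that simulates real-world urban road networks with demand, which facilitates building large datasets for this classic problem. Combining the swap-based method with deep reinforcement learning is significant for addressing the practical challenges of facility location on graphs.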

Swap-based Deep Reinforcement Learning for Facility Location Problems in Networks

Emotional support conversation (ESC) aims to provide emotional support (ES)
to improve one's mental state. Existing works stop at fitting grounded
responses and response strategies (e.g., asking questions), ignoring their effect
on ES and lacking explicit goals to guide positive emotional transitions. To this
end, we introduce a new paradigm that formalizes multi-turn ESC as a process of
positive emotion elicitation. Addressing this task requires finely adjusting
the elicitation intensity in ES as the conversation progresses while
maintaining conversational goals such as coherence. In this paper, we propose
Supporter, a mixture-of-experts-based reinforcement learning model, and carefully
design ES and dialogue coherence rewards to guide the policy's learning for
responding. Experiments verify the superiority of Supporter in achieving
positive emotion elicitation during responding while maintaining conversational
goals including coherence.

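This work proposes a new paradigm for emotional support conversation: positive emotion elicitation. Using a mixture-of-experts-based reinforcement learning model with fine-grained emotion adjustment and a dialogue coherence reward design, it pursues the dual goals of providing emotional support and maintaining dialogue coherence. Experimental results demonstrate the model's superiority in positive emotion elicitation while preserving dialogue coherence.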