Online ride-hailing services have become a prevalent transportation system
across the world. In this paper, we study a challenging problem of how to
direct vacant taxis around a city such that supplies and demands can be
balanced in online ride-hailing services. We design a new reward scheme that
considers multiple performance metrics of online ride-hailing services. We also
propose a novel deep reinforcement learning method named Deep-Q-Network with
Action Mask (AM-DQN), which masks off unnecessary actions in various locations so
that agents can learn much faster and more efficiently. We conduct extensive
experiments using a city-scale dataset from Chicago. Several popular heuristic
and learning methods are also implemented as baselines for comparison. The
experimental results show that AM-DQN attains the best performance of all
methods with respect to average failure rate, average waiting time for
customers, and average idle search time for vacant taxis.
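
As an illustration of the action-masking idea, the following is a minimal sketch and not the AM-DQN implementation itself. It assumes each location has a boolean mask marking which moves are allowed; the function names, the five-move action set, and the Q-values are hypothetical. Masked actions are excluded both from the greedy choice and from random exploration, so the agent never spends experience on them.

```python
import math
import random


def masked_greedy_action(q_values, valid_mask):
    """Return the index of the highest-valued action the mask allows."""
    if not any(valid_mask):
        raise ValueError("at least one action must be valid")
    # Invalid actions are set to -inf so they can never win the argmax.
    masked = [q if ok else -math.inf for q, ok in zip(q_values, valid_mask)]
    return max(range(len(masked)), key=masked.__getitem__)


def masked_epsilon_greedy(q_values, valid_mask, epsilon, rng=random):
    """Epsilon-greedy exploration restricted to the valid actions."""
    valid = [i for i, ok in enumerate(valid_mask) if ok]
    if not valid:
        raise ValueError("at least one action must be valid")
    if rng.random() < epsilon:
        return rng.choice(valid)  # explore only among allowed moves
    return masked_greedy_action(q_values, valid_mask)


# Hypothetical example: moves are (stay, north, south, east, west), and
# "north" is masked off because it would leave the service area.
q = [0.2, 1.5, 0.7, 0.9, -0.3]
mask = [True, False, True, True, True]
print(masked_greedy_action(q, mask))  # → 3
```

Without the mask, the greedy choice here would be action 1, the disallowed move with the largest raw Q-value.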

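This paper studies how to direct vacant taxis around a city using a new reward scheme and the deep reinforcement learning method AM-DQN, so as to balance supply and demand in online ride-hailing services. Experiments on a Chicago dataset show that AM-DQN outperforms the other methods.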

Agent Direction Guidance Based on Deep Reinforcement Learning in a City-Scale Online Ride-Hailing Service

Where to go: Agent Guidance with Deep Reinforcement Learning in A City-Scale Online Ride-Hailing Service

This paper presents a novel collaborative generative modeling (CGM) framework
that incentivizes collaboration among self-interested parties to contribute
data to a pool for training a generative model (e.g., GAN), from which
synthetic data are drawn and distributed to the parties as rewards commensurate
to their contributions. Distributing synthetic data as rewards (instead of
trained models or money) offers task- and model-agnostic benefits for
downstream learning tasks and is less likely to violate data privacy
regulation. To realize the framework, we first propose a data valuation
function using maximum mean discrepancy (MMD) that values data based on its
quantity and quality in terms of its closeness to the true data distribution
and provide theoretical results guiding the kernel choice in our MMD-based data
valuation function. Then, we formulate the reward scheme as a linear
optimization problem that, when solved, guarantees certain incentives such as
fairness in the CGM framework. We devise a weighted sampling algorithm for
generating synthetic data to be distributed to each party as reward such that
the value of its data and the synthetic data combined matches its assigned
reward value by the reward scheme. We empirically show using simulated and
real-world datasets that the parties' synthetic data rewards are commensurate
to their contributions.
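
To make the MMD-based valuation more concrete, the following is a rough sketch under stated assumptions rather than the valuation function proposed in the paper. It computes a biased estimate of the squared MMD with a Gaussian (RBF) kernel and scores a party's data by the negative of that distance to a reference sample. The kernel choice, the bandwidth, the `data_value` helper, and the toy data are all assumptions, and the sketch covers only the closeness term, not the quantity aspect mentioned above.

```python
import math


def rbf_kernel(x, y, bandwidth):
    """Gaussian kernel between two points given as tuples of floats."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between two samples."""
    def mean_kernel(a, b):
        total = sum(rbf_kernel(p, q, bandwidth) for p in a for q in b)
        return total / (len(a) * len(b))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


def data_value(party_data, reference, bandwidth=1.0):
    """Hypothetical valuation: larger when the data is closer to the reference."""
    return -mmd_squared(party_data, reference, bandwidth)


# Toy one-dimensional data (assumed for illustration only).
reference = [(0.0,), (1.0,), (2.0,)]
close_party = [(0.1,), (1.1,), (1.9,)]
far_party = [(5.0,), (6.0,), (7.0,)]

print(data_value(close_party, reference) > data_value(far_party, reference))  # → True
```

In this toy setting the party whose data lies near the reference sample receives the higher value, which is the qualitative behaviour a closeness-based valuation is meant to have.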

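This paper proposes a novel collaborative generative modeling (CGM) framework that uses a maximum mean discrepancy (MMD) data valuation function, together with a reward scheme formulated as a linear optimization problem, to incentivize collaboration among parties. Synthetic data are distributed to the parties as rewards while sound incentives are guaranteed.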

Incentivizing Machine Learning Collaboration Through Synthetic Data Rewards

Incentivizing Collaboration in Machine Learning via Synthetic Data Rewards

We study the problem of balancing effectiveness and efficiency in automated
feature selection. After exploring many feature selection methods, we observe a
computational dilemma: 1) traditional feature selection is mostly efficient,
but struggles to identify the best subset; 2) the emerging reinforced feature
selection automatically navigates to the best subset, but is usually
inefficient. Can we bridge the gap between effectiveness and efficiency under
automation? Motivated by this dilemma, we aim to develop a novel feature space
navigation method. In our preliminary work, we leveraged interactive
reinforcement learning to accelerate feature selection by external
trainer-agent interaction. In this journal version, we propose a novel
interactive and closed-loop architecture to simultaneously model interactive
reinforcement learning (IRL) and decision tree feedback (DTF). Specifically,
IRL creates an interactive feature selection loop and DTF feeds
structured feature knowledge back to the loop. First, the tree-structured
feature hierarchy from a decision tree is leveraged to improve state
representation. In particular, we represent the selected feature subset as an
undirected graph of feature-feature correlations and a directed tree of
decision features. We propose a new embedding method that enables a
graph convolutional network to jointly learn state representation from both the
graph and the tree. Second, the tree-structured feature hierarchy is exploited
to develop a new reward scheme. In particular, we personalize reward assignment
of agents based on decision tree feature importance. In addition, observing
that agents' actions can themselves serve as feedback, we devise another reward
scheme that weighs and assigns reward based on each feature's selection
frequency ratio in historical action records. Finally, we present extensive
experiments on real-world datasets to show the improved performance.
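
The two reward schemes can be illustrated with a small sketch. The abstract does not give the exact formulas, so the proportional split below is an assumption, as are the function names and the toy numbers. The sketch supposes one agent per feature, a shared scalar reward, and that only agents that selected their feature receive a share.

```python
def importance_weighted_rewards(total_reward, selected, weights):
    """Split a shared reward among selecting agents in proportion to weights.

    selected[i] is True if agent i selected its feature; weights[i] is, for
    example, the decision tree importance of feature i (assumed non-negative).
    """
    chosen = [i for i, picked in enumerate(selected) if picked]
    rewards = [0.0] * len(selected)
    if not chosen:
        return rewards
    weight_sum = sum(weights[i] for i in chosen)
    for i in chosen:
        # Fall back to an equal split if every chosen weight is zero.
        share = weights[i] / weight_sum if weight_sum > 0 else 1.0 / len(chosen)
        rewards[i] = total_reward * share
    return rewards


def frequency_weighted_rewards(total_reward, selected, history):
    """Weight each agent by how often its feature was selected in the past.

    history is a list of past action vectors, one 0/1 entry per agent.
    """
    steps = len(history)
    ratios = [
        sum(step[i] for step in history) / steps if steps else 0.0
        for i in range(len(selected))
    ]
    return importance_weighted_rewards(total_reward, selected, ratios)


# Hypothetical example with three agents and four past steps.
history = [[1, 0, 1], [1, 0, 0], [1, 1, 0], [0, 0, 1]]
print(frequency_weighted_rewards(2.0, [True, True, False], history))
# → [1.5, 0.5, 0.0]
```

Here the first feature was selected in three of four past steps and the second in one, so the two selecting agents split the reward 3:1 while the agent that did not select its feature receives nothing.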

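We propose a novel interactive and closed-loop architecture that simultaneously models interactive reinforcement learning (IRL) and decision tree feedback (DTF) in order to balance effectiveness and efficiency in automated feature selection. We observe that traditional feature selection methods are mostly efficient but struggle to identify the best subset, whereas the emerging reinforced feature selection methods can automatically navigate to the best subset but are usually inefficient. Our work therefore aims to develop a new feature space navigation method.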