We study fair multi-objective reinforcement learning in which an agent must
learn a policy that simultaneously achieves high reward on multiple dimensions
of a vector-valued reward. Motivated by the fair resource allocation
literature, we model this as an expected welfare maximization problem, for some
non-linear fair welfare function of the vector of long-term cumulative rewards.
One canonical example of such a function is the Nash Social Welfare, or
geometric mean, the log transform of which is also known as the Proportional
Fairness objective. We show that even approximate optimization of the
expected Nash Social Welfare is computationally intractable, already in the
tabular case. Nevertheless, we provide a novel adaptation of Q-learning that
combines non-linear scalarized learning updates and non-stationary action
selection to learn effective policies for optimizing non-linear welfare
functions. We show that our algorithm is provably convergent, and we demonstrate
experimentally that our approach outperforms techniques based on linear
scalarization, mixtures of optimal linear scalarizations, or stationary action
selection for the Nash Social Welfare objective.
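The welfare functions above can be made concrete with a short Python sketch. `nash_social_welfare` computes the geometric mean of a reward vector and `log_nash_social_welfare` its log (the proportional-fairness form). `select_action` is a hypothetical illustration of non-stationary action selection, assuming the agent scores each action by the welfare of the reward it has accumulated so far plus a discounted vector Q-estimate; this rule is an assumption made for illustration, not the authors' exact algorithm, and no learning update is shown.

```python
import math
from typing import Callable, Sequence

Welfare = Callable[[Sequence[float]], float]


def nash_social_welfare(rewards: Sequence[float]) -> float:
    """Geometric mean of a vector of non-negative cumulative rewards."""
    if any(r < 0 for r in rewards):
        raise ValueError("rewards must be non-negative")
    return math.prod(rewards) ** (1.0 / len(rewards))


def log_nash_social_welfare(rewards: Sequence[float]) -> float:
    """Log of the geometric mean (proportional fairness); rewards must be positive."""
    return sum(math.log(r) for r in rewards) / len(rewards)


def select_action(q_vectors: Sequence[Sequence[float]],
                  acc_reward: Sequence[float],
                  t: int,
                  gamma: float,
                  welfare: Welfare) -> int:
    """Hypothetical non-stationary greedy rule (an assumption, not the paper's):
    score each action by the welfare of the discounted reward accumulated so far
    plus gamma**t times that action's vector Q-estimate."""
    def projected(q: Sequence[float]) -> list:
        return [acc + gamma ** t * qi for acc, qi in zip(acc_reward, q)]

    return max(range(len(q_vectors)), key=lambda a: welfare(projected(q_vectors[a])))


q = [[10.0, 0.0], [4.0, 4.0], [3.0, 3.0]]
print(select_action(q, [0.0, 0.0], 0, 0.9, nash_social_welfare))  # 1: balanced action
print(select_action(q, [0.0, 0.0], 0, 0.9, sum))                  # 0: linear scalarization
print(select_action(q, [0.0, 8.0], 1, 0.9, nash_social_welfare))  # 0: history matters
```

With these Q-vectors, the NSW picks the balanced action [4, 4] where an equally weighted linear scalarization (`sum`) picks [10, 0]. Once the agent has already accumulated reward in the second dimension, the same Q-vectors lead it to prefer the first action, which is why the selection rule depends on history rather than on the state alone.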

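This work studies fair multi-objective reinforcement learning, in which an agent must learn a policy that simultaneously achieves high reward on multiple dimensions of a vector-valued reward. We take an expected welfare maximization approach, modeling the objective as some non-linear fair welfare function of the vector of long-term cumulative rewards. We provide a novel adaptation of Q-learning that learns policies optimizing non-linear welfare functions. Our algorithm is provably convergent, and experiments show that, for the Nash Social Welfare objective, our approach outperforms techniques based on linear scalarization, mixtures of optimal linear scalarizations, or stationary action selection.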

Welfare and Fairness in Multi-objective Reinforcement Learning

We provide a reduction from revenue maximization to welfare maximization in
multi-dimensional Bayesian auctions with arbitrary (possibly combinatorial)
feasibility constraints and independent bidders with arbitrary (possibly
combinatorial) demand constraints, appropriately extending Myerson's result to
this setting. We also show that every feasible Bayesian auction can be
implemented as a distribution over virtual VCG allocation rules. A virtual VCG
allocation rule has the following simple form: Every bidder's type t_i is
transformed into a virtual type f_i(t_i), via a bidder-specific function. Then,
the allocation maximizing virtual welfare is chosen. Using this
characterization, we show how to find and run the revenue-optimal auction given
only black-box access to an implementation of the VCG allocation rule. We
generalize this result to arbitrarily correlated bidders, introducing the
notion of a second-order VCG allocation rule.
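A virtual VCG allocation rule, as described above, can be sketched in a few lines of Python. The sketch assumes additive bidders whose types are per-item value lists, an explicit list of feasible allocations (each a map from bidder index to a tuple of items), and hand-picked bidder-specific transforms; `virtual_vcg`, `sample_and_run`, `identity` and `shifted` are hypothetical names, and in the paper the transforms come out of its own optimization rather than being chosen by hand. `sample_and_run` draws one rule from a distribution over rules, mirroring the characterization of feasible Bayesian auctions as such distributions.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

BidderType = List[float]                          # additive type: one value per item
Transform = Callable[[BidderType], BidderType]    # hypothetical f_i: type -> virtual type
Allocation = Dict[int, Tuple[int, ...]]           # bidder index -> items it receives


def virtual_vcg(types: Sequence[BidderType],
                transforms: Sequence[Transform],
                feasible: Sequence[Allocation]) -> Allocation:
    """Return the feasible allocation with the largest total virtual value."""
    virtual = [f(t) for f, t in zip(transforms, types)]

    def virtual_welfare(alloc: Allocation) -> float:
        return sum(virtual[i][j] for i, bundle in alloc.items() for j in bundle)

    # max() keeps the first allocation among ties.
    return max(feasible, key=virtual_welfare)


def sample_and_run(rules: Sequence[Sequence[Transform]],
                   weights: Sequence[float],
                   types: Sequence[BidderType],
                   feasible: Sequence[Allocation],
                   rng: random.Random) -> Allocation:
    """Draw one virtual VCG rule (a list of per-bidder transforms) and run it."""
    transforms = rng.choices(rules, weights=weights, k=1)[0]
    return virtual_vcg(types, transforms, feasible)


def identity(t: BidderType) -> BidderType:
    return list(t)


def shifted(c: float) -> Transform:
    """Hypothetical transform lowering every value by c."""
    return lambda t: [v - c for v in t]


# Two bidders, one item (item 0); the item may also stay unallocated.
feasible = [{}, {0: (0,)}, {1: (0,)}]
types = [[3.0], [5.0]]
print(virtual_vcg(types, [identity, identity], feasible))       # {1: (0,)}
print(virtual_vcg(types, [identity, shifted(4)], feasible))     # {0: (0,)}
print(virtual_vcg(types, [shifted(10), shifted(10)], feasible)) # {}
```

Shifting bidder 1's virtual values down makes bidder 0 win, and shifting both below zero leaves the item unallocated, since the empty allocation then has the highest virtual welfare (0).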
We obtain our reduction from revenue to welfare optimization via two
algorithmic results on reduced forms in settings with arbitrary feasibility and
demand constraints. First, we provide a separation oracle for determining
feasibility of a reduced form. Second, we provide a geometric algorithm to
decompose any feasible reduced form into a distribution over virtual VCG
allocation rules. In addition, we show how to execute both algorithms given
only black-box access to an implementation of the VCG allocation rule.
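The abstract does not define the reduced form; in the standard usage it records, for each bidder i, type t_i and item j, the probability that i receives j when reporting t_i, taken over the other bidders' independently drawn types. The following brute-force Python sketch computes it under that assumed definition, for finite type distributions and a deterministic allocation rule (`reduced_form` and `highest_bidder` are hypothetical names). It enumerates every type profile, which is exactly the exponential blow-up the paper's algorithms avoid, so it only serves to make the object concrete.

```python
import itertools
import math
from typing import Callable, Dict, List, Sequence, Tuple

BidderType = List[float]                        # additive type: one value per item
TypeDist = List[Tuple[BidderType, float]]       # (type, probability) pairs
Rule = Callable[[List[BidderType]], Dict[int, Tuple[int, ...]]]


def reduced_form(type_dists: Sequence[TypeDist], rule: Rule,
                 num_items: int) -> List[List[List[float]]]:
    """Assumed standard definition:
    pi[i][k][j] = Pr[bidder i gets item j | bidder i has its k-th type]."""
    pi = [[[0.0] * num_items for _ in dist] for dist in type_dists]
    # Enumerate every type profile: exponential in the number of bidders.
    for ks in itertools.product(*(range(len(d)) for d in type_dists)):
        profile = [type_dists[i][k][0] for i, k in enumerate(ks)]
        prob = math.prod(type_dists[i][k][1] for i, k in enumerate(ks))
        for i, bundle in rule(profile).items():
            for j in bundle:
                pi[i][ks[i]][j] += prob
    # Divide by Pr[t_i] to condition on each bidder's own type.
    for i, dist in enumerate(type_dists):
        for k, (_, p) in enumerate(dist):
            pi[i][k] = [x / p for x in pi[i][k]]
    return pi


def highest_bidder(profile: List[BidderType]) -> Dict[int, Tuple[int, ...]]:
    """Give the single item 0 to the highest value; ties go to the lower index."""
    winner = max(range(len(profile)), key=lambda i: profile[i][0])
    return {winner: (0,)}


dists = [[([1.0], 0.5), ([3.0], 0.5)], [([1.0], 0.5), ([3.0], 0.5)]]
print(reduced_form(dists, highest_bidder, num_items=1))
# [[[0.5], [1.0]], [[0.0], [0.5]]]
```

For two bidders with value 1 or 3 (each with probability 1/2) and ties broken toward bidder 0, bidder 0 wins with probability 1/2 at value 1 and always at value 3, while bidder 1 never wins at value 1 and wins with probability 1/2 at value 3.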
Our results are computationally efficient for all multi-dimensional settings
where the bidders are additive. In this case, our mechanisms run in time
polynomial in the total number of bidder types, but not in the number of type profiles. For
generic correlated distributions, this is the natural description complexity of
the problem. The runtime can be further improved to poly(#items, #bidders) in
item-symmetric settings by making use of recent techniques.
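The gap between "bidder types" and "type profiles" is the gap between a sum and a product. With independent bidders, the input lists each bidder's type distribution separately, so its size grows with the sum of the per-bidder type counts, whereas the number of type profiles is their product. A short Python illustration (`description_sizes` is a hypothetical helper):

```python
import math
from typing import Sequence, Tuple


def description_sizes(types_per_bidder: Sequence[int]) -> Tuple[int, int]:
    """Return (total number of bidder types, number of type profiles)."""
    return sum(types_per_bidder), math.prod(types_per_bidder)


print(description_sizes([5] * 10))  # (50, 9765625)
```

For 10 bidders with 5 types each, that is 50 types against 5^10 = 9,765,625 profiles.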

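This paper provides a reduction from revenue maximization to welfare maximization in multi-dimensional Bayesian auctions with arbitrary (possibly combinatorial) feasibility constraints and independent bidders with arbitrary (possibly combinatorial) demand constraints, appropriately extending Myerson's result to this setting. We also show that every feasible Bayesian auction can be implemented as a distribution over virtual VCG allocation rules. Using this characterization, we show how to find and run the revenue-optimal auction given only black-box access to an implementation of the VCG allocation rule.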

Optimal Multi-Dimensional Mechanism Design: Reducing Revenue to Welfare Maximization

Complements between goods, where one good takes on added value in the
presence of another, have been a thorn in the side of algorithmic mechanism
designers. On the one hand, complements are common in the standard motivating
applications for combinatorial auctions, like spectrum license auctions. On the
other, welfare maximization in the presence of complements is notoriously
difficult, and this intractability has stymied theoretical progress in the
area. For example, there are no known positive results for combinatorial
auctions in which bidder valuations are multi-parameter and
non-complement-free, other than the relatively weak results known for general
valuations.
To make inroads on the problem of combinatorial auction design in the
presence of complements, we propose a model for valuations with complements
that is parameterized by the "size" of the complements. A valuation in our
model is represented succinctly by a weighted hypergraph, where the size of the
hyperedges corresponds to the degree of complementarity. Our model permits a
variety of computationally efficient queries, and non-trivial
welfare-maximization algorithms and mechanisms.
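A minimal Python sketch of a hypergraph valuation, assuming non-negative hyperedge weights and that a bundle is worth the total weight of the hyperedges it fully contains; the size of the largest hyperedge plays the role of r in "hypergraph-r". `hypergraph_value` and `rank` are hypothetical names, and the exact conventions are assumptions rather than the paper's definitions.

```python
from typing import Dict, FrozenSet, Iterable

Hypergraph = Dict[FrozenSet[str], float]  # hyperedge -> non-negative weight (assumed)


def hypergraph_value(weights: Hypergraph, bundle: Iterable[str]) -> float:
    """Assumed convention: total weight of the hyperedges the bundle fully contains."""
    s = frozenset(bundle)
    return sum(w for edge, w in weights.items() if edge <= s)


def rank(weights: Hypergraph) -> int:
    """Size of the largest hyperedge (the r in 'hypergraph-r')."""
    return max((len(edge) for edge in weights), default=0)


# Items a and b are worth 1 each alone, but the pair carries an extra weight of 3.
v = {frozenset({"a"}): 1.0, frozenset({"b"}): 1.0, frozenset({"a", "b"}): 3.0}
print(hypergraph_value(v, {"a", "b"}))  # 5.0
print(hypergraph_value(v, {"a", "c"}))  # 1.0
print(rank(v))                          # 2
```

Here the hyperedge {a, b} makes the pair worth 5, more than the 2 that a and b are worth separately: a complement of size 2. In this sketch a value query costs one pass over the hyperedges.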
We design the following polynomial-time approximation algorithms and truthful
mechanisms for welfare maximization when bidders have hypergraph valuations.
1- For bidders whose valuations correspond to subgraphs of a known graph that
is planar (or more generally, excludes a fixed minor), we give a truthful and
(1+epsilon)-approximate mechanism.
2- We give a polynomial-time, r-approximation algorithm for welfare
maximization with hypergraph-r valuations. Our algorithm randomly rounds a
compact linear programming relaxation of the problem.
3- We design a different approximation algorithm and use it to give a
polynomial-time, truthful-in-expectation mechanism that has an approximation
factor of O(log^r m).
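To make the welfare-maximization problem targeted by these algorithms concrete, the following Python sketch computes the exact optimum on tiny instances by trying every assignment of items to bidders (or to no one). This exhaustive search is not one of the paper's algorithms: it examines (n+1)^k assignments for n bidders and k items, which is why polynomial-time approximations matter. Valuations are assumed to be weighted hypergraphs in which a bundle is worth the total weight of the hyperedges it contains; `hypergraph_value` and `optimal_welfare` are hypothetical names.

```python
import itertools
from typing import Dict, FrozenSet, List, Sequence, Tuple

Hypergraph = Dict[FrozenSet[str], float]  # hyperedge -> non-negative weight (assumed)


def hypergraph_value(weights: Hypergraph, bundle: FrozenSet[str]) -> float:
    """Total weight of the hyperedges contained in the bundle."""
    return sum(w for edge, w in weights.items() if edge <= bundle)


def optimal_welfare(items: Sequence[str],
                    valuations: Sequence[Hypergraph]) -> Tuple[float, List[FrozenSet[str]]]:
    """Brute-force baseline: each item goes to one bidder or to no one (owner -1)."""
    n = len(valuations)
    best = (0.0, [frozenset() for _ in range(n)])
    for owners in itertools.product(range(-1, n), repeat=len(items)):
        bundles = [frozenset(x for x, o in zip(items, owners) if o == i) for i in range(n)]
        welfare = sum(hypergraph_value(v, b) for v, b in zip(valuations, bundles))
        if welfare > best[0]:
            best = (welfare, bundles)
    return best


additive = {frozenset({"a"}): 2.0, frozenset({"b"}): 2.0}   # rank 1
complement = {frozenset({"a", "b"}): 5.0}                   # rank 2
welfare, bundles = optimal_welfare(["a", "b"], [additive, complement])
print(welfare)  # 5.0: both items go to bidder 1
```

Bidder 1 values a and b only together, so giving both items to bidder 1 beats the additive bidder 0's total of 4. With r = 2 here, an r-approximation algorithm is only required to return welfare of at least 5/2 (in expectation, for a randomized rounding scheme).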

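This work proposes a model for valuations with complements that is parameterized by the "size" of the complements; the model permits a variety of computationally efficient queries, and non-trivial welfare-maximization algorithms and mechanisms.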