We consider the problem of online multi-agent Nash social welfare (NSW)
maximization. While the previous works of Hossain et al. [2021] and Jones et al.
[2023] study similar problems in stochastic multi-agent multi-armed bandits and
show that $\sqrt{T}$-regret is possible after $T$ rounds, their fairness
measure is the product of all agents' rewards, instead of their NSW (that is,
their geometric mean). Given the fundamental role of NSW in the fairness
literature, it is natural to ask whether no-regret fair learning with
NSW as the objective is possible. In this work, we provide a complete answer to
this question in various settings. Specifically, in stochastic $N$-agent
$K$-armed bandits, we develop an algorithm with
$\widetilde{\mathcal{O}}\left(K^{\frac{2}{N}}T^{\frac{N-1}{N}}\right)$ regret
and prove that the dependence on $T$ is tight, in sharp contrast to
the $\sqrt{T}$-regret bounds of Hossain et al. [2021] and Jones et al. [2023]. We
then consider a more challenging version of the problem with adversarial
rewards. Somewhat surprisingly, despite NSW being a concave function, we prove
that no algorithm can achieve sublinear regret. To circumvent this negative
result, we further consider a setting with full-information feedback and
design two algorithms with $\sqrt{T}$-regret: the first one has no dependence
on $N$ at all and is applicable to not just NSW but a broad class of welfare
functions, while the second one has better dependence on $K$ and is preferable
when $N$ is small. Finally, we also show that logarithmic regret is possible
whenever at least one agent is indifferent among the arms.

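For the online multi-agent Nash social welfare (NSW) maximization problem, we give a complete answer to whether no-regret fair learning with NSW as the objective is possible, and we obtain the corresponding regret bounds in each of the settings considered.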
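
To make the distinction between the two fairness measures concrete, the following is a minimal sketch, not the algorithm of this work. It assumes that the learner plays a distribution `p` over the $K$ arms and that `u[n][k]` is the mean reward of agent $n$ for arm $k$, with values in $[0, 1]$; the function names and the toy reward matrix are hypothetical and chosen only for illustration.

```python
import math


def expected_rewards(p, u):
    """Expected reward of each agent when an arm is drawn from p.

    Assumption: u[n][k] is the mean reward of agent n for arm k.
    """
    return [sum(pk * unk for pk, unk in zip(p, row)) for row in u]


def product_welfare(p, u):
    """Product of the agents' expected rewards (the measure used in prior work)."""
    return math.prod(expected_rewards(p, u))


def nash_social_welfare(p, u):
    """Geometric mean of the agents' expected rewards (NSW)."""
    n_agents = len(u)
    return math.prod(expected_rewards(p, u)) ** (1.0 / n_agents)


# Toy instance (hypothetical): two agents who each like a different arm.
u = [
    [1.0, 0.0],  # agent 0 only values arm 0
    [0.0, 1.0],  # agent 1 only values arm 1
]
uniform = [0.5, 0.5]
only_arm_0 = [1.0, 0.0]

print(product_welfare(uniform, u))         # product of expected rewards
print(nash_social_welfare(uniform, u))     # geometric mean of expected rewards
print(nash_social_welfare(only_arm_0, u))  # agent 1 gets nothing, so NSW is 0
```

In this toy instance both measures prefer the uniform distribution to always playing one arm, but they take different values: NSW is the $N$-th root of the product.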