This work studies learning online fair division under uncertainty, where a central planner sequentially allocates items without precise knowledge of agents' values or utilities. Unlike conventional online algorithms, which observe agents' values for an item when it arrives, the planner here observes only noisy estimates of those values, and only after an item has been allocated. We introduce wrapper algorithms based on \textit{dual averaging} that gradually learn both the type distribution of arriving items and the agents' values from bandit feedback. This approach lets the algorithms asymptotically achieve the optimal Nash social welfare in linear Fisher markets with additive utilities. We establish regret bounds in Nash social welfare and empirically validate the superior performance of the proposed algorithms on both synthetic and empirical datasets.
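
A minimal sketch of how dual averaging can be combined with bandit feedback, under assumptions that are ours and not necessarily the paper's. These assumptions are: finitely many item types; true values in [0, 1] that are revealed, with Gaussian noise, only to the planner for the agent who receives the item; and an optimistic UCB-style value estimate plugged into a dual-averaging update of per-agent pacing multipliers (β_i = B_i divided by the agent's running average utility, clipped to the box [B_i/(1+δ), 1+δ]). The function names `bandit_dual_averaging` and `nash_welfare`, the noise model, the box bounds, and the exploration bonus are hypothetical illustrations, not the authors' wrapper algorithms. The objective tracked is the budget-weighted log Nash social welfare Σ_i B_i log u_i.

```python
import math
import random


def _ucb(sums, counts, i, j, t):
    """Optimistic estimate of agent i's value for type j (values assumed in [0, 1])."""
    if counts[i][j] == 0:
        return 1.0  # never observed: use the top of the assumed value range
    mean = sums[i][j] / counts[i][j]
    bonus = math.sqrt(2.0 * math.log(t + 1) / counts[i][j])
    return min(1.0, mean + bonus)


def bandit_dual_averaging(values, type_probs, budgets, horizon,
                          noise_sd=0.1, delta=0.5, seed=0):
    """Hypothetical sketch: dual averaging on pacing multipliers with UCB values.

    values[i][j]  -- true value of agent i for item type j (hidden from the planner)
    type_probs[j] -- arrival probability of type j (also hidden; the running
                     utility averages below adapt to it implicitly)
    budgets[i]    -- weights B_i, assumed to sum to 1
    Returns each agent's average true utility per round and its item count.
    """
    rng = random.Random(seed)
    n, m = len(values), len(type_probs)
    counts = [[0] * m for _ in range(n)]   # times agent i received type j
    sums = [[0.0] * m for _ in range(n)]   # summed noisy feedback per (i, j)
    beta = [1.0] * n                       # pacing multipliers (dual variables)
    observed = [0.0] * n                   # running sum of observed (noisy) utilities
    true_util = [0.0] * n                  # evaluation only, never used to decide
    received = [0] * n
    for t in range(1, horizon + 1):
        j = rng.choices(range(m), weights=type_probs)[0]  # an item of type j arrives
        # Allocate to the agent with the largest paced, optimistic bid.
        winner = max(range(n), key=lambda i: beta[i] * _ucb(sums, counts, i, j, t))
        # Bandit feedback: only the winner's value is revealed, with noise.
        feedback = values[winner][j] + rng.gauss(0.0, noise_sd)
        counts[winner][j] += 1
        sums[winner][j] += feedback
        observed[winner] += feedback
        true_util[winner] += values[winner][j]
        received[winner] += 1
        # Dual averaging: beta_i = B_i / (average observed utility), clipped to a box.
        for i in range(n):
            avg = max(observed[i] / t, 1e-12)
            lo, hi = budgets[i] / (1.0 + delta), 1.0 + delta
            beta[i] = min(hi, max(lo, budgets[i] / avg))
    return [u / horizon for u in true_util], received


def nash_welfare(utils, budgets):
    """Budget-weighted log Nash social welfare: sum_i B_i * log(u_i)."""
    return sum(b * math.log(u) for b, u in zip(budgets, utils))


if __name__ == "__main__":
    values = [[0.9, 0.1], [0.2, 0.8]]
    utils, received = bandit_dual_averaging(values, [0.5, 0.5], [0.5, 0.5], 5000)
    print(utils, received, nash_welfare(utils, [0.5, 0.5]))
```

With two agents who each prefer a different, equally likely item type, one would expect the multipliers to settle so that each agent receives mostly its preferred type, while the optimistic estimates make mis-allocations rare as feedback accumulates. Clipping β_i to a box keeps the multipliers bounded in early rounds, when an agent's running utility is still close to zero.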

Learning Fair Division from Bandit Feedback