Reward allocation, also known as the credit assignment problem, is a long-standing topic in economics, engineering, and machine learning. A central concept in credit assignment is the core, the set of stable allocations under which no agent has an incentive to deviate from the grand coalition. In this paper, we consider the problem of learning stable allocations in stochastic cooperative games, where the reward function is a random variable with an unknown distribution. Given an oracle that returns a stochastic reward for a queried coalition in each round, our goal is to learn the expected core, that is, the set of allocations that are stable in expectation. Within the class of strictly convex games, we present an algorithm named \texttt{Common-Points-Picking} that, with high probability, returns a stable allocation given a polynomial number of samples. The analysis of our algorithm involves the development of several new results in convex geometry, including an extension of the separating hyperplane theorem to multiple convex sets, which may be of independent interest.
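To make the notion of a stable allocation concrete, the following minimal sketch checks whether an allocation lies in the core of a small game. It is an illustration only and is not the \texttt{Common-Points-Picking} algorithm: it assumes the expected reward function is known exactly, whereas the setting above only provides noisy samples from an oracle. The names `is_in_core`, `marginal_vector`, and `expected_value`, as well as the toy reward $|S|^2$, are hypothetical choices made for this example.

```python
from itertools import combinations


def is_in_core(n, value, allocation, tol=1e-9):
    """Return True if `allocation` is efficient and stable for every coalition."""
    players = range(n)
    # Efficiency: the grand coalition's value is fully distributed.
    if abs(sum(allocation) - value(frozenset(players))) > tol:
        return False
    # Stability: every proper coalition S receives at least value(S).
    for size in range(1, n):
        for coalition in combinations(players, size):
            paid = sum(allocation[i] for i in coalition)
            if paid < value(frozenset(coalition)) - tol:
                return False
    return True


def marginal_vector(order, value):
    """Pay each player its marginal contribution when joining in `order`."""
    allocation = [0.0] * len(order)
    joined = frozenset()
    for player in order:
        allocation[player] = value(joined | {player}) - value(joined)
        joined = joined | {player}
    return allocation


# Hypothetical strictly convex game (an assumption for illustration):
# the expected reward of a coalition S is |S|^2.
def expected_value(coalition):
    return float(len(coalition) ** 2)


x = marginal_vector([0, 1, 2], expected_value)
print(x, is_in_core(3, expected_value, x))
print(is_in_core(3, expected_value, [0.0, 4.0, 5.0]))
```

In convex games, every marginal-contribution vector lies in the core (a classical result due to Shapley), which is why `x` passes the check. The second allocation fails because player 0 receives less than its stand-alone value. The check enumerates all coalitions, so it is only practical for a small number of players.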

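In this paper, we consider the problem of learning stable allocations in stochastic cooperative games, where the reward function is modelled as a random variable with an unknown distribution. We present an algorithm named "Common-Points-Picking" that, with high probability, returns a stable allocation given a polynomial number of samples. The analysis of our algorithm involves several new results in convex geometry, including an extension of the separating hyperplane theorem to multiple convex sets, which may be of independent interest.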

Learning the Expected Core of Strictly Convex Stochastic Cooperative Games