A universal feature of human societies is the adoption of systems of rules and norms in the service of cooperative ends. How can we build learning agents that do the same, so that they may flexibly cooperate with the human institutions they are embedded in? We hypothesize that agents can achieve this by assuming there exists a shared set of norms that most others comply with while pursuing their individual desires, even if they do not know the exact content of those norms. By assuming shared norms, a newly introduced agent can infer the norms of an existing population from observations of compliance and violation. Furthermore, groups of agents can converge to a shared set of norms, even if they initially diverge in their beliefs about what the norms are. This in turn stabilizes the normative system: because agents can bootstrap common knowledge of the norms, the norms come to be widely adhered to, which enables new entrants to learn them rapidly. We formalize this framework in the context of Markov games and demonstrate its operation in a multi-agent environment via approximately Bayesian rule induction of obligative and prohibitive norms. Using our approach, agents are able to rapidly learn and sustain a variety of cooperative institutions, including resource management norms and compensation for pro-social labor, promoting collective welfare while still allowing agents to act in their own interests.
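
To make the idea of inferring a norm from observed compliance and violation concrete, the following is a minimal sketch of a Bayesian update for a single candidate norm. It is an illustration under simplifying assumptions, not the paper's rule induction procedure: the function name `update_norm_belief` and the two likelihood parameters are hypothetical, and each observation is treated as independent.

```python
from math import exp, log


def update_norm_belief(prior, observations,
                       p_comply_if_norm=0.95, p_comply_if_none=0.5):
    """Posterior probability that a candidate norm is in force.

    prior: initial probability (strictly between 0 and 1) that the norm exists.
    observations: iterable of booleans, True if the observed agent acted
        consistently with the norm, False if it violated the norm.
    p_comply_if_norm: assumed chance of compliance when the norm exists
        (hypothetical value; "most others comply").
    p_comply_if_none: assumed chance of the same behaviour arising from
        self-interest alone when no such norm exists (hypothetical value).
    """
    # Work in log-odds so that each observation adds a log likelihood ratio.
    log_odds = log(prior / (1.0 - prior))
    for complied in observations:
        if complied:
            log_odds += log(p_comply_if_norm / p_comply_if_none)
        else:
            log_odds += log((1.0 - p_comply_if_norm) / (1.0 - p_comply_if_none))
    # Convert log-odds back to a probability.
    return 1.0 / (1.0 + exp(-log_odds))


# A newcomer starts sceptical, then watches ten compliant actions.
belief_after_compliance = update_norm_belief(0.1, [True] * 10)
# A few violations quickly undermine belief in the norm.
belief_after_violations = update_norm_belief(0.5, [False] * 3)
```

Under these assumed likelihoods, repeated compliance raises the belief that the norm is in force, while violations lower it sharply because violations are rare when a norm is widely followed.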

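Learning agents can infer the norms of an existing population by assuming that a shared set of norms exists, and can thereby learn to cooperate socially. The work formalizes this framework in the setting of Markov games and demonstrates its operation in a multi-agent environment via approximately Bayesian rule induction, enabling agents to rapidly learn and sustain a variety of cooperative institutions, including resource management norms and compensation for pro-social labor, promoting collective welfare while still allowing agents to pursue their own interests.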

Learning and Sustaining Shared Normative Systems via Bayesian Rule Induction in Markov Games