Online Learning in Iterated Prisoner's Dilemma to Mimic Human Behavior

Jun, 2020

Online Learning in Iterated Prisoner's Dilemma to Mimic Human Behavior

Baihan Lin, Djallel Bouneffouf, Guillermo Cecchi

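TL;DR: This paper studies how online learning algorithms behave in the Prisoner's Dilemma game. It examines the capabilities of multi-armed bandit, contextual bandit, and reinforcement learning algorithms in this setting and how closely they fit human behavior, and it draws a number of conclusions about multi-agent competition and strategy dynamics.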

Abstract

Studies of the Prisoner's Dilemma mainly treat the choice to cooperate or defect as an atomic action. We propose to study the behavior of online learning algorithms in the iterated prisoner's dilemma (IPD) game, where we explore the full spectrum of →