The standard problem setting in dec-pomdps is self-play, where the goal is to
find a set of policies that play optimally together. Policies learned through
self-play may adopt arbitrary conventions and implicitly rely on multi-step
reasoning based on fragile assumptions about other age