Ad hoc teamwork refers to the problem of enabling an agent to collaborate
with teammates without prior coordination. Data-driven methods represent the
state of the art in ad hoc teamwork. They use a large labeled dataset of prior
observations to model the behavior of other agent types and to determine the ad
hoc agent's behavior. These methods are computationally expensive, lack
transparency, and make it difficult to adapt to previously unseen changes,
e.g., in team composition. Our recent work introduced an architecture that
determined an ad hoc agent's behavior based on non-monotonic logical reasoning
with prior commonsense domain knowledge and predictive models of other agents'
behavior that were learned from limited examples. In this paper, we
substantially expand the architecture's capabilities to support: (a) online
selection, adaptation, and learning of the models that predict the other
agents' behavior; and (b) collaboration with teammates in the presence of
partial observability and limited communication. We illustrate and
experimentally evaluate the capabilities of our architecture in two simulated
multiagent benchmark domains for ad hoc teamwork: Fort Attack and Half Field
Offense. We show that the performance of our architecture is comparable to or
better than that of state-of-the-art data-driven baselines in both simple and complex
scenarios, particularly in the presence of limited training data, partial
observability, and changes in team composition.

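Summary: The architecture combines non-monotonic logical reasoning with
predictive models of other agents' behavior learned from a small amount of
data. It addresses the problem of an agent collaborating with teammates without
prior coordination by supporting online selection, adaptation, and learning of
these models, as well as collaboration under partial observability and limited
communication. Experiments show that its performance is comparable to or better
than that of state-of-the-art data-driven baselines in both simple and complex
scenarios, particularly with limited training data, partial observability, and
changes in team composition.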