To overcome the sim-to-real gap in reinforcement learning (RL), learned
policies must maintain robustness against environmental uncertainties. While
robust RL has been widely studied in single-agent regimes, in multi-agent
environments, the problem remains understudied -- despite the fact that the
problems posed by environmental uncertainties are often exacerbated by
strategic interactions. This work focuses on learning in distributionally
robust Markov games (RMGs), a robust variant of standard Markov games, wherein
each agent aims to learn a policy that maximizes its own worst-case performance
when the deployed environment deviates within its own prescribed uncertainty
set. This results in a set of robust equilibrium strategies for all agents that
align with classic notions of game-theoretic equilibria. Assuming a
non-adaptive sampling mechanism from a generative model, we propose a
sample-efficient model-based algorithm (DRNVI) with finite-sample complexity
guarantees for learning robust variants of various notions of game-theoretic
equilibria. We also establish an information-theoretic lower bound for solving
RMGs, which confirms the near-optimal sample complexity of DRNVI with respect
to problem-dependent factors such as the size of the state space, the target
accuracy, and the horizon length.
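
As a rough illustration of the "worst-case performance within an uncertainty set" idea above, the sketch below computes one robust Bellman backup for a single agent at a single state-action pair. It is not the DRNVI algorithm: the abstract does not specify the divergence defining the uncertainty set, so a total-variation ball of radius `sigma` around the nominal transition row is assumed here, and the function names `worst_case_expectation` and `robust_backup` are hypothetical.

```python
def worst_case_expectation(p0, v, sigma):
    """Minimize sum_s p[s] * v[s] over distributions p with TV(p, p0) <= sigma.

    Assumption: TV(p, p0) = 0.5 * sum_s |p[s] - p0[s]|. Under that definition the
    minimizer moves up to `sigma` probability mass from the highest-value
    states onto a lowest-value state.
    """
    p = list(p0)
    lowest = min(range(len(v)), key=lambda s: v[s])
    budget = sigma
    # Visit states from highest to lowest value and drain mass from them.
    for s in sorted(range(len(v)), key=lambda s: v[s], reverse=True):
        if budget <= 0:
            break
        if s == lowest:
            continue
        moved = min(p[s], budget)
        p[s] -= moved
        p[lowest] += moved
        budget -= moved
    return sum(pi * vi for pi, vi in zip(p, v))


def robust_backup(reward, p0, v_next, sigma):
    """One robust Bellman backup: reward plus worst-case next-state value."""
    return reward + worst_case_expectation(p0, v_next, sigma)


nominal = [0.5, 0.3, 0.2]    # nominal transition row (illustrative numbers)
values = [1.0, 0.0, 2.0]     # next-step value estimates (illustrative numbers)
print(robust_backup(1.0, nominal, values, sigma=0.0))  # nominal backup
print(robust_backup(1.0, nominal, values, sigma=0.1))  # pessimistic backup
```

With `sigma=0` the backup reduces to the standard (non-robust) one; larger radii can only lower the value. In a Markov game each agent would apply such a backup against its own uncertainty set before an equilibrium is computed at each step, which this sketch leaves out.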

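To bridge the sim-to-real gap in reinforcement learning, learned policies must remain robust to environmental uncertainty. This work focuses on learning distributionally robust Markov games in multi-agent environments, proposes the model-based DRNVI algorithm for learning robust variants of various game-theoretic equilibria, and establishes an information-theoretic lower bound that confirms the near-optimal sample complexity of DRNVI.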

Sample-Efficient Robust Multi-Agent Reinforcement Learning in the Face of Environmental Uncertainty

Object detection for automated driving has long been a challenging task in
computer vision because of environmental uncertainties. These uncertainties
include large variations in object size and the appearance of unseen classes.
Traditional object detection models may perform poorly when applied directly
to automated driving, because they usually presume a fixed set of common
traffic participants, such as pedestrians and cars. Worse, the severe class
imbalance between common and novel classes further exacerbates the performance
degradation. To address these issues, we propose OpenNet, which mitigates the
class imbalance with a Balanced Loss based on Cross Entropy Loss. In addition,
we adopt an inductive layer based on gradient reshaping to quickly learn new
classes from limited samples during incremental learning. To counter
catastrophic forgetting, we employ normalized feature distillation. We also
improve multi-scale detection robustness and unknown-class recognition through
FPN and energy-based detection, respectively. Experimental results on the CODA
dataset show that the proposed method outperforms existing methods.
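
The abstract does not say how energy-based detection is formulated in OpenNet. As a hedged illustration only, the sketch below uses the commonly cited free-energy score over classifier logits, E(x) = -T log sum_k exp(z_k / T), and flags a detection as unknown when its energy exceeds a threshold. The function names, the temperature, and the threshold value are assumptions, not taken from the paper.

```python
import math


def energy_score(logits, temperature=1.0):
    """Free energy -T * log(sum_k exp(z_k / T)); lower energy means a more confident 'known' class."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    # Numerically stable log-sum-exp: subtract the maximum before exponentiating.
    log_sum_exp = peak + math.log(sum(math.exp(z - peak) for z in scaled))
    return -temperature * log_sum_exp


def is_unknown(logits, threshold, temperature=1.0):
    """Flag a detection as an unknown class when its energy is above the threshold."""
    return energy_score(logits, temperature) > threshold


confident = [10.0, 0.0, 0.0]   # one class clearly dominates (illustrative logits)
flat = [0.0, 0.0, 0.0]         # no class stands out (illustrative logits)
threshold = -5.0               # hypothetical value; normally tuned on held-out data

print(is_unknown(confident, threshold))
print(is_unknown(flat, threshold))
```

A detection with one dominant logit receives a strongly negative energy and is kept as a known class, while a flat logit vector sits near -log(K) and is flagged.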

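This work proposes a method for object detection in autonomous driving. It mitigates class imbalance with a balanced loss, uses an inductive layer based on gradient reshaping to quickly learn new classes from limited samples, prevents catastrophic forgetting through normalized feature distillation, and improves multi-scale detection robustness and unknown-class recognition through FPN and energy-based detection. Experiments show that the method achieves better performance on the CODA dataset.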

OpenNet: Incremental Learning for Autonomous Driving Object Detection with Balanced Loss

Multi-agent reinforcement learning (MARL) plays a pivotal role in tackling
real-world challenges. However, a seamless transfer of trained policies
from simulation to the real world requires them to be robust to various
environmental uncertainties. Existing works focus on finding a Nash equilibrium
or the optimal policy under uncertainty in a single environment variable (i.e.,
action, state, or reward), because a multi-agent system is itself highly
complex and non-stationary. In real-world situations, however, uncertainty can
occur in multiple environment variables simultaneously. This work is the first
to formulate the generalised problem of robustness to multi-modal environment
uncertainty in MARL. To this end, we propose a general robust training approach
for multi-modal uncertainty based on curriculum learning techniques. We handle
two distinct environmental uncertainties simultaneously and present extensive
results across both cooperative and competitive MARL environments,
demonstrating that our approach achieves state-of-the-art levels of robustness.

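This study is the first to formulate the generalized problem of multi-agent reinforcement learning under multi-modal environmental uncertainty, and it proposes a robust training approach for multi-modal uncertainty based on curriculum learning techniques. Extensive experimental results in cooperative and competitive multi-agent reinforcement learning environments show that the approach achieves state-of-the-art robustness.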