Multi-task reinforcement learning endeavors to accomplish a set of different
tasks with a single policy. To enhance data efficiency by sharing parameters
across multiple tasks, a common practice segments the network into distinct
modules and trains a routing network to recombine these modules into
task-specific policies. However, existing routing approaches employ a fixed
number of modules for all tasks, neglecting that tasks with varying
difficulties commonly require varying amounts of knowledge. This work presents
a Dynamic Depth Routing (D2R) framework, which learns strategic skipping of
certain intermediate modules, thereby flexibly choosing different numbers of
modules for each task. Under this framework, we further introduce a ResRouting
method to address the issue of disparate routing paths between behavior and
target policies during off-policy training. In addition, we design an automatic
route-balancing mechanism to encourage continued routing exploration for
unmastered tasks without disturbing the routing of mastered ones. We conduct
extensive experiments on various robotics manipulation tasks in the Meta-World
benchmark, where D2R achieves state-of-the-art performance with significantly
improved learning efficiency.
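
The idea of using a different number of modules per task can be illustrated with a minimal sketch. It is not the authors' implementation: the scaling modules, the task names, and the hand-written keep masks are hypothetical stand-ins, whereas in D2R the skipping decisions are learned by the routing network.

```python
from typing import Callable, List, Sequence, Tuple

Module = Callable[[List[float]], List[float]]


def make_scale_module(factor: float) -> Module:
    """Hypothetical stand-in for a learned module: scales every feature."""
    return lambda x: [factor * v for v in x]


def forward_with_skips(
    x: List[float], modules: Sequence[Module], keep_mask: Sequence[bool]
) -> Tuple[List[float], int]:
    """Apply modules in order, skipping those whose mask entry is False.

    Returns the output and the number of modules actually used.
    """
    if len(modules) != len(keep_mask):
        raise ValueError("keep_mask needs one entry per module")
    used = 0
    for module, keep in zip(modules, keep_mask):
        if keep:
            x = module(x)
            used += 1
    return x, used


modules = [make_scale_module(2.0), make_scale_module(3.0), make_scale_module(5.0)]

# Assumed masks: the "easy" task skips the intermediate module,
# the "hard" task uses the full depth.
task_masks = {"easy": [True, False, True], "hard": [True, True, True]}

easy_out, easy_depth = forward_with_skips([1.0], modules, task_masks["easy"])
hard_out, hard_depth = forward_with_skips([1.0], modules, task_masks["hard"])
```

Here the easy task passes through two modules and the hard task through three, while both share the same module parameters.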

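This study tackles multi-task reinforcement learning with the Dynamic Depth Routing (D2R) framework, which improves data efficiency by bypassing intermediate modules so that each task can use a different number of modules. The framework further introduces the ResRouting method to resolve the disparate routing paths between behavior and target policies during off-policy training, and designs an automatic route-balancing mechanism to encourage continued routing exploration for unmastered tasks. Extensive experiments on various robotic manipulation tasks in the Meta-World benchmark show state-of-the-art results with significantly improved learning efficiency.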

Not All Tasks Are Equally Difficult: Multi-Task Reinforcement Learning with Dynamic Depth Routing

Sparse Mixture of Experts (MoE) has received great interest due to its
promising scaling capability with affordable computational overhead. MoE
converts dense layers into sparse experts, and utilizes a gated routing network
to make experts conditionally activated. However, as the number of experts
grows, MoE with an outrageously large parameter count suffers from overfitting
and sparse data allocation. Such problems are especially severe on tasks with
limited data, thus hindering MoE models from improving performance by scaling
up. In this work, we propose Mixture of Expert Clusters (MoEC), a general approach to
enable expert layers to learn more diverse and appropriate knowledge by
imposing variance-based constraints on the routing stage. We further propose a
cluster-level expert dropout strategy specifically designed for the expert
cluster structure. Our experiments reveal that MoEC could improve performance
on machine translation and natural language understanding tasks, and raise the
performance upper bound for scaling up experts under limited data. We also
verify that MoEC plays a positive role in mitigating overfitting and sparse
data allocation.
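
The gated routing that makes experts conditionally activated can be sketched as top-k gating. This is a minimal illustration under assumptions: the scalar experts and the gate logits are hypothetical, and the variance-based routing constraints and cluster-level expert dropout of MoEC are not modeled, since the abstract does not specify them.

```python
import math
from typing import Callable, List, Sequence, Tuple

Expert = Callable[[float], float]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_gate(logits: Sequence[float], k: int) -> List[Tuple[int, float]]:
    """Pick the k experts with the largest logits.

    Weights are renormalized over the selected experts only.
    """
    chosen = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    weights = softmax([logits[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_forward(
    x: float, experts: Sequence[Expert], gate_logits: Sequence[float], k: int = 1
) -> float:
    """Only the selected experts are evaluated; the rest stay inactive."""
    return sum(w * experts[i](x) for i, w in top_k_gate(gate_logits, k))


# Hypothetical experts and gate scores for a single input.
experts = [lambda x: x + 1.0, lambda x: 2.0 * x, lambda x: -x]
gate_logits = [0.5, 2.0, -1.0]

top1 = moe_forward(3.0, experts, gate_logits, k=1)
top2 = moe_forward(3.0, experts, gate_logits, k=2)
```

With `k=1` only the second expert runs; with `k=2` the output is a weighted mix of the two highest-scoring experts.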

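This study proposes the Mixture of Expert Clusters model, which imposes variance-based constraints at the routing stage so that expert layers learn more diverse and appropriate knowledge, and introduces a cluster-level expert dropout strategy for the expert cluster structure. Experiments show that the model improves performance on machine translation and natural language understanding tasks, raises the performance upper bound for scaling up experts under limited data, and helps mitigate overfitting and sparse data allocation.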

MoEC: Mixture of Expert Clusters

Multi-task learning is a very challenging problem in reinforcement learning.
While training multiple tasks jointly allows the policies to share parameters
across different tasks, the optimization problem becomes non-trivial: it
remains unclear what parameters in the network should be reused across tasks,
and how the gradients from different tasks may interfere with each other. Thus,
instead of naively sharing parameters across tasks, we introduce an explicit
modularization technique on policy representation to alleviate this
optimization issue. Given a base policy network, we design a routing network
which estimates different routing strategies to reconfigure the base network
for each task. Instead of directly selecting routes for each task, our
task-specific policy uses a method called soft modularization to softly combine
all the possible routes, which makes it suitable for sequential tasks. We
experiment with various robotics manipulation tasks in simulation and show our
method improves both sample efficiency and performance over strong baselines by
a large margin.
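
Softly combining all possible routes, rather than selecting one, can be sketched as follows. This is a simplified illustration under assumptions: the function name `soft_route`, the logits, and the module outputs are hypothetical, and in the paper the routing probabilities come from a learned routing network conditioned on the task.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_route(
    prev_outputs: Sequence[Sequence[float]],
    routing_logits: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Build the input of each next-layer module as a soft mix of all
    previous-layer module outputs.

    routing_logits[j][i] scores the connection from previous module i
    to next module j; each row is normalized with a softmax.
    """
    dim = len(prev_outputs[0])
    next_inputs = []
    for logits_j in routing_logits:
        probs = softmax(logits_j)
        mixed = [
            sum(p * out[d] for p, out in zip(probs, prev_outputs))
            for d in range(dim)
        ]
        next_inputs.append(mixed)
    return next_inputs


# Two previous modules with 2-dimensional outputs (made-up values).
prev_outputs = [[1.0, 0.0], [0.0, 1.0]]
# Next module 0 weights both routes equally; next module 1 strongly prefers route 0.
routing_logits = [[0.0, 0.0], [50.0, -50.0]]

next_inputs = soft_route(prev_outputs, routing_logits)
```

Because every route keeps a nonzero weight, the combination stays differentiable and no discrete route selection is needed.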

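By introducing an explicit modularization technique and a routing network that reconfigures the parameters shared across tasks, this work obtains a soft modularization method suited to sequential tasks, substantially improving sample efficiency and performance on robotic manipulation tasks.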

Multi-Task Reinforcement Learning with Soft Modularization

Multi-task learning (MTL) with neural networks leverages commonalities in
tasks to improve performance, but often suffers from task interference which
reduces the benefits of transfer. To address this issue we introduce the
routing network paradigm, a novel neural network and training algorithm. A
routing network is a kind of self-organizing neural network consisting of two
components: a router and a set of one or more function blocks. A function block
may be any neural network - for example a fully-connected or a convolutional
layer. Given an input the router makes a routing decision, choosing a function
block to apply and passing the output back to the router recursively,
terminating when a fixed recursion depth is reached. In this way the routing
network dynamically composes different function blocks for each input. We
employ a collaborative multi-agent reinforcement learning (MARL) approach to
jointly train the router and function blocks. We evaluate our model against
cross-stitch networks and shared-layer baselines on multi-task settings of the
MNIST, Mini-ImageNet, and CIFAR-100 datasets. Our experiments demonstrate a
significant improvement in accuracy, with sharper convergence. In addition,
routing networks have nearly constant per-task training cost while cross-stitch
networks scale linearly with the number of tasks. On CIFAR-100 (20 tasks) we
obtain cross-stitch performance levels with an 85% reduction in training time.
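
The recursive routing loop described above can be sketched in a few lines. The function blocks and the rule-based `toy_router` are hypothetical placeholders; in the paper the router is trained jointly with the function blocks via collaborative MARL, which this sketch does not attempt.

```python
from typing import Callable, List, Sequence, Tuple

FunctionBlock = Callable[[float], float]
Router = Callable[[float, int], int]  # (current value, depth) -> block index


def route(
    x: float,
    router: Router,
    blocks: Sequence[FunctionBlock],
    max_depth: int,
    depth: int = 0,
) -> Tuple[float, List[int]]:
    """Let the router pick a block, apply it, and recurse on the result
    until the fixed recursion depth is reached.

    Returns the final value and the sequence of chosen block indices.
    """
    if depth == max_depth:
        return x, []
    choice = router(x, depth)
    out, rest = route(blocks[choice](x), router, blocks, max_depth, depth + 1)
    return out, [choice] + rest


# Hypothetical function blocks; in practice these would be neural layers.
blocks = [lambda v: v + 1.0, lambda v: v * 2.0]


def toy_router(value: float, depth: int) -> int:
    """Assumed hand-written rule standing in for a learned router."""
    return 0 if value < 3.0 else 1


out, path = route(1.0, toy_router, blocks, max_depth=4)
```

Different inputs yield different paths through the same set of blocks, which is how the network composes input-specific computations.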

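This paper proposes the routing network, a novel neural network and training algorithm in which a router and function blocks are trained jointly via collaborative multi-agent reinforcement learning, so that the routing network dynamically composes different function blocks for each input, substantially improving accuracy and convergence speed in multi-task learning.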