In X-ray Computed Tomography (CT), projections from many angles are acquired
and used for 3D reconstruction. To make CT suitable for in-line quality
control, reducing the number of angles while maintaining reconstruction quality
is necessary. Sparse-angle tomography is a popular approach for obtaining 3D
reconstructions from limited data. To optimize its performance, one can adapt
scan angles sequentially to select the most informative angles for each scanned
object. Mathematically, this corresponds to solving an optimal experimental
design (OED) problem. OED problems are high-dimensional, non-convex, bi-level
optimization problems that cannot be solved online, i.e., during the scan. To
address these challenges, we pose the OED problem as a partially observable
Markov decision process in a Bayesian framework, and solve it through deep
reinforcement learning. The approach learns efficient non-greedy policies to
solve a given class of OED problems through extensive offline training rather
than solving a given OED problem directly via numerical optimization. As such,
the trained policy can successfully find the most informative scan angles
online. We use a policy training method based on the actor-critic approach and
evaluate its performance on 2D tomography with synthetic data.
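To make the Bayesian view of angle selection concrete, here is a minimal sketch. It assumes a linear-Gaussian forward model (a Gaussian belief over the image and additive Gaussian measurement noise), which the abstract does not specify. Under that assumption, the expected information gain of a candidate angle has a closed form, and each measurement updates the belief through a rank-one covariance update. A quantity like this could plausibly serve as a per-step reward in the POMDP, but that is an assumption. The 2-pixel "projection" geometry and all function names (`info_gain`, `posterior_cov`, `projection_row`) are hypothetical. Because the learned policy is non-greedy, it would not simply pick the angle that maximizes this one-step gain.

```python
import math


def info_gain(cov, a, noise_var):
    """Expected information gain (nats) from one measurement y = a . x + noise,
    assuming a Gaussian belief with covariance cov over x and Gaussian noise."""
    n = len(a)
    cov_a = [sum(cov[i][j] * a[j] for j in range(n)) for i in range(n)]
    predictive_var = sum(a[i] * cov_a[i] for i in range(n)) + noise_var
    return 0.5 * math.log(predictive_var / noise_var)


def posterior_cov(cov, a, noise_var):
    """Rank-one Bayesian covariance update after observing y = a . x + noise.
    Assumes cov is symmetric, so (cov a)(cov a)^T equals cov a a^T cov."""
    n = len(a)
    cov_a = [sum(cov[i][j] * a[j] for j in range(n)) for i in range(n)]
    s = sum(a[i] * cov_a[i] for i in range(n)) + noise_var
    return [[cov[i][j] - cov_a[i] * cov_a[j] / s for j in range(n)] for i in range(n)]


def projection_row(theta):
    """Hypothetical toy geometry: a 2-pixel 'image' probed along direction theta."""
    return [math.cos(theta), math.sin(theta)]


if __name__ == "__main__":
    noise = 0.1
    belief = [[1.0, 0.0], [0.0, 1.0]]  # standard Gaussian prior on the two pixels
    belief = posterior_cov(belief, projection_row(0.0), noise)  # scan at angle 0
    print(info_gain(belief, projection_row(0.0), noise))          # repeat angle: ~0.32
    print(info_gain(belief, projection_row(math.pi / 2), noise))  # orthogonal: ~1.20
```

After one scan at angle 0, repeating that angle is worth far less than the orthogonal one. This is the kind of redundancy that sparse-angle selection tries to avoid.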

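A deep reinforcement learning approach is used to solve the optimal experimental design problem, selecting the most informative scan angles and thereby enabling sparse-angle imaging in CT.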

Sequential Experimental Design for X-Ray CT Using Deep Reinforcement Learning

Sequential Experimental Design for X-Ray CT Using Deep Reinforcement Learning

Knowledge distillation addresses the problem of transferring knowledge from a
teacher model to a student model. In this process, we typically have multiple
types of knowledge extracted from the teacher model. The problem is to make
full use of them to train the student model. Our preliminary study shows that:
(1) not all of the knowledge is necessary for learning a good student model,
and (2) different kinds of knowledge benefit distillation at different
training steps. In response to these findings, we propose an actor-critic
approach to selecting appropriate knowledge to transfer during knowledge
distillation. In addition, we refine the training algorithm to reduce its
computational cost. Experimental results on the GLUE datasets show that our
method significantly outperforms several strong knowledge distillation
baselines.
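The selection idea can be sketched as a policy that decides, at each training step, which knowledge types to transfer. This is an assumption-laden toy, not the paper's method:

- The knowledge types are hypothetical.
- The reward is a made-up additive stand-in for student improvement.
- The actor is one Bernoulli logit per knowledge type.
- The critic is reduced to a learned scalar baseline.

The update is therefore REINFORCE with a learned baseline, a simplified relative of actor-critic.

```python
import math
import random

# Hypothetical knowledge types a teacher might expose (not taken from the paper).
KNOWLEDGE_TYPES = ["logits", "hidden_states", "attention"]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def toy_reward(mask):
    """Made-up stand-in for student improvement: two types help, one hurts."""
    gains = {"logits": 1.0, "hidden_states": 0.5, "attention": -0.7}
    return sum(gains[k] for k, used in zip(KNOWLEDGE_TYPES, mask) if used)


def train_selector(steps=2000, lr_actor=0.05, lr_critic=0.1, seed=0):
    rng = random.Random(seed)
    logits = [0.0] * len(KNOWLEDGE_TYPES)  # actor: one Bernoulli logit per type
    baseline = 0.0                          # critic: scalar value estimate
    for _ in range(steps):
        probs = [sigmoid(z) for z in logits]
        mask = [rng.random() < p for p in probs]  # which knowledge to transfer now
        advantage = toy_reward(mask) - baseline
        baseline += lr_critic * advantage
        for i, (used, p) in enumerate(zip(mask, probs)):
            # gradient of log Bernoulli(used; sigmoid(z)) w.r.t. z is (used - p)
            logits[i] += lr_actor * advantage * ((1.0 if used else 0.0) - p)
    return {k: sigmoid(z) for k, z in zip(KNOWLEDGE_TYPES, logits)}


if __name__ == "__main__":
    print(train_selector())  # selection probability per knowledge type
```

Because the toy reward is additive, the expected update for each logit has the sign of that type's gain. Helpful types are therefore selected more often and the harmful one less often. This mirrors finding (1), that not all knowledge is needed.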

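This paper proposes an actor-critic-based knowledge distillation framework that selects appropriate knowledge from the teacher model to train the student model; experimental results show that the method outperforms strong knowledge distillation baselines on the GLUE datasets.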

Improving Knowledge Distillation for Pre-trained Language Models via Knowledge Selection

Improved Knowledge Distillation for Pre-trained Language Models via Knowledge Selection

One of the challenges for multi-agent reinforcement learning (MARL) is
designing efficient learning algorithms for a large system in which each agent
has only limited or partial information about the entire system. While exciting
progress has been made in analyzing decentralized MARL with a network of agents
for social networks and team video games, little is known theoretically about
decentralized MARL with a network of states, which can model self-driving
vehicles, ride-sharing, and data and traffic routing.
This paper proposes a framework of localized training and decentralized
execution to study MARL with a network of states. Localized training means that
agents only need to collect local information in their neighboring states
during the training phase; decentralized execution means that agents can
afterwards execute the learned decentralized policies, which depend only on
the agents' current states.
The theoretical analysis consists of three key components: the first is the
reformulation of the MARL system as a networked Markov decision process with
teams of agents, which enables the associated team Q-function to be updated in
a localized fashion; the second is the Bellman equation for the value function
and the appropriate Q-function on the probability measure space; and the third
is the exponential decay property of the team Q-function, which allows it to be
approximated with high sample efficiency and controllable error.
The theoretical analysis paves the way for a new algorithm, LTDE-Neural-AC,
which uses an actor-critic approach with over-parameterized neural networks.
Its convergence and sample complexity are established and shown to be
scalable with respect to both the number of agents and the number of states.
To the best of our knowledge, this is the first neural-network-based MARL
algorithm with network structure and a provable convergence guarantee.
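A common way to exploit an exponential decay property in networked MARL is to truncate the Q-function. The truncated Q-function depends only on the states within κ hops of a node, and the truncation error shrinks exponentially in κ. The abstract does not give the exact construction, so the sketch below is an assumption:

- `khop_neighborhood` and `local_key` are hypothetical helpers.
- The state network is a plain adjacency dictionary.
- The global state is represented as the fraction of agents at each state, in the spirit of the probability-measure formulation.

```python
from collections import deque


def khop_neighborhood(adjacency, node, kappa):
    """Nodes of the state network within kappa hops of `node` (breadth-first search)."""
    seen = {node}
    frontier = deque([(node, 0)])
    while frontier:
        u, depth = frontier.popleft()
        if depth == kappa:
            continue  # do not expand beyond the truncation radius
        for v in adjacency.get(u, ()):
            if v not in seen:
                seen.add(v)
                frontier.append((v, depth + 1))
    return seen


def local_key(agent_distribution, neighborhood):
    """Restrict the global state (fraction of agents at each state) to a neighborhood.
    A truncated team Q-function can be indexed by this key instead of the full state."""
    return tuple(sorted((s, agent_distribution[s]) for s in neighborhood))


if __name__ == "__main__":
    path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}  # 5 states on a line
    dist = {0: 0.1, 1: 0.2, 2: 0.3, 3: 0.25, 4: 0.15}
    hood = khop_neighborhood(path, 2, kappa=1)
    print(sorted(hood))  # [1, 2, 3]
    truncated_q = {(local_key(dist, hood), "stay"): 0.0}  # hypothetical table entry
    print(truncated_q)
```

The design point is that the table size grows with the neighborhood rather than with the whole state network. This is what makes localized training scalable.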

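Proposes LTDE-Neural-AC, an actor-critic multi-agent reinforcement learning algorithm for networks of states, such as those modeling self-driving vehicles, ride-sharing, and data and traffic routing; it addresses decentralized MARL with network structure and comes with a provable convergence guarantee.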

Mean-Field Multi-Agent Reinforcement Learning: A Decentralized Network Approach

Mean-Field Multi-Agent Reinforcement Learning: A Decentralized Network Approach

In this paper, we investigate how to learn to control a group of cooperative
agents with limited sensing capabilities, such as robot swarms. The agents have
only very basic sensor capabilities, yet in a group they can accomplish
sophisticated tasks, such as distributed assembly or search and rescue tasks.
Learning a policy for a group of agents is difficult due to distributed partial
observability of the state. Here, we follow a guided approach where a critic
has central access to the global state during learning, which simplifies the
policy evaluation problem from a reinforcement learning point of view. For
example, we can get the positions of all robots of the swarm using a camera
image of a scene. This camera image is only available to the critic and not to
the control policies of the robots. We follow an actor-critic approach, where
the actors base their decisions only on locally sensed information. In
contrast, the critic is learned based on the true global state. Our algorithm
uses deep reinforcement learning to approximate both the Q-function and the
policy. The performance of the algorithm is evaluated on two tasks with simple
simulated 2D agents: 1) finding and maintaining a certain distance from each
other and 2) locating a target.
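The split between what the actors and the critic see can be sketched on a 1D toy task. This minimal tabular sketch rests on several assumptions that are not in the abstract:

- Agents live on an integer line.
- Each agent's only sensors report whether a neighbor is to its left or right within a hypothetical range.
- All agents share one policy table indexed by that local observation.
- The critic is a tabular state-value function over the full global state (the positions of all agents), standing in for the camera image.

The paper approximates a Q-function and the policy with deep networks. For brevity, this sketch swaps in a tabular V critic and an actor update driven by the critic's TD error.

```python
import math
import random

ACTIONS = (-1, 0, 1)  # move left, stay, move right


def local_observation(positions, i, sensing_range=2):
    """Decentralized actor input: agent i only senses whether any neighbor is
    to its left or right within a hypothetical sensing range."""
    xi = positions[i]
    others = [p for j, p in enumerate(positions) if j != i]
    left = any(0 < xi - p <= sensing_range for p in others)
    right = any(0 < p - xi <= sensing_range for p in others)
    return (left, right)


def softmax(prefs):
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def toy_reward(positions, target_gap=2):
    """Hypothetical team reward: penalize deviation of neighbor gaps from target_gap."""
    xs = sorted(positions)
    return -sum(abs((b - a) - target_gap) for a, b in zip(xs, xs[1:]))


def train(n_agents=3, episodes=200, horizon=20, alpha_actor=0.05,
          alpha_critic=0.1, gamma=0.9, seed=0):
    rng = random.Random(seed)
    prefs = {}   # shared decentralized actor: local observation -> action preferences
    values = {}  # centralized critic: global state (all positions) -> value estimate
    for _ in range(episodes):
        positions = [rng.randint(0, 3) for _ in range(n_agents)]
        for _ in range(horizon):
            state = tuple(positions)
            obs = [local_observation(positions, i) for i in range(n_agents)]
            probs = [softmax(prefs.setdefault(o, [0.0] * len(ACTIONS))) for o in obs]
            acts = [rng.choices(range(len(ACTIONS)), weights=p)[0] for p in probs]
            positions = [x + ACTIONS[a] for x, a in zip(positions, acts)]
            next_state = tuple(positions)
            # TD error computed from the global state, which only the critic sees
            delta = (toy_reward(positions) + gamma * values.get(next_state, 0.0)
                     - values.get(state, 0.0))
            values[state] = values.get(state, 0.0) + alpha_critic * delta
            # each actor is updated from its own local observation and action
            for o, a, p in zip(obs, acts, probs):
                for b in range(len(ACTIONS)):
                    prefs[o][b] += alpha_actor * delta * ((1.0 if b == a else 0.0) - p[b])
    return prefs, values


if __name__ == "__main__":
    prefs, values = train()
    for o, pr in sorted(prefs.items()):
        print(o, [round(q, 2) for q in softmax(pr)])
```

Note that a far-away agent never changes another agent's local observation. It does change the global state the critic evaluates, which is exactly the asymmetry the guided approach relies on.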

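This paper studies how to control a group of cooperative agents with limited sensing capabilities, using actor-critic deep reinforcement learning to approximate the Q-function and the policy, and evaluates performance on finding and maintaining a certain distance between agents and on locating a target.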