Online tuning of real-world plants is a complex optimisation problem that
continues to require manual intervention by experienced human operators.
Autonomous tuning is a rapidly expanding field of research, where
learning-based methods, such as Reinforcement Learning-trained Optimisation
(RLO) and Bayesian optimisation (BO), hold great promise for achieving
outstanding plant performance and reducing tuning times. Which algorithm to
choose in different scenarios, however, remains an open question. Here we
present a comparative study using a routine task in a real particle accelerator
as an example, showing that RLO generally outperforms BO, but is not always the
best choice. Based on the study's results, we provide a clear set of criteria
to guide the choice of algorithm for a given tuning task. These can ease the
adoption of learning-based autonomous tuning solutions in the operation of
complex real-world plants, ultimately improving the availability and pushing
the limits of operability of these facilities, thereby enabling scientific and
engineering advancements.

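A comparative study of Reinforcement Learning-trained Optimisation (RLO) and Bayesian optimisation (BO) on a real particle accelerator task finds that RLO generally performs better but is not the best choice in every case. Based on the results, a clear set of criteria is provided to guide the choice of algorithm for a given tuning task.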

Learning to Do or Learning While Doing: Reinforcement Learning and Bayesian Optimisation for Online Continuous Tuning

In recent years, the planning community has observed that techniques for
learning heuristic functions have yielded improvements in performance. One
approach is to learn predictive models offline from existing
heuristics in a domain-dependent manner. These learned models are deployed as
new heuristic functions. The learned models can in turn be tuned online using a
domain-independent error correction approach to further enhance their
informativeness. The online tuning approach is domain-independent but
instance-specific, and contributes to improved performance for individual instances as
planning proceeds. Consequently, it is more effective in larger problems.
In this paper, we present two approaches applicable to Partial Order Causal
Link (POCL) Planning, also known as Plan Space Planning. First, we
endeavor to enhance the performance of a POCL planner by giving a supervised
learning algorithm. Second, we discuss an online error minimization
approach in the POCL framework that minimizes the step-error associated with the
offline learned models, thus enhancing their informativeness. Our evaluation
shows that the learning approaches scale up the performance of the planner over
standard benchmarks, especially for larger problems.

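This paper presents two approaches for improving the performance of a POCL planner: a supervised learning algorithm for the planner, and an online error minimization approach that further improves the informativeness of the learned models. Experiments show that these learning approaches scale up the planner's performance and are especially effective on larger problems.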