Spinal fusion surgery requires highly accurate placement of pedicle screws,
which must be performed in close proximity to vital structures with a limited
view of the anatomy. Robotic surgery systems have been proposed to improve
placement accuracy; however, state-of-the-art systems suffer from the
limitations of open-loop approaches, as they follow traditional concepts of
preoperative planning and intraoperative registration, without real-time
recalculation of the surgical plan. In this paper, we propose an intraoperative
planning approach for robotic spine surgery that leverages real-time
observation for drill path planning based on Safe Deep Reinforcement Learning
(DRL). The main contributions of our method are (1) the capability to guarantee
safe actions by introducing an uncertainty-aware distance-based safety filter;
and (2) the ability to compensate for incomplete intraoperative anatomical
information, by encoding a-priori knowledge about anatomical structures with a
network pre-trained on high-fidelity anatomical models. Planning quality was
assessed by quantitative comparison with the gold standard (GS) drill planning.
In experiments with 5 models derived from real magnetic resonance imaging (MRI)
data, our approach was capable of achieving 90% bone penetration with respect
to the GS while satisfying safety requirements, even under observation and
motion uncertainty. To the best of our knowledge, our approach is the first
safe DRL approach focusing on orthopedic surgeries.

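This paper proposes a planning approach for robotic spine surgery based on real-time observation. It uses Safe Deep Reinforcement Learning (DRL) to compute the drill path and guarantees safe actions through an uncertainty-aware, distance-based safety filter. According to the authors, it is the first safe DRL approach aimed at orthopedic surgery, and the experiments indicate that it satisfies the safety requirements while keeping planning quality close to the gold standard.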
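
The idea of an uncertainty-aware, distance-based safety filter can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the point-cloud representation of critical structures, the scalar `uncertainty` bound, and the "stop" fallback action are all assumptions made for illustration. The filter accepts a proposed drill step only if the worst-case distance to any critical point stays above a safety margin.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]

STOP: Point = (0.0, 0.0, 0.0)  # hypothetical fallback: do not move the drill


def min_distance(p: Point, critical_points: Sequence[Point]) -> float:
    """Smallest Euclidean distance from p to any critical point."""
    return min(math.dist(p, q) for q in critical_points)


def filter_action(tip: Point, action: Point,
                  critical_points: Sequence[Point],
                  safety_margin: float, uncertainty: float) -> Point:
    """Return `action` if the resulting tip position is safe, else STOP.

    `uncertainty` is an assumed upper bound on the combined observation
    and motion error; it is subtracted from the measured distance so the
    check holds in the worst case.
    """
    next_tip = tuple(t + a for t, a in zip(tip, action))
    worst_case = min_distance(next_tip, critical_points) - uncertainty
    return action if worst_case >= safety_margin else STOP


# Usage: one critical point 10 units ahead of the drill tip.
critical = [(0.0, 0.0, 10.0)]
safe = filter_action((0.0, 0.0, 0.0), (0.0, 0.0, 5.0), critical, 2.0, 1.0)
blocked = filter_action((0.0, 0.0, 0.0), (0.0, 0.0, 8.0), critical, 2.0, 1.0)
```

In this sketch the policy's output is never modified when it is safe, so the filter only intervenes near critical structures; a real system would likely use a less conservative fallback than stopping.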

Safe Deep RL for Intraoperative Planning of Pedicle Screw Placement

Cost functions are commonly employed in Safe Deep Reinforcement Learning
(DRL). However, the cost is typically encoded as an indicator function due to
the difficulty of quantifying the risk of policy decisions in the state space.
Such an encoding requires the agent to visit numerous unsafe states to learn a
cost-value function to drive the learning process toward safety, thereby
increasing the number of unsafe interactions and decreasing sample efficiency.
In this paper, we investigate an alternative approach that uses domain
knowledge to quantify the risk in the proximity of such states by defining a
violation metric. This metric is computed by verifying task-level properties,
shaped as input-output conditions, and it is used as a penalty to bias the
policy away from unsafe states without learning an additional value function.
We investigate the benefits of using the violation metric in standard Safe DRL
benchmarks and robotic mapless navigation tasks. The navigation experiments
bridge the gap between Safe DRL and robotics, introducing a framework that
allows rapid testing on real robots. Our experiments show that policies trained
with the violation penalty achieve higher performance than Safe DRL baselines
and significantly reduce the number of visited unsafe states.

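This paper introduces a method that uses a "violation metric" to penalize states close to unsafe regions, improving Safe Deep Reinforcement Learning without learning an additional value function. Experiments on standard Safe DRL benchmarks and robotic mapless navigation tasks show that policies trained with the violation penalty outperform Safe DRL baselines and substantially reduce the number of unsafe states visited.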
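
A rough sketch of how a violation metric could enter the reward is given below. The abstract only states that the metric comes from verifying input-output properties and is used as a penalty; the sampling-based estimate, the function names, the box-shaped input condition, and the `weight` parameter are assumptions made here for illustration, not the authors' procedure. A property has the form "if the state lies in a given box, the chosen action must satisfy a condition", and the violation is the fraction of sampled states in that box for which the policy breaks it.

```python
import random
from typing import Callable, List, Sequence, Tuple

Policy = Callable[[List[float]], int]


def violation_rate(policy: Policy,
                   input_box: Sequence[Tuple[float, float]],
                   output_ok: Callable[[int], bool],
                   n_samples: int = 1000, seed: int = 0) -> float:
    """Estimate the fraction of states in `input_box` where the policy's
    action violates the output condition (sampling-based approximation)."""
    rng = random.Random(seed)
    violations = 0
    for _ in range(n_samples):
        state = [rng.uniform(lo, hi) for lo, hi in input_box]
        if not output_ok(policy(state)):
            violations += 1
    return violations / n_samples


def penalized_reward(reward: float, violation: float, weight: float = 1.0) -> float:
    """Subtract the weighted violation from the task reward."""
    return reward - weight * violation


# Hypothetical property: if the obstacle ahead is closer than 0.2,
# the action must not be FORWARD (action 0).
FORWARD, TURN = 0, 1
box = [(0.0, 0.2)]
always_forward = lambda s: FORWARD
always_turn = lambda s: TURN

v_bad = violation_rate(always_forward, box, lambda a: a != FORWARD)
v_good = violation_rate(always_turn, box, lambda a: a != FORWARD)
```

Because the penalty is computed directly from the property, no separate cost-value function has to be learned in this sketch; how the penalty is weighted against the task reward is left open.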