Aligning autonomous agents with human values is a pivotal challenge when deploying these agents in physical environments, where safety is an important concern. However, defining the agent's objective as a reward and/or cost function is inherently complex and prone to human error. In response to this challenge, we present a novel approach that leverages one-class decision trees to learn from expert demonstrations. These decision trees provide a foundation for representing a set of constraints pertinent to the given environment as a logical formula in disjunctive normal form. The learned constraints are subsequently employed within an oracle constrained reinforcement learning framework, enabling the acquisition of a safe policy. In contrast to other methods, our approach offers an interpretable representation of the constraints, a vital feature in safety-critical environments. To validate the effectiveness of our proposed method, we conduct experiments in synthetic benchmark domains and a realistic driving environment.
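
To make the link between a decision tree and a formula in disjunctive normal form concrete, the following is a minimal sketch and not the method's actual implementation. It assumes a hand-built binary tree whose leaves are already marked as covered or not covered by expert demonstrations; the `Node` class, the helper functions, and the `speed` and `gap` features are all hypothetical names chosen for illustration. Learning the one-class tree itself is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# A literal is (feature, operator, threshold); a clause is a conjunction of literals.
Literal = Tuple[str, str, float]
Clause = List[Literal]


@dataclass
class Node:
    """Hypothetical tree node: internal nodes test state[feature] <= threshold."""
    feature: Optional[str] = None
    threshold: Optional[float] = None
    left: Optional["Node"] = None    # branch taken when the test holds
    right: Optional["Node"] = None   # branch taken when the test fails
    safe: Optional[bool] = None      # set only on leaves


def tree_to_dnf(node: Node, path: Tuple[Literal, ...] = ()) -> List[Clause]:
    """Collect one conjunction per root-to-leaf path that ends in a safe leaf."""
    if node.safe is not None:
        return [list(path)] if node.safe else []
    holds = path + ((node.feature, "<=", node.threshold),)
    fails = path + ((node.feature, ">", node.threshold),)
    return tree_to_dnf(node.left, holds) + tree_to_dnf(node.right, fails)


def satisfies(dnf: List[Clause], state: Dict[str, float]) -> bool:
    """A state satisfies the DNF if every literal of at least one clause holds."""
    def holds(lit: Literal) -> bool:
        feature, op, threshold = lit
        return state[feature] <= threshold if op == "<=" else state[feature] > threshold
    return any(all(holds(lit) for lit in clause) for clause in dnf)


def format_dnf(dnf: List[Clause]) -> str:
    """Render the formula as human-readable text."""
    return " OR ".join(
        "(" + " AND ".join(f"{f} {op} {t}" for f, op, t in clause) + ")"
        for clause in dnf
    )


# Toy tree (assumed values): low speed is safe; high speed is safe only with a large gap.
tree = Node(
    "speed", 30.0,
    left=Node(safe=True),
    right=Node("gap", 10.0, left=Node(safe=False), right=Node(safe=True)),
)
dnf = tree_to_dnf(tree)
print(format_dnf(dnf))
```

Each safe leaf contributes one conjunction, and the disjunction of those conjunctions describes the region covered by demonstrations. A predicate such as `satisfies` is one way the resulting formula could be checked against a state during constrained policy learning.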

Learning Safety Constraints from Demonstrations Using One-Class Decision Trees