The offline reinforcement learning (RL) paradigm provides a general recipe to
convert static behavior datasets into policies that can perform better than the
policy that collected the data. While policy constraints, conservatism, and
other methods for mitigating distributional shift have made offline
reinforcement learning more effective, applying these techniques in the
continuous action setting often necessitates various approximations. Many of
these challenges are greatly alleviated in discrete action settings, where
offline RL constraints and regularizers can often be computed more precisely or
even exactly. In this paper, we propose an adaptive scheme for action
quantization. We use a VQ-VAE to learn state-conditioned action quantization,
avoiding the exponential blowup that comes with naïve discretization of the
action space. We show that several state-of-the-art offline RL methods, such as
IQL, CQL, and BRAC, perform better on benchmarks when combined with our
proposed discretization scheme. We further validate our approach on a set of
challenging, long-horizon, complex robotic manipulation tasks in the Robomimic
environment, where our discretized offline RL algorithms improve upon their
continuous counterparts by 2-3x. Our project page is at
this https URL
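
The quantities this abstract relies on can be illustrated with a minimal Python sketch. It assumes a toy setting and is not the authors' implementation: `naive_grid_size` shows the exponential growth of uniform per-dimension binning, `nearest_code` shows the nearest-codebook assignment used in a VQ-VAE bottleneck (the state-conditioned encoder that produces the latent is omitted), and `kl_regularized_policy` shows one example of a regularized objective that can be solved exactly over a discrete action set: the closed-form maximizer of expected Q minus a KL penalty to the behavior policy. All function names, the codebook, and the numbers are hypothetical.

```python
import math


def naive_grid_size(bins_per_dim: int, action_dim: int) -> int:
    # Uniform binning of each action dimension: the number of joint
    # discrete actions grows exponentially with the action dimension.
    return bins_per_dim ** action_dim


def nearest_code(latent: list[float], codebook: list[list[float]]) -> int:
    # VQ-VAE style assignment: snap an encoder output to the index of the
    # closest codebook vector under squared Euclidean distance.
    def sq_dist(u: list[float], v: list[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(u, v))
    return min(range(len(codebook)), key=lambda k: sq_dist(latent, codebook[k]))


def kl_regularized_policy(q_values: list[float], behavior_probs: list[float],
                          alpha: float) -> list[float]:
    # Exact maximizer of E_pi[Q] - alpha * KL(pi || beta) over K discrete
    # actions: pi(k) is proportional to beta(k) * exp(Q(k) / alpha).
    # Requires beta(k) > 0 for every k; no sampling approximation is needed.
    logits = [math.log(b) + q / alpha for q, b in zip(q_values, behavior_probs)]
    m = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - m) for l in logits]
    z = sum(weights)
    return [w / z for w in weights]


if __name__ == "__main__":
    # Hypothetical 7-dimensional action space with 10 bins per dimension.
    print(naive_grid_size(10, 7))  # 10000000
    codebook = [[0.0, 0.0], [1.0, 0.5], [-0.5, 1.0]]  # hypothetical codes
    print(nearest_code([0.9, 0.4], codebook))  # 1
    print(kl_regularized_policy([1.0, 2.0, 0.0], [0.5, 0.25, 0.25], alpha=1.0))
```

Enumerating the K codes available in a state is what makes such quantities exact; with a continuous action space, the same expectation would typically have to be estimated from samples.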

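We propose an adaptive action quantization scheme that uses a VQ-VAE to learn state-conditioned action quantization, avoiding an exponential blowup of the action space. The scheme improves the performance of offline RL methods on benchmarks, and on complex robotic manipulation tasks in the Robomimic environment, the discretized offline RL algorithms achieve a 2-3x improvement over their continuous counterparts.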

Action-Quantized Offline Reinforcement Learning for Robotic Skill Learning

In this paper, we propose a novel Reinforcement Learning (RL) framework for
problems with continuous action spaces: Action Quantization from Demonstrations
(AQuaDem). The proposed approach consists of learning a discretization of
continuous action spaces from human demonstrations. This discretization returns
a set of plausible actions (in light of the demonstrations) for each input
state, thus capturing the priors of the demonstrator and their multimodal
behavior. By discretizing the action space, any discrete-action deep RL
technique can be readily applied to the continuous control problem. Experiments
show that the proposed approach outperforms state-of-the-art methods such as
SAC in the RL setup and GAIL in the imitation learning setup. We provide a
website with interactive videos: this https URL and
make the code available:
this https URL
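
Once each state has a finite set of candidate actions, a standard discrete-action agent can operate on candidate indices. The Python sketch below assumes a generic DQN-style agent with epsilon-greedy exploration and a one-step Q-learning target; the candidate actions, Q-values, and function names are hypothetical, the network that proposes candidates from demonstrations is not shown, and the discrete agent actually paired with AQuaDem may differ.

```python
import random


def select_action(q_values: list[float], candidates: list[list[float]],
                  epsilon: float, rng: random.Random) -> tuple[int, list[float]]:
    # Epsilon-greedy choice over the K candidate indices for the current
    # state; the chosen index is mapped back to its continuous action,
    # which is what gets executed in the environment.
    if rng.random() < epsilon:
        k = rng.randrange(len(candidates))
    else:
        k = max(range(len(q_values)), key=lambda i: q_values[i])
    return k, candidates[k]


def q_target(reward: float, next_q_values: list[float], gamma: float,
             done: bool) -> float:
    # One-step Q-learning target: the max runs over the K candidates
    # proposed for the next state rather than over a continuous space.
    return reward if done else reward + gamma * max(next_q_values)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical K = 3 candidate actions proposed for one state.
    candidates = [[0.1, -0.2], [0.8, 0.3], [-0.5, 0.9]]
    print(select_action([0.2, 1.5, -0.3], candidates, epsilon=0.0, rng=rng))
    # (1, [0.8, 0.3])
    print(q_target(1.0, [0.5, 2.0, 1.0], gamma=0.5, done=False))  # 2.0
```

Because the agent only compares K scores per state, both the greedy step and the max in the target are exact enumerations, which is what lets an off-the-shelf discrete-action method run on a continuous control problem.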

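This paper proposes AQuaDem, a new RL method that learns a discretization of continuous action spaces from human demonstrations, making it possible to apply discrete-action deep RL techniques to continuous control problems; experiments show that it outperforms SAC and GAIL.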