The prevalent use of benchmarks in current offline reinforcement learning
(RL) research has led to a neglect of the imbalance of real-world dataset
distributions in the development of models. Real-world offline RL datasets
are often imbalanced over the state space because of exploration challenges or
safety considerations. In this paper, we specify properties of imbalanced
datasets in offline RL, where the state coverage follows a power law
distribution characterized by skewed policies. Theoretically and empirically,
we show that typical offline RL methods based on distributional constraints,
such as conservative Q-learning (CQL), are ineffective in extracting policies
under the imbalanced dataset. Inspired by natural intelligence, we propose a
novel offline RL method that augments CQL with a retrieval
process to recall past related experiences, effectively alleviating the
challenges posed by imbalanced datasets. We evaluate our method on several
tasks with varying levels of dataset imbalance, using a variant of D4RL.
Empirical results show that our method outperforms the baselines.
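
To make the notion of power-law state coverage concrete, the following minimal sketch samples state visits from a Zipf-like distribution over a small discrete state space. It is an illustration only and is not taken from the paper: the function names, the exponent `alpha`, and the discrete state space are all assumptions chosen for the example.

```python
import random
from collections import Counter


def power_law_weights(num_states, alpha):
    """Return normalized Zipf-like probabilities.

    The state with rank k (1-based) gets probability proportional to
    k ** (-alpha). alpha = 0 gives a uniform (balanced) distribution;
    larger alpha gives a more imbalanced one. Hypothetical helper.
    """
    raw = [(rank + 1) ** (-alpha) for rank in range(num_states)]
    total = sum(raw)
    return [w / total for w in raw]


def sample_state_visits(num_states, alpha, num_samples, seed=0):
    """Sample state indices and count how often each one is visited."""
    rng = random.Random(seed)
    probs = power_law_weights(num_states, alpha)
    states = rng.choices(range(num_states), weights=probs, k=num_samples)
    return Counter(states)


if __name__ == "__main__":
    # Compare a balanced dataset with two increasingly imbalanced ones.
    for alpha in (0.0, 1.0, 2.0):
        counts = sample_state_visits(num_states=5, alpha=alpha, num_samples=1000)
        print(alpha, [counts.get(s, 0) for s in range(5)])
```

Varying `alpha` in a sketch like this is one simple way to produce datasets with different levels of imbalance; how the paper constructs its D4RL variant is not specified in the abstract.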

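By adding a retrieval process on top of distributional-constraint methods such as conservative Q-learning, which effectively alleviates the challenges posed by imbalanced datasets, we propose a novel offline reinforcement learning method and evaluate it on several tasks over datasets with varying degrees of imbalance.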

Reinforcement Learning on Offline Imbalanced Datasets

Offline Reinforcement Learning with Imbalanced Datasets

Neural conversation models tend to generate safe, generic responses for most
inputs. This is due to the limitations of likelihood-based decoding objectives
in generation tasks with diverse outputs, such as conversation. To address this
challenge, we propose a simple yet effective approach for incorporating side
information in the form of distributional constraints over the generated
responses. We propose two constraints that help generate more content-rich
responses: one based on a model of syntax and topics (Griffiths et al.,
2005) and one based on semantic similarity (Arora et al., 2016). We evaluate our approach
against a variety of competitive baselines, using both automatic metrics and
human judgments, showing that our proposed approach generates responses that
are much less generic without sacrificing plausibility. A working demo of our
code can be found at this https URL

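Proposes a simple yet effective approach that introduces side information in the form of distributional constraints for generation tasks with diverse outputs, such as conversation. It uses syntax, topic models, and semantic similarity to generate more content-rich responses; experiments show that the approach produces more specific responses without sacrificing plausibility.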

Generating More Interesting Replies in Neural Dialogue Models Using Distributional Constraints

Generating More Interesting Responses in Neural Conversation Models with Distributional Constraints

Committee selection with diversity or distributional constraints is a
ubiquitous problem. However, many of the formal approaches proposed so far have
certain drawbacks, including (1) computational intractability in general, and
(2) inability to suggest a solution for certain instances where the hard
constraints cannot be met. We propose a practical and polynomial-time algorithm
for diverse committee selection that draws on the idea of using soft bounds and
satisfies natural axioms.

Proposes a practical polynomial-time algorithm for diverse committee selection based on soft bounds.
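
The idea of soft bounds can be illustrated with a small sketch. This is not the algorithm from the paper and makes no claim about the axioms it satisfies. It is a simple greedy heuristic, written under the assumption that each group has a desired `(lower, upper)` count and that violations are penalized rather than forbidden. All names (`violation`, `greedy_committee`, `groups`, `bounds`) are hypothetical.

```python
def violation(committee, groups, bounds):
    """Total amount by which group counts fall outside their soft bounds.

    groups maps candidate -> set of group labels.
    bounds maps group label -> (lower, upper) desired count.
    """
    total = 0
    for group, (lower, upper) in bounds.items():
        count = sum(1 for c in committee if group in groups[c])
        total += max(0, lower - count) + max(0, count - upper)
    return total


def greedy_committee(candidates, groups, bounds, size):
    """Greedily add the candidate that leaves the smallest total violation.

    Because bounds are soft, a committee is always returned, even when no
    committee can satisfy every bound. Ties go to the earliest candidate.
    """
    if size > len(candidates):
        raise ValueError("committee size exceeds number of candidates")
    committee = []
    remaining = list(candidates)
    for _ in range(size):
        best = min(
            remaining,
            key=lambda c: violation(committee + [c], groups, bounds),
        )
        committee.append(best)
        remaining.remove(best)
    return committee


if __name__ == "__main__":
    groups = {"a": {"x"}, "b": {"x"}, "c": {"y"}, "d": {"y"}}
    # Feasible instance: one member from each group.
    print(greedy_committee(["a", "b", "c", "d"], groups, {"x": (1, 1), "y": (1, 1)}, 2))
    # Infeasible instance: three seats cannot hold two members of each group,
    # yet the soft-bound view still yields a committee.
    print(greedy_committee(["a", "b", "c", "d"], groups, {"x": (2, 2), "y": (2, 2)}, 3))
```

The second call shows the case the abstract highlights: when the bounds cannot all be met, a soft-bound formulation still proposes a committee and reports how far it is from the targets, whereas a hard-constraint formulation would return nothing.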