Reinforcement learning (RL) often struggles to accomplish a sparse-reward
long-horizon task in a complex environment. Goal-conditioned reinforcement
learning (GCRL) has been employed to tackle this difficult problem via a
curriculum of easy-to-reach sub-goals. In GCRL, exploring novel sub-goals is
essential for the agent to ultimately find the pathway to the desired goal. How
to explore novel sub-goals efficiently is one of the most challenging issues in
GCRL. Several goal exploration methods have been proposed to address this issue
but still struggle to find the desired goals efficiently. In this paper, we
propose a novel learning objective that optimizes the entropy of both achieved
goals and new goals to be explored, for more efficient goal exploration in
sub-goal-selection-based GCRL. To optimize this objective, we first explore and exploit
the frequently occurring goal-transition patterns mined in the environments
similar to the current task to compose skills via skill learning. Then, the
pretrained skills are applied in goal exploration. Evaluation on a variety of
sparse-reward long-horizon benchmark tasks suggests that incorporating our
method into several state-of-the-art GCRL baselines significantly boosts their
exploration efficiency while improving or maintaining their performance. The
source code is available at: this https URL.
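
To make the entropy idea concrete, the following is a minimal sketch, not the objective from the paper: it assumes goals have been discretized into hashable labels and measures the empirical Shannon entropy of the goals reached so far. The function name `goal_entropy` and the toy labels are hypothetical.

```python
import math
from collections import Counter

def goal_entropy(goals):
    """Empirical Shannon entropy (in nats) of a list of discretized goals.

    Assumption: each goal is a hashable label (e.g. a grid-cell id).
    """
    counts = Counter(goals)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())

# Toy history: two goals, each achieved five times.
achieved = ["A"] * 5 + ["B"] * 5

# Compare revisiting a known goal with reaching a previously unseen one.
revisit = goal_entropy(achieved + ["A"])
explore = goal_entropy(achieved + ["C"])
```

In this toy case, reaching the unseen goal raises the entropy of the achieved-goal distribution, while revisiting a known goal lowers it slightly, which is the intuition behind rewarding exploration of novel sub-goals.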

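This paper proposes a new learning objective that optimizes the entropy of both achieved goals and goals yet to be explored, for more efficient goal exploration in sub-goal-selection-based GCRL. The method significantly improves the exploration efficiency of existing state-of-the-art baselines while improving or maintaining their performance.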

Augmenting Goal Exploration with Pre-trained Skills for Sparse-Reward Long-Horizon Goal-Conditioned Reinforcement Learning

Goal Exploration Augmentation via Pre-trained Skills for Sparse-Reward Long-Horizon Goal-Conditioned Reinforcement Learning

Many existing approaches for unsupervised domain adaptation (UDA) focus on
adapting under only data distribution shift and offer limited success under
additional cross-domain label distribution shift. Recent work based on
self-training using target pseudo-labels has shown promise, but on challenging
shifts pseudo-labels may be highly unreliable, and using them for self-training
may cause error accumulation and domain misalignment. We propose Selective
Entropy Optimization via Committee Consistency (SENTRY), a UDA algorithm that
judges the reliability of a target instance based on its predictive consistency
under a committee of random image transformations. Our algorithm then
selectively minimizes predictive entropy to increase confidence on highly
consistent target instances, while maximizing predictive entropy to reduce
confidence on highly inconsistent ones. In combination with pseudo-label based
approximate target class balancing, our approach leads to significant
improvements over the state-of-the-art on 27/31 domain shifts from standard UDA
benchmarks as well as benchmarks designed to stress-test adaptation under label
distribution shift.
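
The selective entropy step described above can be sketched as follows. This is an illustrative simplification under stated assumptions, not the authors' implementation: predictions are given as plain probability lists, consistency is decided by a simple majority vote of the committee, and the entropy is taken on the untransformed prediction. All function names are hypothetical.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def argmax(probs):
    """Index of the most probable class."""
    return max(range(len(probs)), key=lambda i: probs[i])

def is_consistent(original_probs, committee_probs):
    """Assumed rule: a strict majority of transformed views keep the original label."""
    label = argmax(original_probs)
    agree = sum(1 for p in committee_probs if argmax(p) == label)
    return agree > len(committee_probs) / 2

def selective_entropy_loss(original_probs, committee_probs):
    """Signed entropy term for one target instance.

    Consistent instance: +H, so minimizing the loss lowers entropy.
    Inconsistent instance: -H, so minimizing the loss raises entropy.
    """
    h = entropy(original_probs)
    return h if is_consistent(original_probs, committee_probs) else -h

# Prediction on the original image and on three randomly transformed views.
original = [0.7, 0.2, 0.1]
stable_views = [[0.6, 0.3, 0.1], [0.8, 0.1, 0.1], [0.2, 0.5, 0.3]]
unstable_views = [[0.2, 0.5, 0.3], [0.1, 0.2, 0.7], [0.6, 0.3, 0.1]]

loss_consistent = selective_entropy_loss(original, stable_views)
loss_inconsistent = selective_entropy_loss(original, unstable_views)
```

With the stable views, two of three committee members agree with the original label, so the term is positive and gradient descent would sharpen the prediction. With the unstable views, only one agrees, so the sign flips and descent would flatten it.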

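The paper proposes SENTRY, an unsupervised domain adaptation algorithm based on self-training and predictive consistency. It uses a committee of random image transformations to judge the reliability of each target instance, then selectively minimizes predictive entropy to increase confidence on highly consistent instances and maximizes it to reduce confidence on highly inconsistent ones. Combined with pseudo-label-based approximate target class balancing, the algorithm performs strongly under label distribution shift.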

SENTRY: Selective Entropy Optimization via Committee Consistency for Unsupervised Domain Adaptation

SENTRY: Selective Entropy Optimization via Committee Consistency for Unsupervised Domain Adaptation

Unsupervised domain adaptation methods traditionally assume that all source
categories are present in the target domain. In practice, little may be known
about the category overlap between the two domains. While some methods address
target settings with either partial or open-set categories, they assume that
the particular setting is known a priori. We propose a more universally
applicable domain adaptation framework that can handle arbitrary category
shift, called Domain Adaptative Neighborhood Clustering via Entropy
optimization (DANCE). DANCE combines two novel ideas: First, as we cannot fully
rely on source categories to learn features discriminative for the target, we
propose a novel neighborhood clustering technique to learn the structure of the
target domain in a self-supervised way. Second, we use entropy-based feature
alignment and rejection to align target features with the source, or reject
them as unknown categories based on their entropy. We show through extensive
experiments that DANCE outperforms baselines across open-set, open-partial and
partial domain adaptation settings. Implementation is available at
this https URL
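
The entropy-based rejection idea can be illustrated with a minimal sketch. It assumes the classifier outputs a probability list over the source categories, and it uses half of the maximum possible entropy as the default threshold, which is a choice made here for illustration. The function names are hypothetical, and the sketch covers only the decision rule, not the training loss.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def classify_or_reject(probs, threshold=None):
    """Return the predicted source class index, or None for 'unknown'.

    Assumption: the default threshold is half the maximum entropy, log(K) / 2,
    where K is the number of source categories.
    """
    if threshold is None:
        threshold = math.log(len(probs)) / 2.0
    if entropy(probs) > threshold:
        return None  # high entropy: treat the sample as an unknown category
    return max(range(len(probs)), key=lambda i: probs[i])

confident = classify_or_reject([0.9, 0.05, 0.05])
uncertain = classify_or_reject([0.34, 0.33, 0.33])
```

A confident prediction has low entropy and is assigned to a source class, while a near-uniform prediction exceeds the threshold and is rejected as unknown.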

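The paper proposes a more universally applicable domain adaptation framework, called Domain Adaptative Neighborhood Clustering via Entropy optimization (DANCE), that can handle arbitrary category shift. DANCE combines two novel ideas. First, it introduces a new neighborhood clustering technique that learns the structure of the target domain without supervision. Second, it uses entropy-based feature alignment and rejection to align target features with the source, or to reject them as unknown categories based on their entropy. Experiments show that DANCE outperforms baselines across open-set, open-partial, and partial domain adaptation settings.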