Rewards play an essential role in reinforcement learning. In contrast to
rule-based game environments with well-defined reward functions, complex
real-world robotic applications, such as contact-rich manipulation, lack
explicit and informative descriptions that can directly be used as a reward.
Previous work has shown that it is possible to algorithmically extract dense
rewards directly from multimodal observations. In this paper, we aim to extend
this work by proposing a more efficient and robust way of sampling and
learning. In particular, our sampling approach utilizes temporal variance to
simulate the fluctuating state and action distribution of a manipulation task.
We then propose a network architecture for self-supervised learning to better
incorporate temporal information in latent representations. We test our
approach in two experimental setups, namely joint-assembly and door-opening.
Preliminary results show that our approach is effective and efficient in
learning dense rewards, and the learned rewards lead to faster convergence than
baselines.
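
As a rough illustration of sampling with a varying temporal gap, the sketch below draws pairs of time steps from a single trajectory, with the gap itself sampled at random so that the pairs cover both nearby and distant moments. The function name `sample_temporal_pairs`, the uniform gap distribution, and the normalized-gap label are assumptions made for illustration only; the abstract does not specify the actual sampling scheme or training target.

```python
import numpy as np

def sample_temporal_pairs(traj_len, num_pairs, max_gap, rng):
    """Sample (start, end) index pairs whose temporal gap varies per pair.

    Hypothetical scheme: gaps are uniform in [1, max_gap]; the label is the
    gap normalized to (0, 1], usable as a self-supervised regression target.
    """
    if not 1 <= max_gap < traj_len:
        raise ValueError("max_gap must satisfy 1 <= max_gap < traj_len")
    # One random gap per pair, so temporal distance fluctuates across samples.
    gaps = rng.integers(1, max_gap + 1, size=num_pairs)
    # Choose each start so that start + gap stays inside the trajectory.
    starts = rng.integers(0, traj_len - gaps)
    ends = starts + gaps
    labels = gaps / max_gap
    return starts, ends, labels

rng = np.random.default_rng(0)
starts, ends, labels = sample_temporal_pairs(traj_len=50, num_pairs=8, max_gap=10, rng=rng)
```

Each `(starts[i], ends[i])` pair would index two observations from the same episode, and `labels[i]` gives a training signal that needs no manual annotation.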

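This paper proposes a more efficient and robust method for extracting dense rewards from multimodal observations. Tests in two experimental setups, joint-assembly and door-opening, show that the method is effective and efficient at learning dense rewards, and that the learned rewards lead to faster convergence.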

Learning Dense Reward with Temporal Variant Self-Supervision

In this paper, we explore whether a robot can learn to hang arbitrary objects
onto a diverse set of supporting items such as racks or hooks. Endowing robots
with such an ability has applications in many domains such as domestic
services, logistics, or manufacturing. Yet, it is a challenging manipulation
task due to the large diversity of geometry and topology of everyday objects.
We propose a system that takes partial point clouds of an object
and a supporting item as input and learns to decide where and how to hang the
object stably. Our system learns to estimate the contact point correspondences
between the object and supporting item to get an estimated stable pose. We then
run a deep reinforcement learning algorithm to refine the predicted stable
pose. The robot then needs to find a collision-free path to move the object
from its initial pose to the stable hanging pose. To this end, we train a
neural-network-based collision estimator that takes as input partial point
clouds of the object and supporting item. We generate a new and challenging, large-scale,
synthetic dataset annotated with stable poses of objects hung on various
supporting items and their contact point correspondences. On this dataset, we
show that our system achieves a 68.3% success rate in predicting
stable object poses and a 52.1% F1 score in finding feasible
paths. Supplemental material and videos are available on our project webpage.
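
For reference, the F1 score used above is the harmonic mean of precision and recall. The sketch below computes it from binary labels (1 for a feasible path, 0 otherwise); the function name and the toy labels are illustrative and are not taken from the dataset described here.

```python
def f1_score(y_true, y_pred):
    """F1 = 2 * precision * recall / (precision + recall) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy example: 2 true positives, 1 false positive, 1 false negative,
# so precision = recall = 2/3 and F1 = 2/3.
score = f1_score([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
```

Because F1 penalizes both missed feasible paths and paths wrongly judged feasible, it is a stricter summary than accuracy when the two classes are imbalanced.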

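This paper explores whether a robot can learn to hang arbitrary objects on a variety of supporting items, and proposes a system that takes partial point clouds as input and learns how to hang objects stably. The system refines its predicted stable pose with a deep reinforcement learning algorithm, trains a neural network to estimate collisions, and is accompanied by a corresponding dataset.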

OmniHang: Learning to Hang Arbitrary Objects using Contact Point Correspondences and Neural Collision Estimation

Real-world robotics problems often occur in domains that differ significantly
from the robot's prior training environment. For many robotic control tasks,
real-world experience is expensive to obtain, but data is easy to collect
either in an instrumented environment or in simulation. We propose a novel domain
adaptation approach for robot perception that adapts visual representations
learned on a large, easy-to-obtain source dataset (e.g., synthetic images) to a
target real-world domain, without requiring expensive manual data annotation of
real-world data before policy search. Supervised domain adaptation methods
minimize cross-domain differences using pairs of aligned images that contain
the same object or scene in both the source and target domains, thus learning a
domain-invariant representation. However, they require manual alignment of such
image pairs. Fully unsupervised adaptation methods rely on minimizing the
discrepancy between the feature distributions across domains. We propose a
novel, more powerful combination of distribution alignment and pairwise image
alignment, and remove the requirement for expensive annotation by using weakly
aligned pairs of images in the source and target domains. Focusing on adapting
from simulation to real-world data using a PR2 robot, we evaluate our approach
on a manipulation task and show that by using weakly paired images, our method
compensates for domain shift more effectively than previous techniques,
enabling better robot performance in the real world.
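
One way to picture the combination of distribution alignment and pairwise alignment is as a weighted sum of two feature-space penalties. The sketch below is a simplified stand-in, not the method evaluated here: it uses the squared distance between domain feature means (a linear-kernel MMD estimate) for the distribution term and a mean squared distance for the weakly aligned pairs. The function names, the choice of both penalties, and the weights `w_dist` and `w_pair` are all assumptions.

```python
import numpy as np

def mean_discrepancy(src_feats, tgt_feats):
    """Squared distance between the mean source and mean target features."""
    diff = src_feats.mean(axis=0) - tgt_feats.mean(axis=0)
    return float(diff @ diff)

def pairwise_alignment(src_pair_feats, tgt_pair_feats):
    """Mean squared distance between features of weakly aligned image pairs.

    Row i of each array is assumed to come from the same (weak) pair.
    """
    sq_dist = np.sum((src_pair_feats - tgt_pair_feats) ** 2, axis=1)
    return float(np.mean(sq_dist))

def adaptation_loss(src_feats, tgt_feats, src_pair_feats, tgt_pair_feats,
                    w_dist=1.0, w_pair=1.0):
    """Hypothetical combined objective: distribution term plus pairwise term."""
    return (w_dist * mean_discrepancy(src_feats, tgt_feats)
            + w_pair * pairwise_alignment(src_pair_feats, tgt_pair_feats))

# Toy features: the target domain is the source shifted by 1 in each of 3 dims.
src = np.zeros((4, 3))
tgt = np.ones((4, 3))
loss = adaptation_loss(src, tgt, src[:2], tgt[:2])
```

The distribution term needs only unlabeled features from each domain, while the pairwise term uses the small set of weakly aligned pairs, which is why the two can complement each other.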

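The paper proposes a novel domain adaptation method that adapts visual representations learned on a large, easy-to-obtain source dataset (e.g., synthetic images) to a target real-world domain without requiring expensive manual data annotation. The authors combine weakly aligned image pairs with distribution alignment to address the gap between simulated and real environments, and evaluate the approach on a robot manipulation task.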