Replicating a user's pose from only wearable sensors is important for many
AR/VR applications. Most existing methods for motion tracking avoid environment
interaction apart from foot-floor contact because of its complex dynamics and
hard constraints. However, in daily life people regularly interact with their
environment, e.g. by sitting on a couch or leaning on a desk. Using
reinforcement learning, we show that headset and controller poses, when combined
with physics simulation and environment observations, can generate realistic
full-body poses even in highly constrained environments. The physics simulation
automatically enforces the various constraints necessary for realistic poses,
so they need not be specified manually as in many kinematic approaches. These hard
constraints allow us to achieve high-quality interaction motions without
typical artifacts such as penetration or contact sliding. We discuss three
features crucial to the performance of the method: the environment
representation, the contact reward, and scene randomization. We demonstrate the
generality of the approach through various examples, such as sitting on chairs,
a couch and boxes, stepping over boxes, rocking a chair and turning an office
chair. We believe these are some of the highest-quality results achieved for
motion tracking from sparse sensors with scene interaction.

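Combining reinforcement learning with physics simulation and environment observations generates realistic full-body poses in highly constrained environments while avoiding contact artifacts.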

QuestEnvSim: Environment-Aware Simulated Motion Tracking from Sparse Sensors

Reinforcement learning requires interaction with an environment, which is
expensive for robots. This constraint necessitates approaches that work with
limited environmental interaction by maximizing the reuse of previous
experiences. We propose an approach that maximizes experience reuse while
learning to solve a given task by generating and simultaneously learning useful
auxiliary tasks. To generate these tasks, we construct an abstract temporal
logic representation of the given task and leverage large language models to
generate context-aware object embeddings that facilitate object replacements.
Counterfactual reasoning and off-policy methods allow us to simultaneously
learn these auxiliary tasks while solving the given target task. We combine
these insights into a novel framework for multitask reinforcement learning and
experimentally show that our generated auxiliary tasks share similar underlying
exploration requirements as the given task, thereby maximizing the utility of
directed exploration. Our approach allows agents to automatically learn
additional useful policies without extra environment interaction.
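
The idea of learning auxiliary tasks off-policy from the same experience can be illustrated with a minimal sketch. This is not the authors' framework: the corridor environment, the two reward functions, and the names `step`, `TASKS`, `train` and `greedy_action` are all hypothetical, and plain tabular Q-learning stands in for whatever off-policy method the paper uses. Each transition is collected once by a random behaviour policy and then relabelled with the reward of every task, so the auxiliary task is learned without extra environment interaction.

```python
import random
from collections import defaultdict

# Hypothetical 1-D corridor: states 0..4, actions move left (-1) or right (+1).
N_STATES = 5
ACTIONS = (-1, 1)


def step(state, action):
    """Deterministic corridor dynamics, clipped at both ends."""
    return min(max(state + action, 0), N_STATES - 1)


# Hypothetical tasks, each defined only by a reward on the next state.
TASKS = {
    "target": lambda s: 1.0 if s == N_STATES - 1 else 0.0,  # be at the right end
    "auxiliary": lambda s: 1.0 if s == 0 else 0.0,          # be at the left end
}


def train(episodes=200, horizon=20, alpha=0.5, gamma=0.9, seed=0):
    """Learn one Q-table per task from a single shared stream of transitions."""
    rng = random.Random(seed)
    q = {name: defaultdict(float) for name in TASKS}
    for _ in range(episodes):
        state = rng.randrange(N_STATES)
        for _ in range(horizon):
            action = rng.choice(ACTIONS)  # behaviour policy: uniform random
            nxt = step(state, action)
            # Relabel the same transition with every task's reward and
            # apply an off-policy Q-learning update for each task.
            for name, reward_fn in TASKS.items():
                best_next = max(q[name][(nxt, a)] for a in ACTIONS)
                td_target = reward_fn(nxt) + gamma * best_next
                q[name][(state, action)] += alpha * (td_target - q[name][(state, action)])
            state = nxt
    return q


def greedy_action(q_task, state):
    """Action with the highest learned value for one task."""
    return max(ACTIONS, key=lambda a: q_task[(state, a)])


if __name__ == "__main__":
    q = train()
    for name in TASKS:
        print(name, [greedy_action(q[name], s) for s in range(N_STATES)])
```

Because the updates are off-policy, the behaviour policy does not need to be good at either task; both greedy policies are recovered from the same random exploration.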

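The approach maximizes experience reuse while learning to solve a given task by generating and learning useful auxiliary tasks. Counterfactual reasoning and off-policy methods allow these auxiliary tasks to be learned simultaneously, yielding a new framework for multitask reinforcement learning.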

Exploiting Contextual Structure to Generate Useful Auxiliary Tasks

The interaction between an artificial agent and its environment is
bi-directional. The agent extracts relevant information from the environment
and, in return, acts on the environment to accumulate high expected reward.
Standard reinforcement learning (RL) deals with maximizing the expected
reward. However, there are always information-theoretic limitations that
restrict the expected reward, and standard RL does not properly consider
them. In this work we consider RL objectives with
information-theoretic limitations. For the first time we derive a Bellman-type
recursive equation for the causal information between the environment and the
agent, which is combined plausibly with the Bellman recursion for the value
function. The unified equation serves to explore the typical behavior of
artificial agents in an infinite time horizon.
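
For reference, the Bellman recursion for the value function mentioned above has the following standard textbook form. The notation is an assumption, since the abstract fixes none: a policy $\pi$, reward $r$, transition kernel $p$ and discount factor $\gamma \in [0, 1)$. The causal-information recursion derived in the paper is not stated in the abstract and is not reproduced here.

```latex
V^{\pi}(s) \;=\; \sum_{a} \pi(a \mid s)
  \Big[\, r(s, a) \;+\; \gamma \sum_{s'} p(s' \mid s, a)\, V^{\pi}(s') \Big]
```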

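This work studies the interaction between an artificial agent and its environment, examining how reinforcement learning can maximize expected reward over an infinite time horizon under information-theoretic limitations. It derives, for the first time, a Bellman-type recursive equation for the causal information between the environment and the agent, which is combined with the Bellman recursion for the value function.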