The two-time-scale nature of SAC, an actor-critic algorithm, means that the
critic estimate available to the actor at any given time has not yet converged;
because the critic learns faster than the actor, however, the two eventually
become consistent. Various strategies have been introduced in the literature to
learn better gradient estimates and thereby improve convergence. Since gradient
estimates depend on the critic, we posit that improving the critic can provide
a better gradient estimate for the actor at each time step. Building on this, we
propose Soft Actor Retrospective Critic (SARC), in which we augment the SAC
critic loss with an additional term, the retrospective loss, leading to faster
critic convergence and, consequently, better policy gradient estimates for the
actor. An existing implementation of
SAC can be easily adapted to SARC with minimal modifications. Through extensive
experimentation and analysis, we show that SARC provides consistent improvement
over SAC on benchmark environments. We plan to open-source the code and all
experiment data at: this https URL
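
The abstract does not spell out the retrospective loss, so the following is only a minimal sketch under stated assumptions. It assumes one plausible form in which the term pulls the current critic's predictions toward the bootstrapped target while pushing them away from the predictions of an earlier critic snapshot. The names `q_past` and `kappa`, the L1 distance, and the unweighted sum of the two terms are all illustrative assumptions rather than details taken from the paper.

```python
# Sketch only: the form of the retrospective term is an assumption,
# not the authors' implementation.

def mean_abs_diff(xs, ys):
    """Mean L1 distance between two equal-length lists of floats."""
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)


def mean_sq_diff(xs, ys):
    """Mean squared error between two equal-length lists of floats."""
    return sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def sarc_critic_loss(q_now, q_past, q_target, kappa=2.0):
    """SAC-style critic loss plus an assumed retrospective term.

    q_now:    Q-values from the current critic parameters
    q_past:   Q-values from an earlier critic snapshot (hypothetical name)
    q_target: bootstrapped targets, as in the usual SAC critic update
    kappa:    hypothetical weight on the retrospective term
    """
    sac_loss = mean_sq_diff(q_now, q_target)
    # Pull predictions toward the target, push them away from the past critic.
    retro = (kappa + 1.0) * mean_abs_diff(q_now, q_target) \
        - kappa * mean_abs_diff(q_now, q_past)
    return sac_loss + retro
```

Under this assumed form, the only change to an existing SAC critic update would be keeping a past copy of the critic and adding one extra term to the loss, which is consistent with the claim that the adaptation needs minimal modification.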

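This paper proposes the Soft Actor Retrospective Critic (SARC) algorithm, which adds a retrospective loss term to improve SAC's critic learning, yielding better policy gradient estimates and better policies; on benchmark environments, SARC shows consistent improvement over SAC.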

SARC: Soft Actor Retrospective Critic

Open Information Extraction (OIE) aims to extract relational tuples from
open-domain sentences. Existing OIE systems split a sentence into tokens and
recognize token spans as tuple relations and arguments. We instead propose
Sentence as Chunk sequence (SaC) and recognize chunk spans as tuple relations
and arguments. We argue that SaC has better quantitative and qualitative
properties for OIE than sentence as token sequence, and evaluate four choices
of chunks (i.e., CoNLL chunks, simple phrases, NP chunks, and spans from
SpanOIE) against gold OIE tuples. Accordingly, we propose a simple BERT-based
model for sentence chunking, and propose Chunk-OIE for tuple extraction on top
of SaC. Chunk-OIE achieves state-of-the-art results on multiple OIE datasets,
showing that SaC benefits the OIE task.
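
The difference between a token sequence and a chunk sequence can be illustrated with a small sketch. The sentence, the chunk boundaries, and the names `chunks`, `chunk_text`, and `oie_tuple` below are hypothetical and chosen only for illustration; they are not output from Chunk-OIE or from any of the four chunking schemes evaluated.

```python
from typing import Dict, List, Tuple

# Hypothetical example sentence and chunk boundaries, for illustration only.
tokens = ["The", "quick", "fox", "jumped", "over", "the", "lazy", "dog"]
# Each chunk is a (start, end) token span with an exclusive end index.
chunks: List[Tuple[int, int]] = [(0, 3), (3, 4), (4, 5), (5, 8)]


def chunk_text(chunk_ids: List[int]) -> str:
    """Join the tokens covered by a run of chunk indices."""
    words: List[str] = []
    for i in chunk_ids:
        start, end = chunks[i]
        words.extend(tokens[start:end])
    return " ".join(words)


# The tuple is expressed over chunk indices instead of token indices.
oie_tuple: Dict[str, List[int]] = {"arg0": [0], "relation": [1, 2], "arg1": [3]}
extracted = {role: chunk_text(ids) for role, ids in oie_tuple.items()}
```

In this toy case the extractor labels four chunks rather than eight tokens, and each argument boundary coincides with a chunk boundary by construction.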

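This work proposes Sentence as Chunk sequence (SaC), a new sentence representation for tuple extraction in Open Information Extraction (OIE). With a simple BERT-based chunking model and Chunk-OIE for tuple extraction on top of it, the approach achieves state-of-the-art results on multiple OIE datasets, showing that SaC benefits the OIE task.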

Open Information Extraction via Chunks

Deep Reinforcement Learning (DRL) has made tremendous advances in both
simulated and real-world robot control tasks in recent years. Nevertheless,
applying DRL to novel robot control tasks is still challenging, especially when
researchers have to design the action and observation space and the reward
function. In this paper, we investigate partial observability as a potential
source of failure when applying DRL to robot control tasks; it can arise when
researchers are not confident whether the observation space fully represents
the underlying state. We compare the performance of three common DRL
algorithms, TD3, SAC, and PPO, under various partial observability conditions. We
find that TD3 and SAC easily become stuck in local optima and underperform PPO.
We propose multi-step versions of vanilla TD3 and SAC, which use one-step
bootstrapping, to improve their robustness to partial observability.
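
To make the contrast between one-step and multi-step bootstrapping concrete, here is a minimal sketch of an n-step target. It is a generic textbook form, not the paper's implementation: the function name `n_step_target` is hypothetical, episode termination is ignored, and for SAC the bootstrap value would additionally include the entropy term.

```python
def n_step_target(rewards, bootstrap_value, gamma=0.99):
    """Discounted n-step return with a bootstrapped tail.

    rewards:         [r_t, ..., r_{t+n-1}] taken from a stored trajectory
    bootstrap_value: critic estimate at step t+n (assumed to be given)
    gamma:           discount factor
    """
    target = bootstrap_value
    # Fold rewards in from the last step back to the first.
    for r in reversed(rewards):
        target = r + gamma * target
    return target
```

With a single reward this reduces to the usual one-step target `r + gamma * bootstrap_value`; longer reward lists let more of the target come from observed rewards and less from the critic's own estimate.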

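This paper studies the application of Deep Reinforcement Learning to robot control tasks, in particular under partial observability. It compares the performance of TD3, SAC, and PPO and proposes multi-step versions of TD3 and SAC that improve their robustness to partial observability.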

Partial Observability during DRL for Robot Control

Graph embedding methods, including traditional shallow models and deep Graph
Neural Networks (GNNs), have led to promising applications in recommendation.
Nevertheless, shallow models, especially random-walk-based algorithms, fail to
adequately exploit neighbor proximity in sampled subgraphs or sequences due to
their optimization paradigm. GNN-based algorithms make insufficient use of
high-order information and easily over-smooth when too many layers are stacked,
which may degrade recommendations for low-degree (long-tail) items and limit
expressiveness and scalability. In
this paper, we propose a novel framework SAC, namely Spatial Autoregressive
Coding, to solve the above problems in a unified way. To adequately leverage
neighbor proximity and high-order information, we design a novel spatial
autoregressive paradigm. Specifically, we first randomly mask multi-hop
neighbors and embed the target node by integrating all other surrounding
neighbors with an explicit multi-hop attention. Then we reinforce the model to
learn a neighbor-predictive coding for the target node by contrasting the
coding and the masked neighbors' embedding, equipped with a new hard negative
sampling strategy. To learn the minimal sufficient representation for the
target-to-neighbor prediction task and remove the redundancy of neighbors, we
devise a Neighbor Information Bottleneck that maximizes the mutual information
between the target predictive coding and the masked neighbors' embedding while
simultaneously constraining the mutual information between the coding and the
surrounding neighbors' embedding. Experimental results on both public
recommendation datasets and a real-world, web-scale dataset,
Douyin-Friend-Recommendation, demonstrate the superiority of SAC over
state-of-the-art methods.
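
The abstract says the coding is contrasted against the masked neighbors' embedding but does not name the objective. The sketch below uses InfoNCE as a stand-in, which is a standard contrastive loss and an assumption here, not a confirmed detail of the method. The names `info_nce`, `coding`, `positive`, and `negatives`, the dot-product similarity, and the temperature value are likewise illustrative, and the hard negative sampling strategy is not modeled.

```python
import math


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce(coding, positive, negatives, temperature=0.1):
    """InfoNCE loss for one target node (assumed objective).

    coding:    predictive coding of the target node
    positive:  embedding of a masked neighbor of that node
    negatives: embeddings of sampled non-neighbors
    """
    logits = [dot(coding, positive) / temperature]
    logits += [dot(coding, n) / temperature for n in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    # Negative log-probability of picking the positive among all candidates.
    return log_sum - logits[0]
```

The loss is small when the coding is closer to the masked neighbor than to the negatives, which matches the intent of a neighbor-predictive coding.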

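This paper proposes SAC, a novel framework that uses a new spatial autoregressive paradigm to fully leverage neighbor proximity and high-order information, and introduces a Neighbor Information Bottleneck to learn a minimal sufficient representation for the target-to-neighbor prediction task while removing neighbor redundancy. Experiments on public recommendation datasets and on the large-scale real-world dataset Douyin-Friend-Recommendation show that SAC outperforms existing state-of-the-art methods.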