The advice model of online computation captures the setting in which the
online algorithm is given some information concerning the request sequence.
This paradigm allows one to establish tradeoffs between the amount of this
additional information and the performance of the online algorithm. However,
unlike real life in which advice is a recommendation that we can choose to
follow or to ignore based on trustworthiness, in the current advice model, the
online algorithm treats it as infallible. This means that if the advice is
corrupt or, worse, if it comes from a malicious source, the algorithm may
perform poorly. In this work, we study online computation in a setting in which
the advice is provided by an untrusted source. Our objective is to quantify the
impact of untrusted advice so as to design and analyze online algorithms that
are robust and perform well even when the advice is generated in a malicious,
adversarial manner. To this end, we focus on well-studied online problems such
as ski rental, online bidding, bin packing, and list update. For ski rental and
online bidding, we show how to obtain algorithms that are Pareto-optimal with
respect to the competitive ratios achieved; this improves upon the framework of
Purohit et al. [NeurIPS 2018] in which Pareto-optimality is not necessarily
guaranteed. For bin packing and list update, we give online algorithms with
worst-case tradeoffs in their competitiveness, depending on whether the advice
is trusted or not; this is motivated by the work of Lykouris and Vassilvitskii
[ICML 2018] on the paging problem, in which, however, the competitiveness
depends on the reliability of the advice. More importantly, we demonstrate how to prove
lower bounds, within this model, on the tradeoff between the number of advice
bits and the competitiveness of any online algorithm.
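
To make the trusted/untrusted tradeoff concrete for ski rental, the following is a minimal sketch of a threshold rule in the spirit of the deterministic algorithm described by Purohit et al.; it is not the Pareto-optimal algorithm of this work. The function names and the `trust` parameter are illustrative assumptions: renting costs 1 per day, buying costs `buy_price`, and the advice is a predicted number of skiing days. Under these assumptions the rule should cost at most `1 + trust` times optimal when the advice is correct and at most `1 + 1/trust` times optimal when it is adversarial.

```python
import math


def ski_cost(buy_day: int, num_days: int, buy_price: int) -> int:
    """Total cost when renting at 1 per day and buying at the start of buy_day."""
    if num_days < buy_day:
        return num_days  # the season ended before we bought: rent only
    return (buy_day - 1) + buy_price  # rented buy_day - 1 days, then bought


def opt_cost(num_days: int, buy_price: int) -> int:
    """Offline optimum: rent throughout or buy on day 1, whichever is cheaper."""
    return min(num_days, buy_price)


def buy_day_with_advice(predicted_days: int, buy_price: int, trust: float) -> int:
    """Pick the day to buy, given possibly wrong advice.

    `trust` lies in (0, 1]; smaller values follow the advice more
    aggressively (hypothetical parameter, assumed for this sketch).
    """
    if predicted_days >= buy_price:
        # Advice says the season is long: buy early.
        return math.ceil(trust * buy_price)
    # Advice says the season is short: delay buying.
    return math.ceil(buy_price / trust)


if __name__ == "__main__":
    b, trust = 10, 0.5
    for predicted, actual in [(100, 100), (100, 5), (3, 3), (3, 100)]:
        day = buy_day_with_advice(predicted, b, trust)
        ratio = ski_cost(day, actual, b) / opt_cost(actual, b)
        print(f"advice={predicted:3d} actual={actual:3d} buy_day={day:2d} ratio={ratio:.2f}")
```

Lowering `trust` improves the ratio under correct advice at the price of a worse guarantee under malicious advice, which is the kind of tradeoff the abstract refers to.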

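This work studies the impact of advice that comes from an untrusted source in the advice model of online computation. Using online problems such as ski rental and bin packing as case studies, it obtains algorithms whose worst-case competitiveness depends on whether the advice is trusted, addressing a distinction that the existing model cannot capture.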

Online Computation with Untrusted Advice

Episodic memory is a term from psychology that refers to the ability to recall
specific events from the past. We suggest one advantage of this particular type
of memory is the ability to easily assign credit to a specific state when
remembered information is found to be useful. Inspired by this idea, and the
increasing popularity of external memory mechanisms to handle long-term
dependencies in deep learning systems, we propose a novel algorithm which uses
a reservoir sampling procedure to maintain an external memory consisting of a
fixed number of past states. The algorithm allows a deep reinforcement learning
agent to learn online to preferentially remember those states which are found
to be useful to recall later on. Critically, this method allows for efficient
online computation of gradient estimates with respect to the write process of
the external memory. Thus, unlike most prior mechanisms for external memory, it
is feasible to use in an online reinforcement learning setting.
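
For readers unfamiliar with reservoir sampling, the sketch below shows the classic uniform variant, which keeps a fixed-size memory over a stream of unknown length. It is only an illustration of the underlying procedure: the algorithm proposed here learns which states to prefer, whereas this version treats all states equally. The function name and parameters are assumptions made for the example.

```python
import random


def reservoir_sample(stream, capacity, rng=None):
    """Keep a uniform random sample of `capacity` items from an iterable.

    Uniform reservoir sampling: after t items have been seen, each one
    is in memory with probability capacity / t (when t >= capacity).
    """
    if rng is None:
        rng = random.Random()
    memory = []
    for t, state in enumerate(stream):
        if len(memory) < capacity:
            memory.append(state)  # fill the memory first
        else:
            j = rng.randint(0, t)  # inclusive on both ends
            if j < capacity:
                memory[j] = state  # overwrite a uniformly chosen slot
    return memory


if __name__ == "__main__":
    states = range(1000)  # stand-in for a stream of agent states
    print(reservoir_sample(states, capacity=5, rng=random.Random(0)))
```

Each incoming state needs only one random draw and constant work, which is why a scheme of this kind fits an online setting where past states cannot be revisited.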

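This paper studies a new algorithm that maintains an external memory consisting of a fixed number of past states, enabling a deep reinforcement learning agent to learn online which useful states to remember, and allowing gradient estimates to be computed in an online reinforcement learning setting.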