With the help of Chain-of-Thought (CoT) prompting, Large Language Models
(LLMs) have achieved remarkable performance on various reasoning tasks.
However, most of them have been evaluated under noise-free context, and the
tendency of LLMs to produce inaccurate results under noisy context has not
been fully investigated. Existing studies use trigger sentences to
encourage LLMs to concentrate on the relevant information, but the trigger has
limited effect on final answer prediction. Inspired by interactive CoT methods,
where intermediate reasoning steps are promoted by multiple rounds of
interaction between users and LLMs, we propose a novel prompting method, namely
R$^3$ prompting, for CoT reasoning under noisy context. Specifically, R$^3$
prompting interacts with LLMs to perform key sentence extraction, variable
declaration and answer prediction, which correspond to a thought process of
reviewing, rephrasing and resolving. The responses generated in one
interaction serve as hints that guide the responses of the next
interaction. Our experiments show that R$^3$ prompting significantly
outperforms existing CoT prompting methods on five reasoning tasks under noisy
context. With GPT-3.5-turbo, we observe a 3.7% accuracy improvement on average
on the reasoning tasks under noisy context compared to the most competitive
prompting baseline. Further analyses and ablation studies demonstrate the
robustness and generalization of R$^3$ prompting in solving reasoning tasks
with LLMs under noisy context.
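
The three-stage interaction described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `r3_prompt`, the `LLM` callable type, and the wording of every prompt are assumptions made for the example. The only behavior taken from the abstract is the order of the stages (key sentence extraction, variable declaration, answer prediction) and the fact that each response is passed as a hint to the next interaction.

```python
from typing import Callable, Dict

# Hypothetical interface: any function that maps a prompt string to a reply.
LLM = Callable[[str], str]


def r3_prompt(problem: str, llm: LLM) -> Dict[str, str]:
    """Chain three interactions: review, rephrase, resolve.

    The prompt texts are illustrative assumptions, not the paper's prompts.
    """
    # Review: ask for the sentences that matter, leaving the noise behind.
    key_sentences = llm(
        f"Problem: {problem}\n"
        "Extract the key sentences needed to answer the question."
    )

    # Rephrase: the previous response is supplied as a hint.
    variables = llm(
        f"Problem: {problem}\n"
        f"Hint (key sentences): {key_sentences}\n"
        "Declare the variables described by the key sentences."
    )

    # Resolve: again, the previous response is supplied as a hint.
    answer = llm(
        f"Problem: {problem}\n"
        f"Hint (variables): {variables}\n"
        "Solve the problem step by step and state the final answer."
    )

    return {"review": key_sentences, "rephrase": variables, "resolve": answer}


if __name__ == "__main__":
    # Stand-in for a real model so the sketch runs without network access.
    def echo_llm(prompt: str) -> str:
        return f"<reply to {len(prompt)} characters of prompt>"

    result = r3_prompt(
        "Tom has 3 apples. The sky is blue. He buys 2 more. "
        "How many apples does Tom have?",
        echo_llm,
    )
    for stage, reply in result.items():
        print(stage, "->", reply)
```

Passing the model as a plain callable keeps the sketch independent of any particular API client; a real client would be wrapped in a function with the same signature.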

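Using R$^3$ prompting for CoT reasoning under noisy context improves the accuracy of LLMs on reasoning tasks. Compared with existing CoT prompting methods, R$^3$ prompting performs significantly better under noisy conditions: experiments with GPT-3.5-turbo show an average accuracy improvement of 3.7%. The method is robust and generalizes well when solving LLM reasoning tasks under noisy context.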

R$^3$ Prompting: Review, Rephrase and Resolve for Chain-of-Thought Reasoning in Large Language Models under Noisy Context

We consider the contextual bandit problem where, at each time step, the agent only
has access to a noisy version of the context and the error variance (or an
estimator of this variance). This setting is motivated by a wide range of
applications where the true context for decision-making is unobserved, and only
a prediction of the context by a potentially complex machine learning algorithm
is available. When the context error is non-diminishing, classical bandit
algorithms fail to achieve sublinear regret. We propose the first online
algorithm in this setting with sublinear regret compared to the appropriate
benchmark. The key idea is to extend the measurement error model in classical
statistics to the online decision-making setting, which is nontrivial due to
the policy being dependent on the noisy context observations.
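
To make the role of the measurement error model concrete, the sketch below shows the classical, offline version of the problem in a scalar linear model: regressing a reward on a noisy context shrinks the estimated slope toward zero, and subtracting the known error variance from the denominator removes that bias. This is not the proposed online algorithm. The function name `estimate_slopes`, the Gaussian data model, and all parameter values are assumptions chosen for illustration.

```python
import random
from typing import Tuple


def estimate_slopes(
    n: int, theta: float, sigma_e: float, sigma_eta: float, seed: int = 0
) -> Tuple[float, float]:
    """Return (naive, corrected) slope estimates from noisy contexts.

    Assumed model (illustrative): x ~ N(0, 1) is the true context,
    x_noisy = x + e with e ~ N(0, sigma_e^2) is what the agent observes,
    and y = theta * x + eta with eta ~ N(0, sigma_eta^2) is the reward.
    """
    rng = random.Random(seed)
    sum_xy = 0.0
    sum_xx = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        x_noisy = x + rng.gauss(0.0, sigma_e)
        y = theta * x + rng.gauss(0.0, sigma_eta)
        sum_xy += x_noisy * y
        sum_xx += x_noisy * x_noisy

    # Naive least squares treats the noisy context as if it were exact.
    naive = sum_xy / sum_xx

    # Measurement error correction: remove the noise contribution
    # n * sigma_e^2 from the second-moment term in the denominator.
    corrected = sum_xy / (sum_xx - n * sigma_e**2)
    return naive, corrected


if __name__ == "__main__":
    naive, corrected = estimate_slopes(
        n=50_000, theta=2.0, sigma_e=1.0, sigma_eta=0.5
    )
    print(f"naive slope:     {naive:.3f}")
    print(f"corrected slope: {corrected:.3f}")
```

Under this model the naive estimate converges to theta / (1 + sigma_e^2) rather than theta, so the bias does not vanish with more data when the error variance does not diminish. In the online setting the correction is harder to apply, because the actions that generate the data are themselves chosen from the noisy contexts.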

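We consider the contextual bandit problem in which, at each time step, the agent can access only a noisy version of the context and the error variance (or an estimate of that variance). We propose the first online algorithm with sublinear regret in this setting relative to an appropriate benchmark. The key idea is to extend the measurement error model from classical statistics to the online decision-making setting, which is nontrivial because the policy depends on the noisy context observations.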