In-context learning is a promising approach for applying offline reinforcement
learning (RL) to online tasks, with adaptation driven by task prompts. Recent
work has shown that in-context RL can self-improve in a trial-and-error manner
when RL tasks are treated as an across-episodic sequential prediction problem.
Although this self-improvement requires no gradient updates, current methods
still incur high computational costs because the across-episodic sequence
grows with the task horizon. To this
end, we propose an In-context Decision Transformer (IDT) to achieve
self-improvement in a high-level trial-and-error manner. Specifically, IDT is
inspired by the efficient hierarchical structure of human decision-making and
thus reconstructs the sequence to consist of high-level decisions instead of
low-level actions that interact with environments. As one high-level decision
can guide multi-step low-level actions, IDT naturally avoids excessively long
sequences and solves online tasks more efficiently. Experimental results show
that IDT achieves state-of-the-art performance on long-horizon tasks compared
with current in-context RL methods. In particular, online evaluation with IDT is
\textbf{36$\times$} faster than with baselines on the D4RL benchmark and
\textbf{27$\times$} faster on the Grid World benchmark.
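
To make the sequence-length argument concrete, the following is a minimal sketch, not the paper's implementation. The function name `context_length`, the layout of three tokens per step, and the choice of 20 low-level steps per high-level decision are all assumptions made for illustration only.

```python
def context_length(num_episodes, horizon, tokens_per_step=3, steps_per_decision=1):
    """Count tokens in an across-episodic context (illustrative model only).

    tokens_per_step and steps_per_decision are hypothetical parameters,
    not values taken from the paper.
    """
    # Ceiling division: a final partial chunk of steps still needs one decision.
    decisions_per_episode = -(-horizon // steps_per_decision)
    return num_episodes * decisions_per_episode * tokens_per_step


# Low-level context: one entry per environment step.
low_level = context_length(num_episodes=4, horizon=1000)

# High-level context: one entry per decision, each assumed to cover 20 steps.
high_level = context_length(num_episodes=4, horizon=1000, steps_per_decision=20)

print(low_level, high_level, low_level / high_level)
```

Under these assumed numbers the context shrinks from 12000 to 600 tokens. Because standard self-attention scales quadratically with sequence length, a shorter context lowers inference cost. The figures are illustrative and are not meant to reproduce the reported speedups.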

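This work proposes a high-level, trial-and-error-based method that achieves in-context learning for offline reinforcement learning by providing task prompts in the environment. It solves online tasks more efficiently and achieves state-of-the-art results on long-horizon tasks.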