Contextual bandits are a core tool in reinforcement learning, enabling systems
to make decisions from partial feedback. However, as contexts grow more
complex, traditional bandit algorithms can struggle to capture and exploit
them. In this paper, we propose a novel integration of large language models
(LLMs) with the contextual bandit framework. Using an LLM as an encoder, we
enrich the representation of the context, giving the bandit a denser and more
informative view.
Preliminary results on synthetic datasets demonstrate the potential of this
approach, showing higher cumulative reward and lower regret than traditional
bandit algorithms. This integration not only showcases the capabilities of
LLMs in reinforcement learning but also opens the door to a new generation of
context-aware decision systems.

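By integrating large language models with the contextual bandit framework, we strengthen the representation of the context and provide a denser, richer view. Preliminary results show the potential of this approach: compared with traditional bandit algorithms, cumulative reward improves markedly and regret is reduced. This integration not only demonstrates the capabilities of large language models in reinforcement learning but also opens a new chapter for context-aware decision systems.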
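
The encoder-plus-bandit pipeline described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the method evaluated in the paper: the abstract does not name a specific bandit algorithm, so disjoint LinUCB is used here as one common choice, and `encode` is a hypothetical hashed bag-of-words stand-in for the LLM encoder so that the example runs without a model. The names `LinUCBArm`, `choose`, and `simulate` are likewise illustrative.

```python
import hashlib
import math


def encode(text, dim=16):
    """Hypothetical stand-in for an LLM encoder: hashed bag of words, unit norm."""
    vec = [0.0] * dim
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).hexdigest()
        vec[int(digest, 16) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class LinUCBArm:
    """One arm of disjoint LinUCB (an assumed choice), storing A^{-1} and b."""

    def __init__(self, dim, alpha=1.0):
        self.alpha = alpha  # exploration strength
        # A starts as the identity, so A^{-1} is the identity as well.
        self.a_inv = [[1.0 if i == j else 0.0 for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim

    def _a_inv_times(self, x):
        return [sum(r * xj for r, xj in zip(row, x)) for row in self.a_inv]

    def ucb(self, x):
        """Upper confidence bound: theta^T x + alpha * sqrt(x^T A^{-1} x)."""
        theta = self._a_inv_times(self.b)
        a_inv_x = self._a_inv_times(x)
        mean = sum(t * xi for t, xi in zip(theta, x))
        width = math.sqrt(max(sum(xi * v for xi, v in zip(x, a_inv_x)), 0.0))
        return mean + self.alpha * width

    def update(self, x, reward):
        """Rank-one Sherman-Morrison update for A <- A + x x^T, b <- b + r x."""
        a_inv_x = self._a_inv_times(x)
        denom = 1.0 + sum(xi * v for xi, v in zip(x, a_inv_x))
        for i in range(len(x)):
            for j in range(len(x)):
                self.a_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


def choose(arms, x):
    """Index of the arm with the highest upper confidence bound for context x."""
    return max(range(len(arms)), key=lambda k: arms[k].ucb(x))


def simulate(examples, n_arms, rounds, encoder=encode, alpha=1.0):
    """Replay (text, best_arm) pairs; reward is 1 only when the best arm is pulled."""
    dim = len(encoder(examples[0][0]))
    arms = [LinUCBArm(dim, alpha) for _ in range(n_arms)]
    for _ in range(rounds):
        for text, best_arm in examples:
            x = encoder(text)
            k = choose(arms, x)
            arms[k].update(x, 1.0 if k == best_arm else 0.0)
    return arms
```

For example, `simulate([("refund request for damaged item", 0), ("upgrade my plan", 1)], n_arms=2, rounds=50)` trains two arms on two toy text contexts. Replacing the `encoder` argument with a function that returns embeddings from an actual LLM is the step the abstract proposes. The bandit itself only sees the resulting vector, which is why the encoder is kept as a swappable callable in this sketch.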