There has been a growing interest in developing learner models to enhance
learning and teaching experiences in educational environments. However,
existing work has primarily focused on structured environments that rely on
meticulously crafted representations of tasks, thereby limiting the agent's
ability to generalize skills across tasks. In this paper, we aim to enhance the
generalization capabilities of agents in open-ended text-based learning
environments by integrating Reinforcement Learning (RL) with Large Language
Models (LLMs). We investigate three types of agents: (i) RL-based agents that
utilize natural language for state and action representations to find the best
interaction strategy, (ii) LLM-based agents that leverage the model's general
knowledge and reasoning through prompting, and (iii) hybrid LLM-assisted RL
agents that combine these two strategies to improve agents' performance and
generalization. To support the development and evaluation of these agents, we
introduce PharmaSimText, a novel benchmark derived from the PharmaSim virtual
pharmacy environment designed for practicing diagnostic conversations. Our
results show that RL-based agents excel at task completion but fall short in
asking high-quality diagnostic questions. In contrast, LLM-based agents are
better at asking diagnostic questions but fall short in completing the task. Finally,
hybrid LLM-assisted RL agents enable us to overcome these limitations,
highlighting the potential of combining RL and LLMs to develop high-performing
agents for open-ended learning environments.

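By combining reinforcement learning with large language models, this work
studies how to enhance the generalization capabilities of agents in open-ended
text-based learning environments. It proposes three agent types: RL-based
agents, LLM-based agents, and hybrid agents that combine the two to improve
agent performance and generalization, and it validates the findings on the
PharmaSimText benchmark. The results show that RL-based agents excel at task
completion but fall short in asking diagnostic questions; in contrast,
LLM-based agents are better at asking diagnostic questions but worse at
completing the task; and hybrid LLM-assisted RL agents overcome these
limitations, highlighting the potential of combining RL and LLMs to develop
high-performing agents for open-ended learning environments.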
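
To make the idea of a hybrid LLM-assisted RL agent more concrete, the sketch below shows one possible arrangement, which is an assumption and not necessarily the design used in this work: a suggester shortlists candidate text actions, and a tabular Q-learner picks among them. The names `keyword_suggester` and `HybridAgent` are hypothetical, and the suggester is a simple keyword matcher standing in for an actual LLM prompt.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical type: maps (state text, valid actions) to a shortlist of actions.
Suggester = Callable[[str, List[str]], List[str]]


def keyword_suggester(state: str, actions: List[str]) -> List[str]:
    """Stand-in for an LLM call: keep actions sharing a word with the state."""
    state_words = set(state.lower().split())
    shortlist = [a for a in actions if state_words & set(a.lower().split())]
    return shortlist or actions  # fall back to all actions if nothing matches


class HybridAgent:
    """Tabular Q-learning over text states, restricted to suggested actions."""

    def __init__(self, suggest: Suggester, epsilon: float = 0.1,
                 alpha: float = 0.5, gamma: float = 0.9, seed: int = 0):
        self.suggest = suggest
        self.epsilon, self.alpha, self.gamma = epsilon, alpha, gamma
        self.q: Dict[Tuple[str, str], float] = {}
        self.rng = random.Random(seed)

    def act(self, state: str, actions: List[str]) -> str:
        # The suggester narrows the action space; the Q-values choose within it.
        candidates = self.suggest(state, actions)
        if self.rng.random() < self.epsilon:
            return self.rng.choice(candidates)  # explore among suggestions
        return max(candidates, key=lambda a: self.q.get((state, a), 0.0))

    def update(self, state: str, action: str, reward: float,
               next_state: str, next_actions: List[str]) -> None:
        # Standard one-step Q-learning update; terminal states have no actions.
        best_next = max(
            (self.q.get((next_state, a), 0.0) for a in next_actions),
            default=0.0,
        )
        old = self.q.get((state, action), 0.0)
        target = reward + self.gamma * best_next
        self.q[(state, action)] = old + self.alpha * (target - old)


if __name__ == "__main__":
    agent = HybridAgent(keyword_suggester, epsilon=0.0)
    state = "patient reports cough for two weeks"
    actions = ["ask about cough duration", "recommend sunscreen", "ask about fever"]
    chosen = agent.act(state, actions)
    agent.update(state, chosen, reward=1.0, next_state="done", next_actions=[])
    print(chosen, agent.q[(state, chosen)])
```

In this arrangement the suggester contributes prior knowledge about which questions are plausible, while the Q-values, learned from reward, decide which of those questions lead to task completion. Other divisions of labor between the two components are equally conceivable.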