Training task-oriented dialog agents based on reinforcement learning is
time-consuming and requires a large number of interactions with real users.
Learning a dialog policy from limited dialog experience remains an obstacle
that makes agent training less efficient. In addition, most previous
frameworks start training by sampling training examples at random, which
differs from how humans learn and hurts the efficiency and stability of
training. Therefore, we propose Scheduled Curiosity-Deep Dyna-Q (SC-DDQ), a
curiosity-driven curriculum learning framework based on a state-of-the-art
model-based reinforcement learning dialog model, Deep Dyna-Q (DDQ).
Furthermore, we designed learning schedules for SC-DDQ and DDQ, respectively,
following two opposite training strategies: classic curriculum learning and its
reverse version. Our results show that by introducing scheduled learning and
curiosity, the new framework leads to a significant improvement over DDQ
and Deep Q-learning (DQN). Surprisingly, we found that traditional curriculum
learning was not always effective. Specifically, according to the experimental
results, the easy-first and difficult-first strategies are more suitable for
SC-DDQ and DDQ, respectively. To analyze our results, we adopted the entropy of sampled
actions to depict action exploration and found that training strategies with
high entropy in the first stage and low entropy in the last stage lead to
better performance.
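
As an illustration of the entropy analysis described above, the following is a minimal sketch of how the entropy of sampled actions could be computed for one training stage. It assumes the standard Shannon entropy of the empirical action distribution; the function name `action_entropy` and the example action labels are hypothetical and are not taken from the SC-DDQ implementation.

```python
import math
from collections import Counter


def action_entropy(actions):
    """Shannon entropy (in nats) of the empirical distribution of sampled actions.

    `actions` is any non-empty sequence of hashable action labels
    (hypothetical format, chosen for illustration only).
    """
    if not actions:
        raise ValueError("need at least one sampled action")
    counts = Counter(actions)
    total = len(actions)
    # H = -sum_a p(a) * log p(a), with p(a) estimated from action counts.
    return -sum((c / total) * math.log(c / total) for c in counts.values())


# Made-up samples: a stage that explores many actions versus a stage
# that has mostly settled on one action.
early_stage = ["request", "inform", "confirm", "deny",
               "request", "inform", "confirm", "thanks"]
late_stage = ["inform"] * 7 + ["request"]

early_entropy = action_entropy(early_stage)
late_entropy = action_entropy(late_stage)
```

Under this measure, a stage whose samples are spread over many actions scores higher than a stage dominated by a single action, which is the high-then-low pattern the analysis associates with better performance.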

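A curiosity-driven curriculum learning framework based on the Deep Dyna-Q (DDQ) model. By introducing scheduled learning and curiosity, it achieves a significant improvement in training task-oriented dialog agents, and it finds that the easy-first and difficult-first strategies are more suitable for SC-DDQ and DDQ, respectively.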
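
The easy-first and difficult-first strategies can be pictured as two orderings of the same training material. The sketch below is only an illustration under stated assumptions: it treats each training item as a user goal, uses the number of constraint slots as a stand-in difficulty score, and splits the sorted goals into stages. The names `build_schedule` and `slot_count`, the goal format, and the difficulty measure are all hypothetical; the source does not specify how difficulty is defined.

```python
def slot_count(goal):
    """Hypothetical difficulty score: number of constraint slots in a user goal."""
    return len(goal["constraints"])


def build_schedule(goals, difficulty, num_stages=3, easy_first=True):
    """Split goals into consecutive training stages ordered by difficulty.

    easy_first=True  -> classic curriculum (easy goals in the first stage)
    easy_first=False -> reverse curriculum (difficult goals in the first stage)
    """
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    if not goals:
        return []
    ordered = sorted(goals, key=difficulty, reverse=not easy_first)
    stage_size = -(-len(ordered) // num_stages)  # ceiling division
    return [ordered[i:i + stage_size]
            for i in range(0, len(ordered), stage_size)]


# Made-up goals with 1 to 6 constraint slots each.
goals = [{"id": n, "constraints": ["slot"] * n} for n in (4, 1, 6, 2, 5, 3)]

easy_first_stages = build_schedule(goals, slot_count, easy_first=True)
difficult_first_stages = build_schedule(goals, slot_count, easy_first=False)
```

Keeping the ordering behind a single flag makes it straightforward to train the same agent under both strategies and compare them, which is the kind of comparison the summary above reports.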