The combination of deep reinforcement learning (DRL) with ensemble methods
has been proved to be highly effective in addressing complex sequential
decision-making problems. This success can be primarily attributed to the
utilization of multiple models, which enhances both the robustness of the
policy and the accuracy of value function estimation. However, there has been
limited analysis of the empirical success of current ensemble RL methods thus
far. Our new analysis reveals that the sample efficiency of previous ensemble
DRL algorithms may be limited by sub-policies that are not as diverse as they
could be. Motivated by these findings, our study introduces a new ensemble RL
algorithm, termed \textbf{T}rajectories-awar\textbf{E} \textbf{E}nsemble
exploratio\textbf{N} (TEEN). The primary goal of TEEN is to maximize the
expected return while promoting more diverse trajectories. Through extensive
experiments, we demonstrate that TEEN not only enhances the sample diversity of
the ensemble policy compared to using sub-policies alone but also improves the
performance over ensemble RL algorithms. On average, TEEN outperforms the
baseline ensemble DRL algorithms by 41\% in performance on the tested
representative environments.

通过使用深度强化学习和集成方法，我们提出了一种新的集成强化学习算法 TEEN，在实验证明 TEEN 相对于仅使用子策略能够增加集成策略的样本多样性，并且在性能上表现更好，平均而言 TEEN 在经过测试的代表性环境中比基线集成强化学习算法的性能提高了 41%。