In deep reinforcement learning (RL), value functions are typically approximated with deep neural networks and trained with mean squared error regression objectives to fit the true value functions. Recent research has proposed an alternative approach that uses a cross-entropy classification objective instead, which has been shown to improve the performance and scalability of RL algorithms. However, existing studies have not extensively benchmarked the effects of this replacement across domains, as their primary objective was to demonstrate the efficacy of the concept across a broad spectrum of tasks rather than to analyze it in depth. Our work empirically investigates the impact of this replacement in an offline RL setup and analyzes how different aspects affect performance. Through large-scale experiments across a diverse range of tasks and several algorithms, we aim to gain deeper insight into the implications of this approach. Our results reveal that, for some algorithms, this change can lead to performance superior to state-of-the-art solutions in certain tasks while maintaining comparable performance in others; for other algorithms, however, the modification can lead to a dramatic drop in performance. These findings are crucial for further application of the classification approach in research and practical tasks.
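To make the contrast between the two objectives concrete, the following is a minimal sketch and not the implementation evaluated in this work. It assumes a two-hot encoding of the scalar target over a fixed grid of bin centers, which is one common way to turn value regression into classification; the abstract does not specify the encoding. All function names and the parameters `v_min`, `v_max`, and `num_bins` are hypothetical and chosen for illustration only.

```python
import numpy as np

def two_hot(target, v_min, v_max, num_bins):
    """Spread a scalar target over its two neighbouring bin centers (assumed encoding)."""
    centers = np.linspace(v_min, v_max, num_bins)
    delta = centers[1] - centers[0]
    target = np.clip(target, v_min, v_max)
    pos = (target - v_min) / delta                      # fractional bin index
    lower = min(int(np.floor(pos)), num_bins - 1)
    upper = min(lower + 1, num_bins - 1)
    w_upper = pos - lower                               # weight on the upper bin
    probs = np.zeros(num_bins)
    probs[lower] += 1.0 - w_upper
    probs[upper] += w_upper
    return probs, centers

def log_softmax(logits):
    """Numerically stable log-softmax over a 1-D array of logits."""
    shifted = logits - np.max(logits)
    return shifted - np.log(np.sum(np.exp(shifted)))

def mse_loss(q_pred, target):
    """Standard regression objective on a scalar value prediction."""
    return (q_pred - target) ** 2

def cross_entropy_loss(logits, target, v_min, v_max):
    """Classification objective: cross-entropy against the two-hot target."""
    probs, _ = two_hot(target, v_min, v_max, len(logits))
    return -np.sum(probs * log_softmax(logits))

def expected_value(logits, v_min, v_max):
    """Recover a scalar value estimate as the mean of the predicted distribution."""
    centers = np.linspace(v_min, v_max, len(logits))
    return np.sum(np.exp(log_softmax(logits)) * centers)
```

Under these assumptions, the critic outputs `num_bins` logits instead of a single scalar, the loss compares the predicted distribution with the encoded target, and a scalar value is recovered as the expectation over the bin centers whenever the algorithm needs one.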

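Through large-scale experiments with different algorithms on a diverse range of tasks, our study empirically investigates the impact of this replacement on performance. The results show that in some tasks the change can achieve performance superior to existing solutions, while maintaining comparable performance in others; for other algorithms, however, the modification can lead to a significant drop in performance. These findings are crucial for the further application of the classification approach in research and practical tasks.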

Is Value Function Estimation with Classification Plug-and-Play for Offline Reinforcement Learning?