Successful applications of distributional reinforcement learning with quantile regression prompt a natural question: can we use other statistics to represent the distribution of returns? In particular, expectile regression is known to be more efficient than quantile regression for approximating distributions, especially at extreme values, and because it provides a straightforward estimator of the mean, it is a natural candidate for reinforcement learning. Prior work has answered this question positively in the case of expectiles, with the major caveat that expensive computations must be performed to ensure convergence. In this work, we propose a dual expectile-quantile approach that addresses the shortcomings of previous work while leveraging the complementary properties of expectiles and quantiles. Our method outperforms both quantile-based and expectile-based baselines on the MuJoCo continuous control benchmark.
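To make the difference between the two statistics concrete, the following is a minimal sketch of the standard asymmetric losses behind quantile and expectile regression. It is not the paper's algorithm. The function names, the brute-force grid search, and the exponentially distributed "returns" are assumptions chosen only for illustration. The sketch shows that at level 0.5 the quantile loss recovers the median, while the expectile loss recovers the mean.

```python
import numpy as np

def quantile_loss(theta, samples, tau):
    """Mean pinball loss of the candidate value theta at level tau."""
    u = samples - theta
    below = (u < 0).astype(float)
    return np.mean(u * (tau - below))

def expectile_loss(theta, samples, tau):
    """Mean asymmetric squared loss of the candidate value theta at level tau."""
    u = samples - theta
    below = (u < 0).astype(float)
    return np.mean(np.abs(tau - below) * u ** 2)

def minimize_on_grid(loss, samples, tau, num=2001):
    """Return the grid point with the smallest loss (brute force, illustration only)."""
    grid = np.linspace(samples.min(), samples.max(), num)
    losses = [loss(theta, samples, tau) for theta in grid]
    return grid[int(np.argmin(losses))]

# Hypothetical skewed sample of returns (an assumption, not data from the paper).
rng = np.random.default_rng(0)
returns = rng.exponential(scale=1.0, size=5000)

median_hat = minimize_on_grid(quantile_loss, returns, 0.5)  # close to the sample median
mean_hat = minimize_on_grid(expectile_loss, returns, 0.5)   # close to the sample mean

print("0.5-quantile:", median_hat, "sample median:", np.median(returns))
print("0.5-expectile:", mean_hat, "sample mean:", returns.mean())
```

On a skewed sample the two estimates differ, which illustrates why the 0.5-expectile gives a direct estimate of the mean whereas the 0.5-quantile does not.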

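This paper proposes a method that combines expectiles and quantiles to represent the distribution of returns. The method exploits the distinct properties of each statistic for estimating distributions, and it outperforms earlier quantile-based and expectile-based algorithms on the MuJoCo continuous control benchmark.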

Distributional Reinforcement Learning with Dual Expectile-Quantile Regression