We address the challenge of effective exploration while maintaining good
performance in policy gradient methods. As a solution, we propose diverse
exploration (DE) via conjugate policies. DE learns and deploys a set of
conjugate policies which can be conveniently generated as a byproduct of
conjugate gradient descent. We provide both theoretical and empirical results
showing that DE achieves effective exploration, improves policy performance, and
outperforms exploration by random policy perturbations.
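
The abstract does not spell out how conjugate policies are constructed, so the following is only a minimal sketch of the underlying linear-algebra fact: the conjugate gradient method, while solving $Ax = b$ for a symmetric positive definite $A$, produces search directions that are mutually conjugate with respect to $A$. The matrix `A`, the vector `b`, and the helper names are hypothetical choices for illustration. In a policy gradient setting one would presumably take $A$ to be a curvature matrix such as the Fisher information matrix, but that mapping is an assumption and is not shown here.

```python
def matvec(A, v):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def dot(u, v):
    """Euclidean inner product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(A, b, tol=1e-12):
    """Solve A x = b for symmetric positive definite A.

    Returns the solution and the list of search directions used,
    which are the byproduct of interest here.
    """
    n = len(b)
    x = [0.0] * n
    r = list(b)          # residual b - A x for the initial guess x = 0
    p = list(r)          # first search direction is the residual
    directions = []
    rs_old = dot(r, r)
    for _ in range(n):
        if rs_old < tol:
            break
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        directions.append(p)
        rs_new = dot(r, r)
        # New direction: residual plus a multiple of the previous direction
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x, directions


# Hypothetical small SPD system, chosen only for illustration
A = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]

x, directions = conjugate_gradient(A, b)

# Conjugacy check: p_i^T A p_j should vanish (up to rounding) for i != j
for i in range(len(directions)):
    for j in range(i + 1, len(directions)):
        print(i, j, dot(directions[i], matvec(A, directions[j])))
```

The off-diagonal products printed at the end should be zero up to floating-point rounding, which is the sense in which the directions are "conjugate". They come at no extra cost beyond the solve itself.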

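This paper proposes diverse exploration (DE) via conjugate policies to address the problem of exploring effectively while maintaining good performance in policy gradient methods. DE learns and deploys a set of conjugate policies, and theoretical and empirical results show that DE achieves exploration, improves policy performance, and outperforms exploration by random policy perturbations.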

Diverse Exploration for Policy Gradient Methods Based on Conjugate Policies

Diverse Exploration via Conjugate Policies for Policy Gradient Methods

We study Riemannian optimization methods on the embedded manifold of low-rank
matrices for the problem of matrix completion, that is, recovering a
low-rank matrix from a subset of its entries. Assume $m$ entries of an $n\times n$
rank-$r$ matrix are sampled independently and uniformly with replacement. We
first prove that with high probability the Riemannian gradient descent and
conjugate gradient descent algorithms initialized by one step hard thresholding
are guaranteed to converge linearly to the measured matrix provided
\begin{align*} m\geq C_\kappa n^{1.5}r\log^{1.5}(n), \end{align*} where
$C_\kappa$ is a numerical constant depending on the condition number of the
underlying matrix. The sampling complexity is further improved to
\begin{align*} m\geq C_\kappa nr^2\log^{2}(n) \end{align*} via a resampled
Riemannian gradient descent initialization. The analysis of the new
initialization procedure relies on an asymmetric restricted isometry property
of the sampling operator and the curvature of the low-rank matrix manifold.
Numerical simulations show that the algorithms are able to recover a low-rank
matrix from nearly the minimum number of measurements.
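
To get a feel for when the second sampling bound is actually the smaller one, the two expressions can be compared numerically. This is only a rough sketch under stated assumptions: both constants $C_\kappa$ are set to 1 (the abstract does not say they are equal), the logarithm is taken to be natural, and the function names are hypothetical. Under those assumptions the ratio of the second bound to the first is $r\sqrt{\log n}/\sqrt{n}$, so the second bound is smaller roughly when $r < \sqrt{n/\log n}$. For reference, an $n\times n$ rank-$r$ matrix has $r(2n-r)$ degrees of freedom.

```python
import math


def bound_one_step(n, r):
    """n^{1.5} r log^{1.5}(n): bound for one-step hard thresholding init.

    The constant C_kappa is set to 1 here, which is an assumption.
    """
    return n ** 1.5 * r * math.log(n) ** 1.5


def bound_resampled(n, r):
    """n r^2 log^2(n): bound for the resampled initialization.

    The constant C_kappa is again set to 1, which is an assumption.
    """
    return n * r ** 2 * math.log(n) ** 2


def degrees_of_freedom(n, r):
    """Number of free parameters of an n-by-n matrix of rank r."""
    return r * (2 * n - r)


# Hypothetical problem sizes, chosen only for illustration
for n, r in [(1000, 5), (1000, 50)]:
    ratio = bound_resampled(n, r) / bound_one_step(n, r)
    print(f"n={n}, r={r}: resampled/one-step ratio = {ratio:.3f}, "
          f"degrees of freedom = {degrees_of_freedom(n, r)}")
```

Since the unknown constants are dropped, only the ratio between the two bounds is meaningful here, not their absolute values.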

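This paper studies the application and convergence of Riemannian optimization methods on the embedded manifold of low-rank matrices for the matrix completion problem. The sampling complexity can be further reduced via a resampled Riemannian gradient descent initialization, whose analysis relies on an asymmetric restricted isometry property of the sampling operator and the curvature of the low-rank matrix manifold.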