We study joint learning of network topology and mixed opinion dynamics, in
which agents may have different update rules. Such a model captures the
diversity of real individual interactions. We propose a learning algorithm
based on multi-armed bandit algorithms to address the problem. The goal of the
algorithm is to find each agent's update rule from several candidate rules and
to learn the underlying network. At each iteration, the algorithm assumes that
each agent follows one of the candidate update rules and then adjusts the network estimates to
reduce validation error. Numerical experiments show that the proposed algorithm
improves initial estimates of the network and update rules, decreases
prediction error, and performs better than other methods such as sparse linear
regression and Gaussian process regression.
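
The rule-selection step described above can be illustrated with a minimal sketch. It is not the algorithm from the paper: it assumes the network estimate `W_est` is held fixed (the network-update step is omitted), uses a plain epsilon-greedy bandit, and the two candidate rules (`degroot`, `stubborn`) and all function names are hypothetical choices made for illustration. Each candidate rule is an arm, and the reward is the negative one-step prediction error on a randomly drawn observed transition.

```python
import numpy as np

def degroot(W, x, i):
    """Agent i moves to the W-weighted average of all opinions."""
    return W[i] @ x

def stubborn(W, x, i):
    """Agent i keeps its current opinion."""
    return x[i]

RULES = [degroot, stubborn]  # hypothetical candidate update rules

def simulate(W, rule_ids, x0, steps):
    """Roll the mixed dynamics forward; returns an array of shape (steps + 1, n)."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(np.array([RULES[r](W, x, i) for i, r in enumerate(rule_ids)]))
    return np.array(xs)

def select_rules(W_est, xs, rounds=200, eps=0.2, seed=0):
    """Epsilon-greedy bandit per agent; reward = -(one-step prediction error)."""
    rng = np.random.default_rng(seed)
    n, K = xs.shape[1], len(RULES)
    counts = np.zeros((n, K))
    means = np.zeros((n, K))
    for t in range(rounds):
        s = rng.integers(len(xs) - 1)         # a randomly chosen observed transition
        for i in range(n):
            if t < K:
                k = t                          # try every arm once
            elif rng.random() < eps:
                k = int(rng.integers(K))       # explore
            else:
                k = int(np.argmax(means[i]))   # exploit
            reward = -abs(RULES[k](W_est, xs[s], i) - xs[s + 1, i])
            counts[i, k] += 1
            means[i, k] += (reward - means[i, k]) / counts[i, k]  # running mean
    return means.argmax(axis=1)

# Synthetic, noise-free example (all values are assumptions for illustration).
rng = np.random.default_rng(1)
n = 5
W = rng.random((n, n))
W /= W.sum(axis=1, keepdims=True)             # row-stochastic influence weights
true_rules = np.array([0, 1, 0, 0, 1])        # 0 = degroot, 1 = stubborn
xs = simulate(W, true_rules, rng.random(n), steps=8)
recovered = select_rules(W, xs)
print(recovered, true_rules)
```

In this noise-free setting the correct rule has essentially zero prediction error, so the bandit's estimated rewards separate the arms quickly; with noisy data or an imperfect network estimate the separation would be weaker, which is where alternating with a network-update step would matter.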

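We propose a learning algorithm based on multi-armed bandit algorithms to solve the joint learning problem of connectivity structure and mixed opinion dynamics. The aim is to find each agent's update rule and learn the underlying network while improving prediction performance by reducing network error. In numerical experiments, the algorithm performs better than methods such as sparse linear regression and Gaussian process regression.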

Bandit-Based Joint Learning of Network Topology and Opinion Dynamics

Joint Learning of Network Topology and Opinion Dynamics Based on Bandit Algorithms

The softmax policy gradient (PG) method, which performs gradient ascent under
softmax policy parameterization, is arguably one of the de facto
implementations of policy optimization in modern reinforcement learning. For
$\gamma$-discounted infinite-horizon tabular Markov decision processes (MDPs),
remarkable progress has recently been achieved towards establishing global
convergence of softmax PG methods in finding a near-optimal policy. However,
prior results fall short of delineating clear dependencies of convergence rates
on salient parameters such as the cardinality of the state space $\mathcal{S}$
and the effective horizon $\frac{1}{1-\gamma}$, both of which could be
excessively large. In this paper, we deliver a pessimistic message regarding
the iteration complexity of softmax PG methods, even when assuming access to
exact gradient computation. Specifically, we demonstrate that the softmax PG
method with stepsize $\eta$ can take \[
\frac{1}{\eta} |\mathcal{S}|^{2^{\Omega\big(\frac{1}{1-\gamma}\big)}}
~\text{iterations} \] to converge, even in the presence of a benign policy
initialization and an initial state distribution amenable to exploration (so
that the distribution mismatch coefficient is not exceedingly large). This is
accomplished by characterizing the algorithmic dynamics over a
carefully-constructed MDP containing only three actions. Our exponential lower
bound hints at the necessity of carefully adjusting update rules or enforcing
proper regularization in accelerating PG methods.
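
For readers unfamiliar with the method under study, the following is a minimal sketch of softmax PG with exact gradients on a tabular MDP. It uses the standard policy gradient expression $\partial V^{\pi}(\rho)/\partial\theta(s,a) = d(s)\,\pi(a|s)\,(Q^{\pi}(s,a) - V^{\pi}(s))$, where $d$ is the unnormalized discounted state visitation under $\rho$. The MDP below is a small random instance chosen for illustration, not the carefully constructed hard instance from the paper, and the function names, discount factor, and stepsize are assumptions.

```python
import numpy as np

def softmax(theta):
    """Row-wise softmax: one action distribution per state."""
    z = np.exp(theta - theta.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)

def evaluate(P, R, gamma, pi):
    """Exact policy evaluation. P: (S, A, S), R: (S, A), pi: (S, A)."""
    S = R.shape[0]
    P_pi = np.einsum("sa,sat->st", pi, P)     # state-to-state kernel under pi
    r_pi = (pi * R).sum(axis=1)               # expected one-step reward under pi
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    Q = R + gamma * (P @ V)                   # (S, A, S) @ (S,) -> (S, A)
    return V, Q, P_pi

def softmax_pg(P, R, gamma, rho, eta, iters):
    """Gradient ascent on V^pi(rho) under softmax parameterization."""
    S, A = R.shape
    theta = np.zeros((S, A))                  # uniform initial policy
    values = []
    for _ in range(iters):
        pi = softmax(theta)
        V, Q, P_pi = evaluate(P, R, gamma, pi)
        values.append(rho @ V)
        # Unnormalized discounted visitation: d^T = rho^T (I - gamma P_pi)^{-1}
        d = np.linalg.solve((np.eye(S) - gamma * P_pi).T, rho)
        grad = d[:, None] * pi * (Q - V[:, None])
        theta += eta * grad
    return softmax(theta), np.array(values)

# Small random MDP with three actions (illustrative values only).
rng = np.random.default_rng(0)
S, A, gamma = 3, 3, 0.5
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)             # each P[s, a] is a distribution
R = rng.random((S, A))                        # rewards in [0, 1)
rho = np.full(S, 1.0 / S)                     # uniform initial state distribution
pi, values = softmax_pg(P, R, gamma, rho, eta=0.01, iters=500)
print(values[0], values[-1])
```

On a benign instance like this one the value increases steadily; the lower bound in the abstract concerns how slowly such iterates can approach optimality on adversarially designed MDPs as $|\mathcal{S}|$ and $\frac{1}{1-\gamma}$ grow.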

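This study investigates the complexity of global convergence of the softmax policy gradient method in infinite-horizon Markov decision processes. It gives a counterexample and points to the necessity of adjusting update rules or enforcing proper regularization in accelerating PG methods.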

Softmax Policy Gradient Methods May Need Exponential Time to Converge

Softmax Policy Gradient Methods Can Take Exponential Time to Converge

We study the problem of recovering the phase from magnitude measurements;
specifically, we wish to reconstruct a complex-valued signal x in C^n about
which we have phaseless samples of the form y_r = |<a_r, x>|^2, r = 1, 2, ..., m
(knowledge of the phase of these samples would yield a linear system). This
paper develops a non-convex formulation of the phase retrieval problem as well
as a concrete solution algorithm. In a nutshell, this algorithm starts with a
careful initialization obtained by means of a spectral method, and then refines
this initial estimate by iteratively applying novel update rules, which have
low computational complexity, much like in a gradient descent scheme. The main
contribution is a rigorous proof that this algorithm allows the exact
retrieval of phase information from a nearly minimal number of random
measurements. Indeed, the sequence of successive iterates provably converges to
the solution at a geometric rate so that the proposed scheme is efficient both
in terms of computational and data resources. In theory, a variation on this
scheme leads to a near-linear time algorithm for a physically realizable model
based on coded diffraction patterns. We illustrate the effectiveness of our
methods with various experiments on image data. Underlying our analysis are
insights for the analysis of non-convex optimization schemes that may have
implications for computational problems beyond phase retrieval.
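
The two-stage scheme described above (spectral initialization, then gradient-type refinement) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's exact procedure: it assumes complex Gaussian measurement vectors, stores a_r^* as the rows of a matrix `A`, minimizes the least-squares loss on the intensities, and uses a heuristic ramped stepsize whose constants (`mu_max`, `tau0`) are assumptions. All function names are hypothetical.

```python
import numpy as np

def spectral_init(A, y):
    """Leading eigenvector of (1/m) sum_r y_r a_r a_r^*, rescaled to match ||x||."""
    m, n = A.shape
    Y = (A.conj().T * y) @ A / m              # Hermitian n x n matrix
    _, vecs = np.linalg.eigh(Y)               # eigenvalues in ascending order
    scale = np.sqrt(n * y.sum() / (np.abs(A) ** 2).sum())
    return scale * vecs[:, -1]

def refine(A, y, z0, iters=1500, mu_max=0.2, tau0=330.0):
    """Gradient-type updates on f(z) = (1/2m) sum_r (|a_r^* z|^2 - y_r)^2."""
    m = A.shape[0]
    z = z0.copy()
    norm0_sq = np.linalg.norm(z0) ** 2
    for t in range(1, iters + 1):
        Az = A @ z
        grad = A.conj().T @ ((np.abs(Az) ** 2 - y) * Az) / m
        mu = min(1.0 - np.exp(-t / tau0), mu_max)   # assumed ramped stepsize
        z = z - (mu / norm0_sq) * grad
    return z

def relative_error(z, x):
    """Distance to x up to a global phase, relative to ||x||."""
    phase = np.vdot(x, z)                     # x^* z
    phase /= abs(phase)
    return np.linalg.norm(z - phase * x) / np.linalg.norm(x)

# Synthetic example (sizes and random model are assumptions for illustration).
rng = np.random.default_rng(0)
n, m = 16, 160
x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
A = (rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))) / np.sqrt(2)
y = np.abs(A @ x) ** 2                        # phaseless measurements
z0 = spectral_init(A, y)
z = refine(A, y, z0)
rel_err = relative_error(z, x)
print(relative_error(z0, x), rel_err)
```

The error is measured up to a global phase because x and e^{i*phi} x produce identical magnitude measurements, so the phase of the whole signal cannot be recovered.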

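This paper proposes a non-convex formulation of phase retrieval that exactly reconstructs the phase information of a signal from random measurements by means of iterative update rules. The algorithm has low computational complexity and is efficient in terms of both computational and data resources.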