We initiate the study of using Quantum Langevin Dynamics (QLD) to solve
optimization problems, particularly those with non-convex objective functions
that present substantial obstacles for traditional gradient descent algorithms.
Specifically, we examine the dynamics of a system coupled with an infinite heat
bath. This interaction induces both random quantum noise and a deterministic
damping effect in the system, which together nudge the system towards a steady
state that hovers near the global minimum of the objective function. We theoretically
prove the convergence of QLD in convex landscapes, demonstrating that the
average energy of the system can approach zero in the low temperature limit
with an exponential decay rate correlated with the evolution time. Numerically,
we first show the energy dissipation capability of QLD by retracing its origins
to spontaneous emission. Furthermore, we conduct a detailed discussion of the
impact of each parameter. Finally, based on observations from comparing QLD
with the classical Fokker-Planck-Smoluchowski equation, we propose a time-dependent
QLD by making temperature and $\hbar$ time-dependent parameters, which can be
theoretically proven to converge better than the time-independent case and which
also outperforms a series of state-of-the-art quantum and classical optimization
algorithms in many non-convex landscapes.
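
As a rough illustration of the noise-plus-damping picture and of a time-dependent temperature, the sketch below simulates a purely classical overdamped Langevin update with a cooling schedule. It is not QLD and not the authors' algorithm: the double-well objective, the step size `dt`, and the schedule `temp0 / (1 + k)` are all assumptions chosen for illustration.

```python
import math
import random


def f(x):
    # Hypothetical non-convex objective: a tilted double well whose
    # global minimum lies near x = -1 and local minimum near x = +1.
    return (x * x - 1.0) ** 2 + 0.3 * x


def grad_f(x):
    return 4.0 * x * (x * x - 1.0) + 0.3


def annealed_langevin(x0, steps=5000, dt=0.01, temp0=1.0, seed=0):
    """Classical overdamped Langevin dynamics with a decaying temperature."""
    rng = random.Random(seed)
    x = x0
    for k in range(steps):
        temp = temp0 / (1.0 + k)  # assumed cooling schedule
        noise = rng.gauss(0.0, 1.0)
        # Deterministic damping (gradient) term plus thermal noise term.
        x = x - dt * grad_f(x) + math.sqrt(2.0 * temp * dt) * noise
    return x


if __name__ == "__main__":
    x_cold = annealed_langevin(2.0, temp0=0.0)  # reduces to gradient descent
    x_hot = annealed_langevin(2.0, temp0=1.0)   # noise may help cross the barrier
    print(x_cold, f(x_cold))
    print(x_hot, f(x_hot))
```

With `temp0=0.0` the update is plain gradient descent and settles in whichever basin it starts in; a positive, decaying temperature gives the iterate a chance to cross the barrier early on, with no guarantee that it does for any particular seed.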

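We study the use of Quantum Langevin Dynamics (QLD) to solve optimization problems, particularly non-convex objective functions that pose substantial obstacles for traditional gradient descent algorithms. We prove the convergence of QLD in convex landscapes and, based on numerical experiments and comparisons with classical algorithms, propose a time-dependent QLD algorithm that outperforms other algorithms.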

Optimization via Quantum Langevin Dynamics

Quantum Langevin Dynamics for Optimization

We extend the global convergence result of Chatterjee
\cite{chatterjee2022convergence} by considering stochastic gradient descent
(SGD) for non-convex objective functions. With minimal additional assumptions
that can be realized by finitely wide neural networks, we prove that if we
initialize inside a local region where the \L{}ojasiewicz condition holds, then
with positive probability the stochastic gradient iterates converge to a global
minimum inside this region. A key component of our proof is to ensure that the
entire SGD trajectory stays inside the local region with positive
probability. For that, we assume the SGD noise scales with the objective
function, which is called machine learning noise and is achievable in many real
examples. Furthermore, we provide a negative argument to show why using the
boundedness of noise with Robbins-Monro type step sizes is not enough to keep
the key component valid.
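
To make the noise assumption concrete, here is a minimal one-dimensional sketch of SGD whose gradient noise has variance proportional to the objective value, so the noise vanishes as the iterate approaches a global minimum. The objective `x^2 + 3 sin^2(x)` is a standard textbook example of a non-convex function satisfying a Polyak-\L{}ojasiewicz-type inequality; it, the noise level `sigma`, and the step size `lr` are assumptions for illustration and are not taken from the paper.

```python
import math
import random


def f(x):
    # Non-convex objective with a single global minimum f(0) = 0.
    return x * x + 3.0 * math.sin(x) ** 2


def grad_f(x):
    return 2.0 * x + 3.0 * math.sin(2.0 * x)


def sgd_objective_scaled_noise(x0, steps=2000, lr=0.05, sigma=1.0, seed=0):
    """SGD where the noise variance is sigma * f(x) (assumed noise model)."""
    rng = random.Random(seed)
    x = x0
    for _ in range(steps):
        # Noise standard deviation shrinks with the objective value.
        noise = math.sqrt(sigma * f(x)) * rng.gauss(0.0, 1.0)
        x -= lr * (grad_f(x) + noise)
    return x


if __name__ == "__main__":
    x_final = sgd_objective_scaled_noise(3.0)
    print(x_final, f(x_final))
```

Because the noise is scaled by `sqrt(f(x))`, a constant step size suffices in this toy setting; with noise that is merely bounded, the iterate would keep fluctuating around the minimum unless the step size decays.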

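We extend the global convergence result of Chatterjee (2022) by considering stochastic gradient descent for non-convex objective functions. We prove that if we initialize inside a local region where the Łojasiewicz condition holds, then with positive probability the stochastic gradient iterates converge to a global minimum inside this region; a key component of our proof is ensuring that the entire SGD trajectory stays inside the local region with positive probability. To this end, we assume that the SGD noise scales with the objective function, which is called machine learning noise and is achievable in many real examples. In addition, we provide a negative argument showing that bounded noise with Robbins-Monro type step sizes is not enough to keep this key component valid.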

Convergence of Stochastic Gradient Descent for Deep Neural Networks under a Local Łojasiewicz Condition

Convergence of stochastic gradient descent under a local Łojasiewicz condition for deep neural networks

A deep equilibrium model uses implicit layers, which are implicitly defined
through an equilibrium point of an infinite sequence of computation. It avoids
any explicit computation of the infinite sequence by finding an equilibrium
point directly via root-finding and by computing gradients via implicit
differentiation. In this paper, we analyze the gradient dynamics of deep
equilibrium models with nonlinearity only on weight matrices and non-convex
objective functions of weights for regression and classification. Despite
non-convexity, convergence to a global optimum at a linear rate is guaranteed
without any assumption on the width of the models, allowing the width to be
smaller than the output dimension and the number of data points. Moreover, we
prove a relation between the gradient dynamics of the deep implicit layer and
the dynamics of the trust region Newton method of a shallow explicit layer. This
mathematically proven relation, along with our numerical observations, suggests
the importance of understanding the implicit bias of implicit layers and points
to an open problem on the topic. Our proofs deal with implicit layers, weight tying and
nonlinearity on weights, and differ from those in the related literature.
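
The two ingredients mentioned above, finding the equilibrium point directly and differentiating through it implicitly, can be sketched on a scalar toy model. The layer `z = g(w) * z + x` with `g(w) = GAMMA * sigmoid(w)` is a hypothetical one-dimensional analogue of a model that is linear in the state and nonlinear only in the weight; the names `GAMMA`, `contraction`, `solve_equilibrium`, and `implicit_grad` are illustrative and not from the paper.

```python
import math

GAMMA = 0.9  # assumed constant below 1 so the iteration is a contraction


def contraction(w):
    # Nonlinearity applied to the weight only: GAMMA * sigmoid(w).
    return GAMMA / (1.0 + math.exp(-w))


def solve_equilibrium(w, x, tol=1e-14, max_iter=1000):
    """Find z* with z* = g(w) * z* + x by fixed-point iteration."""
    g = contraction(w)
    z = 0.0
    for _ in range(max_iter):
        z_next = g * z + x
        if abs(z_next - z) <= tol:
            return z_next
        z = z_next
    return z


def implicit_grad(w, x):
    """dz*/dw from the implicit function theorem, without unrolling."""
    z = solve_equilibrium(w, x)
    g = contraction(w)
    dg_dw = g * (1.0 - g / GAMMA)  # derivative of GAMMA * sigmoid(w)
    # Differentiating z* = g z* + x gives dz*/dw = g'(w) z* / (1 - g).
    return dg_dw * z / (1.0 - g)


if __name__ == "__main__":
    w, x, h = 0.3, 1.0, 1e-6
    finite_diff = (solve_equilibrium(w + h, x) - solve_equilibrium(w - h, x)) / (2 * h)
    print(implicit_grad(w, x), finite_diff)
```

In this scalar case the equilibrium also has the closed form `x / (1 - g(w))`, which makes it easy to check both the solver and the implicit gradient against a finite-difference estimate.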

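Building on deep equilibrium models, this paper analyzes the gradient dynamics for regression and classification problems with non-convex objective functions and nonlinearity on the weight matrices. It proves convergence to a global optimum at a linear rate without any assumption on the model width, and it also examines the implicit bias of implicit layers and the relation between their dynamics and those of a shallow explicit layer.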