We consider gradient descent (GD) with a constant stepsize applied to
logistic regression with linearly separable data, where the constant stepsize
$\eta$ is so large that the loss initially oscillates. We show that GD exits
this initial oscillatory phase rapidly -- in $\mathcal{O}(\eta)$ steps -- and
subsequently achieves an $\tilde{\mathcal{O}}(1 / (\eta t) )$ convergence rate
after $t$ additional steps. Our results imply that, given a budget of $T$
steps, GD can achieve an accelerated loss of $\tilde{\mathcal{O}}(1/T^2)$ with
an aggressive stepsize $\eta:= \Theta( T)$, without any use of momentum or
variable stepsize schedulers. Our proof technique is versatile and also handles
general classification loss functions (where exponential tails are needed for
the $\tilde{\mathcal{O}}(1/T^2)$ acceleration), nonlinear predictors in the
neural tangent kernel regime, and online stochastic gradient descent (SGD) with
a large stepsize, under suitable separability conditions.
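
The two-phase behaviour described above can be observed on a toy problem. The following is a minimal sketch, not the paper's experiment: the two-point dataset `Z`, the stepsizes 1 and 100, and the helper names are all hypothetical choices made for illustration. It runs constant-stepsize GD on the logistic loss and records the loss at every step.

```python
import math

def log1pexp_neg(m):
    """Stable evaluation of log(1 + exp(-m))."""
    return max(-m, 0.0) + math.log1p(math.exp(-abs(m)))

def sigmoid_neg(m):
    """Stable evaluation of 1 / (1 + exp(m))."""
    if m >= 0:
        e = math.exp(-m)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(m))

def run_gd(z, eta, steps):
    """Constant-stepsize GD on L(w) = mean_i log(1 + exp(-<w, z_i>)), z_i = y_i * x_i.

    Starts from w_0 = 0 and returns the losses L(w_0), ..., L(w_steps).
    """
    n, d = len(z), len(z[0])
    w = [0.0] * d
    losses = []
    for _ in range(steps + 1):
        margins = [sum(wj * zj for wj, zj in zip(w, zi)) for zi in z]
        losses.append(sum(log1pexp_neg(m) for m in margins) / n)
        # Gradient of the logistic loss: -(1/n) * sum_i sigmoid(-m_i) * z_i
        grad = [-sum(sigmoid_neg(m) * zi[j] for m, zi in zip(margins, z)) / n
                for j in range(d)]
        w = [wj - eta * gj for wj, gj in zip(w, grad)]
    return losses

# Hypothetical toy data (labels already folded in); separable by the direction (0, 1).
Z = [(1.0, 0.2), (-0.5, 0.2)]
T = 1000
small = run_gd(Z, eta=1.0, steps=T)    # stepsize small enough for monotone descent
large = run_gd(Z, eta=100.0, steps=T)  # aggressive stepsize: loss rises at first
print("small stepsize: first losses", small[:4], "final", small[-1])
print("large stepsize: first losses", large[:4], "final", large[-1])
```

On this toy instance the large-stepsize run increases the loss on its first step yet ends with a lower loss than the small-stepsize run, which is the qualitative effect the abstract describes; the sketch says nothing about the constants in the stated rates.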

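We study gradient descent with a constant stepsize $\eta$ applied to logistic regression on linearly separable data. After an initial oscillatory phase lasting $\mathcal{O}(\eta)$ steps, GD attains an $\tilde{\mathcal{O}}(1/(\eta t))$ convergence rate after $t$ additional steps. Consequently, with a budget of $T$ steps and an aggressive stepsize $\eta = \Theta(T)$, it reaches an accelerated loss of $\tilde{\mathcal{O}}(1/T^2)$ without momentum or variable stepsize schedulers.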

Large Stepsize Gradient Descent for Logistic Loss: Non-Monotonicity of the Loss Improves Optimization Efficiency

Model quantization is challenging because it involves many hyper-parameters that
are tedious to tune, such as precision (bitwidth), dynamic range (minimum and
maximum discrete values), and stepsize (interval between discrete values). Unlike
prior work that carefully tunes these values, we present a fully differentiable
approach to
learn all of them, named Differentiable Dynamic Quantization (DDQ), which has
several benefits. (1) DDQ is able to quantize challenging lightweight
architectures like MobileNets, where different layers prefer different
quantization parameters. (2) DDQ is hardware-friendly and can be easily
implemented using low-precision matrix-vector multiplication, making it
deployable on a wide range of hardware such as ARM. (3) Extensive experiments
show that DDQ outperforms prior work on many networks and benchmarks, especially
when models are already efficient and compact. For example, DDQ is the first
approach that achieves lossless 4-bit quantization for MobileNetV2 on ImageNet.
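
To make the three hyper-parameters concrete, here is a minimal sketch of a plain uniform quantizer. It is not DDQ itself (nothing here is learned or differentiable); the function name `uniform_quantize` and the sample weights are hypothetical, chosen only to show how bitwidth, dynamic range and stepsize interact.

```python
import math

def uniform_quantize(x, bits, lo, hi):
    """Map x to the nearest of 2**bits evenly spaced levels in [lo, hi]."""
    levels = 2 ** bits                   # precision: number of discrete values
    step = (hi - lo) / (levels - 1)      # stepsize: interval between discrete values
    clamped = min(max(x, lo), hi)        # dynamic range: clip to [lo, hi]
    k = math.floor((clamped - lo) / step + 0.5)  # index of the nearest level
    return lo + k * step

# Hypothetical weights, quantized at two different bitwidths over the range [-1, 1].
weights = [-0.9, -0.2, 0.05, 0.3, 1.7]
for bits in (2, 4):
    print(bits, [round(uniform_quantize(w, bits, -1.0, 1.0), 4) for w in weights])
```

With a fixed range, each extra bit roughly halves the stepsize, while values outside the range (such as 1.7 here) are clipped whatever the bitwidth. This coupling is why the three quantities are usually tuned together.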

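We propose a fully differentiable method, named Differentiable Dynamic Quantization (DDQ), that learns all the hyper-parameters of model quantization. Experiments show that DDQ outperforms prior work, especially on lightweight architectures such as MobileNets, and that it is hardware-friendly.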

Differentiable Dynamic Quantization with Mixed Precision and Adaptive Resolution

We consider a primal-dual algorithm for minimizing $f(x)+h\square l(Ax)$ with
Fr\'echet differentiable $f$ and $l^*$. This primal-dual algorithm has two
names in the literature: Primal-Dual Fixed-Point algorithm based on the Proximity
Operator (PDFP$^2$O) and Proximal Alternating Predictor-Corrector (PAPC). In
this paper, we prove its convergence under a weaker condition on the stepsizes
than those in existing results. With additional assumptions, we show its linear
convergence. We also show that this condition (the upper bound on the
stepsize) is tight and cannot be weakened. This result also recovers a
recently proposed positive-indefinite linearized augmented Lagrangian method.
Finally, we apply this result to the decentralized consensus algorithm
PG-EXTRA and derive the weakest convergence condition.
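
A predictor-corrector iteration of this kind can be sketched for a special case. The sketch below makes several assumptions that are not taken from the abstract: $l$ is the indicator of $\{0\}$, so that $h\square l = h$; $f(x)=\tfrac12\|x-b\|^2$; $h=\lambda\|\cdot\|_1$; and the stepsizes `tau` and `sigma` are conservative values with $\tau\sigma\|A\|^2 < 1$, not the weaker bound the paper proves. The function name `papc_l1` and the test problem are hypothetical.

```python
def matvec(A, v):
    """Compute A v for a matrix stored as a list of rows."""
    return [sum(a * vj for a, vj in zip(row, v)) for row in A]

def mat_t_vec(A, v):
    """Compute A^T v for a matrix stored as a list of rows."""
    return [sum(A[i][j] * v[i] for i in range(len(A))) for j in range(len(A[0]))]

def papc_l1(A, b, lam, tau, sigma, iters):
    """Predictor-corrector sketch for min_x 0.5*||x - b||^2 + lam*||A x||_1."""
    x = [0.0] * len(b)
    y = [0.0] * len(A)
    for _ in range(iters):
        grad = [xi - bi for xi, bi in zip(x, b)]  # gradient of 0.5*||x - b||^2
        aty = mat_t_vec(A, y)
        # Predictor: primal gradient step using the current dual variable.
        x_bar = [xi - tau * (g + a) for xi, g, a in zip(x, grad, aty)]
        # Dual step: the prox of sigma*h^* for h = lam*||.||_1 is the
        # projection onto the box [-lam, lam].
        ax_bar = matvec(A, x_bar)
        y = [min(max(yi + sigma * ai, -lam), lam) for yi, ai in zip(y, ax_bar)]
        # Corrector: redo the primal step with the updated dual variable.
        aty = mat_t_vec(A, y)
        x = [xi - tau * (g + a) for xi, g, a in zip(x, grad, aty)]
    return x, y

# Hypothetical diagonal test problem, whose minimizer is the componentwise
# soft-thresholding of b with thresholds lam * A[i][i], i.e. [1.0, 0.0].
A = [[2.0, 0.0], [0.0, 1.0]]
b = [3.0, 0.4]
x_hat, y_hat = papc_l1(A, b, lam=1.0, tau=0.5, sigma=0.4, iters=500)
print("x:", x_hat, "y:", y_hat)
```

A diagonal $A$ is used only because the minimizer is then known in closed form, which makes the sketch easy to check; the iteration itself does not depend on that structure.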

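This paper studies a primal-dual algorithm based on the proximity operator (PDFP$^2$O, also known as PAPC) and its convergence. It proves convergence under a weaker stepsize condition than previous results, shows that this condition is tight, and applies the result to the decentralized consensus algorithm PG-EXTRA to derive the weakest convergence condition.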