The graduated optimization approach is a heuristic method for finding
globally optimal solutions for nonconvex functions and has been theoretically
analyzed in several studies. This paper defines a new family of nonconvex
functions for graduated optimization, discusses their sufficient conditions,
and provides a convergence analysis of the graduated optimization algorithm for
them. It shows that stochastic gradient descent (SGD) with mini-batch
stochastic gradients has the effect of smoothing the function, the degree of
which is determined by the learning rate and batch size. This finding provides
theoretical insights from a graduated optimization perspective on why training
with large batch sizes falls into sharp local minima, why decaying learning rates and
increasing batch sizes are superior to fixed learning rates and batch sizes,
and what the optimal learning rate scheduling is. To the best of our knowledge,
this is the first paper to provide a theoretical explanation for these aspects.
Moreover, a new graduated optimization framework that uses a decaying learning
rate and increasing batch size is analyzed, and experimental results on image
classification that support our theoretical findings are reported.

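This paper defines a new family of nonconvex functions for graduated optimization, discusses their sufficient conditions, and analyzes the convergence of the graduated optimization algorithm. It finds that stochastic gradient descent (SGD) with mini-batch stochastic gradients smooths the function, to a degree determined by the learning rate and batch size. This finding offers theoretical insight, from a graduated optimization perspective, into why large batch sizes fall into sharp local minima and why decaying learning rates and increasing batch sizes outperform fixed ones, and it identifies the optimal learning rate schedule. In addition, a new graduated optimization framework that uses a decaying learning rate and an increasing batch size is analyzed, and image classification experiments that support the theoretical findings are reported.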
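
The link between learning rate, batch size and smoothing described above can be illustrated with a small sketch. It builds a stage-wise schedule in which the learning rate decays and the batch size grows, and reports a noise level for each stage. The function name `noise_schedule`, its parameters, and the proxy `c * lr / sqrt(batch_size)` for the degree of smoothing are assumptions made here for illustration; the abstract only states that the degree of smoothing is determined by the learning rate and batch size.

```python
import math


def noise_schedule(lr0, batch0, stages, lr_decay=0.5, batch_growth=2, c=1.0):
    """Return a list of (learning_rate, batch_size, noise_level) per stage.

    ASSUMPTION: noise_level = c * lr / sqrt(batch_size) is a hypothetical
    proxy for the degree of smoothing, not a formula taken from the abstract.
    """
    schedule = []
    lr, batch = lr0, batch0
    for _ in range(stages):
        noise_level = c * lr / math.sqrt(batch)
        schedule.append((lr, batch, noise_level))
        lr *= lr_decay          # decaying learning rate
        batch *= batch_growth   # increasing batch size
    return schedule


if __name__ == "__main__":
    for lr, batch, noise in noise_schedule(lr0=0.1, batch0=16, stages=4):
        print(f"lr={lr:.4f}  batch={batch:4d}  noise={noise:.5f}")
```

Under this assumed proxy, each stage optimizes a less smoothed version of the objective than the one before it, which is the coarse-to-fine pattern that graduated optimization relies on.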

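Smoothing Nonconvex Functions with Stochastic Gradient Descent: Analysis of Implicit Graduated Optimization and Optimal Noise Scheduling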

Using Stochastic Gradient Descent to Smooth Nonconvex Functions: Analysis of Implicit Graduated Optimization with Optimal Noise Scheduling

In this technical report, we present our approaches for the continual object
detection track of the SODA10M challenge. We adopt ResNet50-FPN as the baseline
and try several improvements for the final submission model. We find that
a task-specific replay scheme, learning rate scheduling, model calibration, and
using the original image scale help to improve performance for both large and
small objects in images. Our team `hypertune28' secured the second position
among 52 participants in the challenge. This work will be presented at the ICCV
2021 Workshop on Self-supervised Learning for Next-Generation Industry-level
Autonomous Driving (SSLAD).

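This report presents approaches for continual object detection built on a ResNet50-FPN baseline. Improvements in model calibration, a task-specific replay scheme, learning rate scheduling, and the use of the original image scale raise performance on both large and small objects, and the solution took second place in the SODA10M challenge.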

SODA10M Challenge 2021: Second-Place Solution for the Continual Detection Track

2nd Place Solution for SODA10M Challenge 2021 -- Continual Detection Track

Adversarial Training (AT) with Projected Gradient Descent (PGD) is an
effective approach for improving the robustness of deep neural networks.
However, PGD AT has been shown to suffer from two main limitations: i) high
computational cost, and ii) extreme overfitting during training that leads to
a reduction in model generalization. While the effects of factors such as model
capacity and scale of training data on adversarial robustness have been
extensively studied, little attention has been paid to the effect of a very
important parameter in every network optimization on adversarial robustness:
the learning rate. In particular, we hypothesize that effective learning rate
scheduling during adversarial training can significantly reduce the overfitting
issue, to a degree where one does not even need to adversarially train a model
from scratch but can instead simply adversarially fine-tune a pre-trained
model. Motivated by this hypothesis, we propose a simple yet very effective
adversarial fine-tuning approach based on a $\textit{slow start, fast decay}$
learning rate scheduling strategy which not only significantly decreases
the computational cost required, but also greatly improves the accuracy and
robustness of a deep neural network. Experimental results show that the
proposed adversarial fine-tuning approach outperforms the state-of-the-art
methods on CIFAR-10, CIFAR-100 and ImageNet datasets in both test accuracy and
robustness, while reducing the computational cost by 8-10$\times$.
Furthermore, a very important benefit of the proposed adversarial fine-tuning
approach is that it makes it possible to improve the robustness of any
pre-trained deep neural network without needing to train the model from
scratch, which to the best of the authors' knowledge has not been previously
demonstrated in research literature.

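This work proposes an adversarial fine-tuning approach based on a slow start, fast decay learning rate scheduling strategy. Effective learning rate scheduling significantly reduces the computational cost while improving the accuracy and robustness of deep neural networks. Experimental results show that the approach outperforms previous state-of-the-art methods on the CIFAR-10, CIFAR-100 and ImageNet datasets while reducing the computational cost by 8-10 times, and that it can improve the robustness of any pre-trained deep neural network without training the model from scratch.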
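
The slow start, fast decay idea described above can be sketched as a simple learning rate function. The abstract does not give the exact shape of the schedule, so everything here is an assumption for illustration: the function name `slow_start_fast_decay`, the linear ramp during the start phase, the geometric decay afterwards, and all default values are hypothetical.

```python
def slow_start_fast_decay(step, warmup_steps=100, start_lr=1e-4,
                          peak_lr=1e-2, gamma=0.9):
    """Return the learning rate at a given fine-tuning step.

    ASSUMPTION: a linear ramp followed by geometric decay is only one
    possible reading of "slow start, fast decay"; the abstract does not
    specify the functional form or any of these default values.
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    if step < warmup_steps:
        # Slow start: ramp linearly from start_lr up to peak_lr.
        frac = step / warmup_steps
        return start_lr + frac * (peak_lr - start_lr)
    # Fast decay: shrink the rate by a factor of gamma at every step.
    return peak_lr * gamma ** (step - warmup_steps)


if __name__ == "__main__":
    for s in (0, 50, 100, 110, 150):
        print(s, slow_start_fast_decay(s))
```

In a fine-tuning loop, a function like this would be evaluated once per step and its value assigned to the optimizer's learning rate before the adversarial update.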