This work is part of the ICLR Reproducibility Challenge 2019, in which we try to reproduce the results of the conference submission PADAM: Closing The Generalization Gap of Adaptive Gradient Methods In Training Deep Neural Networks. Adaptive gradient methods proposed in the past show worse generalization performance than stochastic gradient descent (SGD) with momentum. The authors address this problem by designing a new optimization algorithm that bridges the gap between adaptive gradient algorithms and SGD with momentum. The method introduces a new tunable hyperparameter, the partially adaptive parameter p, which takes values in [0, 0.5]. We implement the proposed optimizer and use it to replicate the authors' experiments. We review and comment on their empirical analysis. Finally, we propose a direction for further study of Padam. Our code is available at: https://github.com/yashkant/Padam-Tensorflow
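
To illustrate how a partially adaptive parameter p can interpolate between the two families of optimizers, the following is a minimal pure-Python sketch of a single Padam-style update. It is written from the general form of the update and is not taken from the linked repository. The function name `padam_step`, the default hyperparameter values, and the small `eps` term added for numerical stability are assumptions made for this example. With p = 0.5 the step matches an AMSGrad-style update, and with p = 0 the denominator becomes 1, so the step reduces to SGD with (exponentially averaged) momentum.

```python
def padam_step(theta, grad, m, v, v_hat,
               lr=0.1, beta1=0.9, beta2=0.999, p=0.125, eps=1e-8):
    """Apply one Padam-style update to lists of floats.

    Hypothetical helper for illustration; defaults are assumptions.
    Returns the updated (theta, m, v, v_hat) as new lists.
    """
    new_theta, new_m, new_v, new_v_hat = [], [], [], []
    for t_i, g_i, m_i, v_i, vh_i in zip(theta, grad, m, v, v_hat):
        # First moment: exponential moving average of the gradient.
        m_i = beta1 * m_i + (1.0 - beta1) * g_i
        # Second moment: exponential moving average of the squared gradient.
        v_i = beta2 * v_i + (1.0 - beta2) * g_i * g_i
        # AMSGrad-style running maximum of the second moment.
        vh_i = max(vh_i, v_i)
        # Partially adaptive step: exponent p instead of the usual 1/2.
        # eps is an assumed stabilizer, not part of the abstract's description.
        t_i = t_i - lr * m_i / ((vh_i + eps) ** p)
        new_theta.append(t_i)
        new_m.append(m_i)
        new_v.append(v_i)
        new_v_hat.append(vh_i)
    return new_theta, new_m, new_v, new_v_hat


# Usage: a few steps on f(x) = x^2, whose gradient is 2x.
x, m, v, v_hat = [1.0], [0.0], [0.0], [0.0]
for _ in range(3):
    grad = [2.0 * x[0]]
    x, m, v, v_hat = padam_step(x, grad, m, v, v_hat)
```

Treating p as an ordinary keyword argument keeps the two limiting cases easy to check in isolation, which is convenient when comparing a reimplementation against existing SGD-momentum or AMSGrad code.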

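This study is part of the ICLR Reproducibility Challenge 2019 and aims to reproduce the results of the paper PADAM: Closing The Generalization Gap of Adaptive Gradient Methods In Training Deep Neural Networks. The paper targets the problem that previously proposed adaptive gradient algorithms generalize worse than stochastic gradient descent (SGD) with momentum, and introduces a new tunable parameter, the partially adaptive parameter p, to build a bridge between adaptive gradient algorithms and SGD with momentum.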

ICLR Reproducibility Challenge: Padam: Closing the Generalization Gap of Adaptive Gradient Methods in Training Deep Neural Networks