Jul, 2017

Parle: parallelizing stochastic gradient descent

Pratik Chaudhari, Carlo Baldassi, Riccardo Zecchina, Stefano Soatto, Ameet Talwalkar

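TL;DR: The paper proposes Parle, an algorithm for parallel training of deep networks. It converges 2-4x faster than a data-parallel implementation of SGD while achieving significantly improved error rates that are nearly state-of-the-art on several benchmarks. It introduces no additional hyper-parameters, and its communication efficiency makes it suitable for both single-machine multi-GPU settings and distributed implementations.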

Abstract

We propose a new algorithm called Parle for parallel training of deep networks that converges 2-4x faster than a data-parallel implementation of SGD, while achieving significantly improved error rates that are nearly state-of-the-art on several benchmarks including CIFAR-10 and CIFAR-1