Traditional optimization methods rely on single-precision floating-point arithmetic, which is costly in both memory and compute. Mixed-precision optimization techniques instead combine single- and half-precision floating-point arithmetic to reduce memory requirements while maintaining model accuracy. We present an algorithm that further reduces memory usage during training by eliminating the single-precision copy of the parameters, so that virtually only half-precision numbers are kept. We also explore the benefits of not storing the gradients at all, by executing the optimizer step during back-propagation. In practice, we achieve up to 25% lower peak memory use and 15% faster training while maintaining the same level of accuracy.
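One possible way to keep "virtually" only half-precision numbers is the following scheme. It is an assumption made for illustration and not necessarily the exact algorithm of this work. It requires the 16-bit format to be bfloat16, which shares float32's sign and exponent layout. A float32 value then splits exactly into two parts: its upper 16 bits, which form a valid bfloat16 weight for the forward and backward passes, and its lower 16 bits, which hold the extra mantissa bits. Storing both halves costs 32 bits per parameter. A 16-bit working copy plus a 32-bit master copy costs 48 bits. Small updates are still not lost. The sketch uses plain SGD on a single scalar weight, and all function names are hypothetical.

```python
import struct

# Assumption for illustration: the 16-bit working format is bfloat16, so the
# upper half of a float32 bit pattern is itself a valid (truncated) bfloat16.


def split_fp32(x: float) -> tuple[int, int]:
    """Round x to float32 and split it into upper and lower 16-bit halves."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return bits >> 16, bits & 0xFFFF


def merge_fp32(hi: int, lo: int) -> float:
    """Reassemble the exact float32 value from its two 16-bit halves."""
    return struct.unpack("<f", struct.pack("<I", (hi << 16) | lo))[0]


def bf16_value(hi: int) -> float:
    """Value of the bfloat16 weight that the forward/backward passes would use."""
    return merge_fp32(hi, 0)


def sgd_step(hi: int, lo: int, grad: float, lr: float) -> tuple[int, int]:
    """SGD update on a weight stored only as two 16-bit halves.

    The float32 weight exists only transiently inside this function;
    no persistent float32 master copy is kept between steps.
    """
    w = merge_fp32(hi, lo)
    w -= lr * grad
    return split_fp32(w)


def bf16_only_step(hi: int, grad: float, lr: float) -> int:
    """Baseline: update a bfloat16 weight (by truncation) with no extra bits."""
    return split_fp32(bf16_value(hi) - lr * grad)[0]


if __name__ == "__main__":
    hi, lo = split_fp32(1.0)
    hi_naive = hi
    for _ in range(100):
        hi, lo = sgd_step(hi, lo, grad=-1.0, lr=1e-4)
        hi_naive = bf16_only_step(hi_naive, grad=-1.0, lr=1e-4)
    print(merge_fp32(hi, lo))    # close to 1.01
    print(bf16_value(hi_naive))  # stays at 1.0
```

With lr = 1e-4, each update is far smaller than the bfloat16 spacing near 1.0 (2^-7), so the bfloat16-only baseline never moves. The split representation accumulates all 100 updates.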

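The idea of executing the optimizer step during back-propagation, so that gradients never need to be stored, can be shown on a toy scalar network y = w_n * ... * w_1 * x with loss 0.5 * (y - target)^2. This is a minimal sketch under stated assumptions, not the paper's implementation. It uses plain SGD, and the names `forward`, `train_step_standard` and `train_step_fused` are hypothetical. The baseline keeps one gradient per parameter before stepping. The fused version applies each update as soon as its gradient is known and keeps only one parameter gradient alive at a time.

```python
def forward(weights: list[float], x: float) -> list[float]:
    """Return activations a_0 = x and a_{i+1} = weights[i] * a_i."""
    acts = [x]
    for w in weights:
        acts.append(w * acts[-1])
    return acts


def train_step_standard(weights: list[float], x: float, target: float,
                        lr: float) -> list[float]:
    """Baseline: store every parameter gradient, then run the optimizer step."""
    acts = forward(weights, x)
    g = acts[-1] - target            # dL/dy for L = 0.5 * (y - target)**2
    grads = [0.0] * len(weights)     # gradient buffer, one entry per parameter
    for i in reversed(range(len(weights))):
        grads[i] = g * acts[i]       # dL/dweights[i]
        g = g * weights[i]           # dL/d(input of layer i)
    return [w - lr * gw for w, gw in zip(weights, grads)]


def train_step_fused(weights: list[float], x: float, target: float,
                     lr: float) -> list[float]:
    """Fused: update each weight inside the backward pass, storing no buffer."""
    weights = list(weights)          # copy so the caller's list is untouched
    acts = forward(weights, x)
    g = acts[-1] - target
    for i in reversed(range(len(weights))):
        grad_w = g * acts[i]         # dL/dweights[i], consumed immediately
        g = g * weights[i]           # propagate with the *old* weight first
        weights[i] -= lr * grad_w    # optimizer step; grad_w is then discarded
    return weights


if __name__ == "__main__":
    w0 = [0.5, 1.5, 2.0]
    print(train_step_standard(w0, 1.0, 1.0, 0.1))
    print(train_step_fused(w0, 1.0, 1.0, 0.1))
```

Both functions return the same weights, because the fused version propagates the gradient through each weight before overwriting it. The trade-off is that any operation needing all gradients at once, such as clipping by the global gradient norm, no longer fits this pattern directly.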

Memory-Efficient Mixed-Precision Optimizers