We propose Additive Powers-of-Two (APoT) quantization, an efficient
non-uniform quantization scheme for the bell-shaped and long-tailed
distribution of weights and activations in neural networks. By constraining every
quantization level to be a sum of Powers-of-Two terms, APoT quantization enjoys
high computational efficiency and a good match with the distribution of
weights. A simple reparameterization of the clipping function is applied to
generate a better-defined gradient for learning the clipping threshold.
Moreover, weight normalization is presented to refine the distribution of
weights to make the training more stable and consistent. Experimental results
show that our proposed method outperforms state-of-the-art methods, and is even
competitive with the full-precision models, demonstrating the effectiveness of
our proposed APoT quantization. For example, our 4-bit quantized ResNet-50 on
ImageNet achieves 76.6% top-1 accuracy without bells and whistles; meanwhile,
our model reduces computational cost by 22% compared with the uniformly quantized
counterpart. The code is available at
this https URL
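
The idea of building every quantization level as a sum of powers-of-two terms can be illustrated with a small sketch. The abstract does not give the exact construction, so the exponent layout below, the function names `apot_levels` and `quantize`, and the parameters `base_bits`, `gamma`, and `alpha` are illustrative assumptions rather than the authors' implementation.

```python
from itertools import product


def apot_levels(bits=4, base_bits=2, gamma=1.0):
    """Enumerate non-negative levels that are sums of powers-of-two terms.

    Assumed layout: the level is a sum of n = bits // base_bits terms, and
    term i is either 0 or one of 2**base_bits - 1 powers of two with
    interleaved exponents. This is a hypothetical choice for illustration.
    """
    n_terms = bits // base_bits
    term_choices = []
    for i in range(n_terms):
        # 0 plus powers of two with exponents -(i), -(i + n), -(i + 2n), ...
        choices = [0.0] + [
            2.0 ** -(i + j * n_terms) for j in range(2 ** base_bits - 1)
        ]
        term_choices.append(choices)
    # Every combination of one choice per term gives one level.
    raw = sorted({sum(combo) for combo in product(*term_choices)})
    # Rescale so the largest level equals gamma.
    return [gamma * v / raw[-1] for v in raw]


def quantize(x, levels, alpha=1.0):
    """Clip x to [0, alpha], then project it onto the nearest level."""
    clipped = min(max(x, 0.0), alpha)
    return min(levels, key=lambda q: abs(q - clipped))


if __name__ == "__main__":
    levels = apot_levels()
    print(len(levels), "levels:", [round(v, 4) for v in levels])
    print("quantize(0.3) ->", quantize(0.3, levels))
```

Under these assumptions the levels are spaced more finely near zero than near the clipping threshold, which is the property that suits a bell-shaped, long-tailed distribution. In a hardware setting each term would correspond to a bit shift, which this floating-point sketch does not model.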

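This work proposes an efficient non-uniform quantization scheme, called APoT quantization, that better matches the distribution of weights and activations in neural networks. It reparameterizes the clipping function to obtain a better-defined gradient, and it applies weight normalization to refine the weight distribution so that training is more stable and consistent. Experimental results show that the method outperforms state-of-the-art methods: on ImageNet, a 4-bit quantized ResNet-50 reaches 76.6% accuracy while reducing computational cost by 22% compared with the uniformly quantized model.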

Additive Powers-of-Two Quantization: An Efficient Non-uniform Discretization for Neural Networks

Proximal policy optimization (PPO) is one of the most successful deep
reinforcement-learning methods, achieving state-of-the-art performance across a
wide range of challenging tasks. However, its optimization behavior is still
far from being fully understood. In this paper, we show that PPO can neither
strictly restrict the likelihood ratio as it attempts to do nor enforce a
well-defined trust region constraint, which means that it may still suffer from
the risk of performance instability. To address this issue, we present an
enhanced PPO method, named Truly PPO. Two critical improvements are made in our
method: 1) it adopts a new clipping function to support a rollback behavior to
restrict the difference between the new policy and the old one; 2) the
triggering condition for clipping is replaced with a trust region-based one,
such that optimizing the resulting surrogate objective function provides
guaranteed monotonic improvement of the ultimate policy performance. It appears
that, by adhering more closely to the goal of keeping the algorithm proximal,
that is, confining the policy within the trust region, the new algorithm
improves on the original PPO in both sample efficiency and performance.
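
To make the difference between plain clipping and a rollback behavior concrete, the sketch below compares the standard PPO clipped surrogate with one possible rollback variant for a single sample. The abstract does not state the form of the new clipping function, so the linear negative-slope form, the function names, and the parameters `eps` and `alpha` are assumptions made for illustration. The trust region-based triggering condition is not covered here.

```python
def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate for one sample."""
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped * advantage)


def rollback_objective(ratio, advantage, eps=0.2, alpha=0.3):
    """Hypothetical rollback variant of the clipped surrogate.

    Outside [1 - eps, 1 + eps] the clipped term gets slope -alpha in the
    ratio instead of slope 0, so the objective penalizes moving the ratio
    further out of range. The exact form is an assumption.
    """
    if ratio <= 1.0 - eps:
        f = -alpha * ratio + (1.0 + alpha) * (1.0 - eps)
    elif ratio >= 1.0 + eps:
        f = -alpha * ratio + (1.0 + alpha) * (1.0 + eps)
    else:
        f = ratio
    return min(ratio * advantage, f * advantage)


if __name__ == "__main__":
    for r in (1.1, 1.3, 1.5):
        print(r, ppo_clip_objective(r, 1.0), rollback_objective(r, 1.0))
```

With a positive advantage, the clipped objective is flat once the ratio exceeds `1 + eps`, so it gives no signal to pull the ratio back. The rollback variant decreases in that region, which gives a gradient that pushes the ratio back toward the allowed range.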

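This paper introduces Truly PPO, an enhanced PPO method that addresses problems in PPO's optimization behavior. It uses a new clipping function to support a rollback behavior and replaces the triggering condition for clipping with a trust region-based one, which provides guaranteed monotonic improvement of the ultimate policy performance and improves on PPO in both sample efficiency and performance.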